Bayesian News Feeds
On the plane to Atlanta, I happened to read a paper called Efficient simulation of the Ginibre point process by Laurent Decreusefond, Ian Flint, and Anaïs Vergne (from Télécom ParisTech). "Happened to" in that it was the conjunction of getting tipped off by my new Dauphine colleague (and fellow blogger!) Djalil Chafaï about the paper, having downloaded it prior to departure, and being stuck in a plane (after watching the only Chinese [somewhat] fantasy movie onboard, Saving General Yang).
This is mostly a mathematics paper. While indeed a large chunk of it is concerned with the rigorous definition of this point process in an abstract space, the last part is about simulating such processes. They are called determinantal (and not detrimental as I was tempted to interpret on my first read!) because the density of an n-set (x_1, x_2, …, x_n) is given by a kind of generalised Vandermonde determinant
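in generic notation (my reconstruction of the standard determinantal form, with T the kernel of the process):

\rho(x_1,\ldots,x_n) = \det\big( T(x_i,x_j) \big)_{1\le i,j\le n}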
where T is defined in terms of an orthonormal family,
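say (again my generic reconstruction, with weights \lambda_i and orthonormal functions \psi_i):

T(x,y) = \sum_i \lambda_i \, \psi_i(x) \, \overline{\psi_i(y)}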
(The number n of points can be simulated via an a.s. finite Bernoulli process.) Because of this representation, the sequence of conditional densities for the x_i's (i.e., x_1, then x_2 given x_1, etc.) can be found in closed form. In the special case of the Ginibre process, the ψ_i's are of the form
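which, if I reconstruct the standard convention correctly (up to normalising choices), reads

\psi_i(z) = \frac{z^i}{\sqrt{\pi\, i!}}\, e^{-|z|^2/2}, \qquad i = 0, 1, 2, \ldots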
and the process cannot be simulated, for it has infinite mass, hence an a.s. infinite number of points. Somewhat surprisingly (as I thought this was the point of the paper), the authors then switch to a truncated version of the process that always has a fixed number N of points, whose density has the closed form
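that is, in my reconstruction (the well-known complex Ginibre eigenvalue density, stated up to its normalising constant):

p(z_1,\ldots,z_N) \propto \exp\Big(-\sum_{i=1}^N |z_i|^2\Big) \prod_{1\le i<j\le N} |z_i - z_j|^2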
It has an interestingly repulsive quality in that points cannot get close to one another. (It reminded me of the pinball sampler proposed by Kerrie Mengersen and myself at one of the Valencia meetings and not pursued since.) The conclusion (of this section) is anticlimactic, though, in that this density is known to also correspond to the distribution of the eigenvalues of a random matrix with standardized complex Gaussian entries (the Ginibre ensemble). The authors mention that the fact that the support is the whole complex space C^n is a difficulty, although I do not see why.
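This eigenvalue connection at least makes for a quick simulation in R (a sketch under my own scaling convention, which may differ from the paper's):

# N points of the truncated process as the spectrum of an N x N matrix
# with i.i.d. standardized complex Gaussian entries
N <- 100
G <- matrix(complex(real = rnorm(N^2), imaginary = rnorm(N^2)) / sqrt(2), N, N)
pts <- eigen(G, only.values = TRUE)$values
plot(pts, asp = 1, pch = 20, xlab = "Re", ylab = "Im")  # visibly repulsive pattern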
The following sections of the paper move to the Ginibre process restricted to a compact set and then to the truncated Ginibre process restricted to a compact set, for which the authors develop corresponding simulation algorithms. There is however a drag in that the sequence of conditionals, while available in closed form, cannot be simulated efficiently and relies on a uniform accept-reject step instead. While I am certainly missing most of the points in the paper, I wonder if a Gibbs sampler would not be an interesting alternative, given that the full (last) conditional is a Gaussian density…
Filed under: Statistics, Travel Tagged: Atlanta, determinantal processes, flight, Gibbs sampler, Ginibre process, MCMC algorithms, pinball sampler, Telecom Paris, USA, Valencia conferences, Vandermonde determinant
While waiting for Jean-Michel to leave a thesis defence committee he was part of, I read this recently arXived survey by Novak and Rudolf, Computation of expectations by Markov chain Monte Carlo methods. The first part hinted at a sort of Bernoulli factory problem: when computing the expectation of f against the uniform distribution on G,
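that is, in generic notation,

S(f) = \frac{1}{\mathrm{vol}(G)} \int_G f(x)\, \mathrm{d}x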
For x ∈ G we can compute f (x) and G is given by a membership oracle, i.e. we are able to check whether any x is in G or not.
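As a toy illustration of how far a membership oracle alone can take you, here is a hit-and-run sketch in R for the uniform distribution on G; the set G (a quarter of the unit ball), the containing radius r, and the starting point are all hypothetical choices of mine:

in_G <- function(x) sum(x^2) <= 1 && all(x >= 0)  # membership oracle for G
d <- 3; r <- 1                            # dimension and radius of a ball containing G
x <- rep(.5 / sqrt(d), d)                 # a starting point inside G
n <- 1e4; out <- matrix(NA, n, d)
for (i in 1:n) {
  u <- rnorm(d); u <- u / sqrt(sum(u^2))  # uniform direction on the sphere
  repeat {                                # uniform draw on the chord of G through x
    tt <- runif(1, -2 * r, 2 * r)
    if (in_G(x + tt * u)) break
  }
  x <- x + tt * u
  out[i, ] <- x
}
colMeans(out)                             # Monte Carlo estimate of E[X] under the uniform on G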
However, the remainder of the paper does not go in that direction but instead recalls convergence results for MCMC schemes under various norms, like the spectral gap and Cheeger's inequalities. Useful as a quick reminder, e.g., for the Master's students in my Monte Carlo Statistical Methods class, but altogether well-known. The paper contains some precise bounds on the mean square error of the Monte Carlo approximation to the integral. For instance, for the hit-and-run algorithm, the uniform bound (for functions f bounded by 1) is
where d is the dimension of the space and r a scale of the volume of G. For the Metropolis-Hastings algorithm, with (independent) uniform proposal on G, the bound becomes
where C is an upper bound on the target density (no longer the uniform). [I rephrased Theorem 2 by replacing vol(G) with the volume of the containing hyper-ball to connect both results, α_d being the proportionality constant.] The paper also covers the case of the random walk Metropolis-Hastings algorithm, with the deceptively simple bound
but this is in the special case when G is the ball of radius d. The paper concludes with a list of open problems.
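To make the uniform independent proposal of the earlier bound concrete, here is a minimal independence Metropolis-Hastings sketch in R, with a hypothetical bounded target on G = [0,1]^2 rather than the authors' setting; the uniform proposal cancels from the acceptance ratio:

log_pi <- function(x) sum(dbeta(x, 2, 5, log = TRUE))  # hypothetical bounded target on G
d <- 2; x <- runif(d)
n <- 1e4; out <- matrix(NA, n, d)
for (i in 1:n) {
  y <- runif(d)                           # independent uniform proposal on G
  if (log(runif(1)) < log_pi(y) - log_pi(x)) x <- y
  out[i, ] <- x
}
colMeans(out)                             # both coordinates should approach 2/7, the Be(2,5) mean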
Filed under: pictures, Running, Statistics, Travel, University life Tagged: arXiv, Bernoulli factory, Canada, Lake Ontario, MCMC, Monte Carlo Statistical Methods, Toronto
Source: Bayesian Anal., Volume 8, Number 4, 741--758.
This article examines the convergence properties of a Bayesian model selection procedure based on a non-local prior density in ultrahigh-dimensional settings. The performance of the model selection procedure is also compared to popular penalized likelihood methods. Coupling diagnostics are used to bound the total variation distance between iterates in a Markov chain Monte Carlo (MCMC) algorithm and the posterior distribution on the model space. In several simulation scenarios in which the number of observations exceeds 100, rapid convergence and high accuracy of the Bayesian procedure are demonstrated. Conversely, the coupling diagnostics are successful in diagnosing lack of convergence in several scenarios for which the number of observations is less than 100. The accuracy of the Bayesian model selection procedure in identifying high probability models is shown to be comparable to commonly used penalized likelihood methods, including extensions of smoothly clipped absolute deviation (SCAD) and least absolute shrinkage and selection operator (LASSO) procedures.
Source: Bayesian Anal., Volume 8, Number 4, 759--780.
Histone modifications (HMs) play important roles in transcription through post-translational modifications. Combinations of HMs, known as chromatin signatures, encode specific messages for gene regulation. We therefore expect that inference on possible clustering of HMs and an annotation of genomic locations on the basis of such clustering can contribute new insights about the functions of regulatory elements and their relationships to combinations of HMs. We propose a nonparametric Bayesian local clustering Poisson model (NoB-LCP) to facilitate posterior inference on two-dimensional clustering of HMs and genomic locations. The NoB-LCP clusters HMs into HM sets and lets each HM set define its own clustering of genomic locations. Furthermore, it probabilistically excludes HMs and genomic locations that are irrelevant to clustering. By doing so, the proposed model effectively identifies important sets of HMs and groups regulatory elements with similar functionality based on HM patterns.
Source: Bayesian Anal., Volume 8, Number 4, 781--800.
We study a Bayesian model where we have made specific requests about the parameter values to be estimated. The aim is to find the parameter of a parametric family which minimizes a distance to the data generating density and then to estimate the discrepancy using nonparametric methods. We illustrate how coherent updating can proceed given that the standard Bayesian posterior from an unidentifiable model is inappropriate. Our updating is performed using Markov Chain Monte Carlo methods and in particular a novel method for dealing with intractable normalizing constants is required. Illustrations using synthetic data are provided.
Source: Bayesian Anal., Volume 8, Number 4, 801--836.
The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an “exchangeable feature probability function” (EFPF)—analogous to the EPPF in the clustering setting—for certain types of feature models. Moreover, we introduce a “feature paintbox” characterization—analogous to the Kingman paintbox for clustering—of the class of exchangeable feature models. We provide a further characterization of the subclass of feature allocations that have EFPF representations.
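As a toy illustration of a feature allocation, here is an R simulation of the Indian buffet process, a canonical exchangeable feature model (my choice of example; the mass parameter α is arbitrary): customer i takes existing dish k with probability m_k/i and tries Poisson(α/i) new dishes.

set.seed(4)
alpha <- 2; n <- 10
counts <- rep(1, rpois(1, alpha))         # dishes tried by the first customer
alloc <- list(seq_along(counts))
for (i in 2:n) {
  m <- length(counts)
  old <- which(runif(m) < counts / i)     # shared features, kept w.p. m_k / i
  new <- rpois(1, alpha / i)              # brand-new features
  counts[old] <- counts[old] + 1
  counts <- c(counts, rep(1, new))
  alloc[[i]] <- c(old, m + seq_len(new))
}
sapply(alloc, length)                     # a non-negative integer number of features per data point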
Source: Bayesian Anal., Volume 8, Number 4, 837--882.
We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
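For intuition only, here is a toy fixed-form sketch in R, not the authors' stochastic-linear-regression algorithm: it fits a Gaussian q = N(m, s²) to a hypothetical unnormalized one-dimensional posterior by minimizing a Monte Carlo estimate of the Kullback-Leibler divergence, with common random numbers making the objective deterministic.

set.seed(3)
log_p <- function(x) dnorm(x, 1, .5, log = TRUE) + dcauchy(x, log = TRUE)  # hypothetical target
eps <- rnorm(1e4)                          # fixed draws, reparameterised as z = m + s * eps
kl_hat <- function(th) {
  m <- th[1]; s <- exp(th[2]); z <- m + s * eps
  mean(dnorm(z, m, s, log = TRUE) - log_p(z))  # KL(q || p) up to p's normalizing constant
}
fit <- optim(c(0, 0), kl_hat)              # Nelder-Mead over (m, log s)
c(mean = fit$par[1], sd = exp(fit$par[2]))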
Source: Bayesian Anal., Volume 8, Number 4, 883--908.
It is sometimes preferable to conduct statistical analyses based on the combination of several models rather than on the selection of a single model, thus taking into account the uncertainty about the true model. Models are usually combined using constant weights that do not distinguish between different regions of the covariate space. However, a procedure that performs well in a given situation may not do so in another situation. In this paper, we propose the concept of local Bayes factors, where we calculate the Bayes factors by restricting the models to regions of the covariate space. The covariate space is split in such a way that the relative model efficiencies of the various Bayesian models are about the same in the same region while differing in different regions. An algorithm for clustered Bayes averaging is then proposed for model combination, where local Bayes factors are used to guide the weighting of the Bayesian models. Simulations and real data studies show that clustered Bayesian averaging results in better predictive performance compared to a single Bayesian model or Bayesian model averaging where models are combined using the same weights over the entire covariate space.
Valen E. Johnson. On Numerical Aspects of Bayesian Model Selection in High and Ultrahigh-dimensional Settings. 741--758.
Yanxun Xu, Juhee Lee, Yuan Yuan, Riten Mitra, Shoudan Liang, Peter Müller, Yuan Ji. Nonparametric Bayesian Bi-Clustering for Next Generation Sequencing Count Data. 759--780.
Pierpaolo De Blasi, Stephen G. Walker. Bayesian Estimation of the Discrepancy with Misspecified Parametric Models. 781--800.
Tamara Broderick, Jim Pitman, Michael I. Jordan. Feature Allocations, Probability Functions, and Paintboxes. 801--836.
Tim Salimans, David A. Knowles. Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. 837--882.
Qingzhao Yu, Steven N. MacEachern, Mario Peruggia. Clustered Bayesian Model Averaging. 883--908.
“This is, in this revised version, an outstanding paper that covers the Jeffreys-Lindley paradox (JLP) in exceptional depth and that unravels the philosophical differences between different schools of inference with the help of the JLP. From the analysis of this paradox, the author convincingly elaborates the principles of Bayesian and severity-based inferences, and engages in a thorough review of the latter’s account of the JLP in Spanos (2013).” Anonymous
I have now received a second round of reviews of my paper, "On the Jeffreys-Lindley paradox" (submitted to Philosophy of Science), and the reports are quite positive (or even extremely positive, as in the above quote!). The requests for changes aim at clarifying points, improving the background coverage, and simplifying my heavy style (e.g., cutting Proustian sentences). These requests were easily addressed (hopefully to the satisfaction of the reviewers) and, thanks to the week in Warwick, I have already sent the paper back to the journal, with high hopes for acceptance. The new version has also been arXived. I must add that some parts of the reviews sounded much better than my original prose and I was almost tempted to include them in the final version. Take for instance
“As a result, the reader obtains not only a better insight into what is at stake in the JLP, going beyond the results of Spanos (2013) and Sprenger (2013), but also a much better understanding of the epistemic function and mechanics of statistical tests. This is a major achievement given the philosophical controversies that have haunted the topic for decades. Recent insights from Bayesian statistics are integrated into the article and make sure that it is mathematically up to date, but the technical and foundational aspects of the paper are well-balanced.” Anonymous
Filed under: Statistics, University life Tagged: Aris Spanos, Bayesian model choice, Deborah Mayo, Error-Statistical philosophy, Harold Jeffreys, Jeffreys-Lindley paradox, Philosophy of Science, referee, severity
Filed under: pictures, Statistics, Travel, University life Tagged: ABC, Boston, IMS, JSM 2014, model selection, random forests, SNPs, summary statistics
In a "crazy travelling week" (as my daughter put it), I gave a talk at an IYS 2013 conference organised by Stephen Senn (formerly at Glasgow) and colleagues in the city of Luxembourg, Grand Duché du Luxembourg. I very much enjoyed the morning train trip there, as it was a misty morning, with the sun rising over the frosted-white countryside. (I cannot say much about the city of Luxembourg itself, though, as I only walked the kilometre from the station to the conference hotel and the same way back. There was a huge gorge cutting through the plateau, carved by a river, which would have been a nice place to run, I presume…)
One of the few talks I attended there was about an econometric model with instrumental variables. In general, and this dates back to my student years at ENSAE, I do not get the motivation for the distinction between endogenous and exogenous variables in econometric models. Especially in non-parametric models: if we do not want to make parametric assumptions, it is hard to justify correlation assumptions instead… My bent would be to parametrise everything, under the suspicion that everything is correlated with everything. The instrumental variables econometricians seem so fond of appear to me like magical beings, since we have to know they are instrumental. And they seem always to allow a return to a linear setting, by eliminating the non-linear parts. Sounds like a "more for less" free-lunch deal. (Any pointer would be appreciated.) The speaker there actually acknowledged (verbatim) that they are indeed magical and that they cannot be justified by mathematics or statistics. A voodoo part of econometrics, then?!
A second talk that left me perplexed was about a generalised finite mixture model. The model sounded like a mixture over time of individuals, i.e., a sort of clustering of longitudinal data. It looked like it should be easier to estimate than the usual mixtures of regressions, because an individual contributes to the same regression line at all the times when it is observed. The talk was uninspiring in that it missed connections to EM and to Bayesian solutions, focussing instead on a gradient method that sounded inappropriate for a multimodal likelihood. (Funnily enough, the choice of the number of regressions was made by BIC.)
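To see why the fixed individual membership helps, here is a toy EM sketch in R for a two-component mixture of regressions with individual-level (rather than observation-level) allocations; the data, the two slopes, and all tuning choices are invented for illustration:

set.seed(2)
I <- 60; Tt <- 5                                  # individuals and time points
id <- rep(1:I, each = Tt)
x  <- runif(I * Tt)
z  <- sample(1:2, I, replace = TRUE)              # true (latent) components
y  <- c(1, 3)[z[id]] * x + rnorm(I * Tt, 0, .3)   # two regression lines through the origin
p <- .5; beta <- c(0, 2); s <- 1                  # starting values
for (iter in 1:100) {
  ## E-step: responsibilities per individual (same component at all its times)
  ll1 <- tapply(dnorm(y, beta[1] * x, s, log = TRUE), id, sum)
  ll2 <- tapply(dnorm(y, beta[2] * x, s, log = TRUE), id, sum)
  w <- 1 / (1 + (1 - p) / p * exp(ll2 - ll1))     # P(component 1 | data) for each individual
  ## M-step: weighted least squares and weight update
  w1 <- w[id]; w2 <- 1 - w1
  beta <- c(sum(w1 * x * y) / sum(w1 * x^2), sum(w2 * x * y) / sum(w2 * x^2))
  s <- sqrt(sum(w1 * (y - beta[1] * x)^2 + w2 * (y - beta[2] * x)^2) / (I * Tt))
  p <- mean(w)
}
round(c(beta = beta, p = p, sigma = s), 2)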
Filed under: pictures, Statistics, Travel, University life Tagged: ABC, Bayesian econometrics, Glasgow, instrumental variables; non-response, IYS 2013, Luxembourg, trains
This paper (arXived a few days ago) compares maximum likelihood with different ABC approximations in a quantum physics setting, for an atom maser model that essentially boils down to a hidden Markov model. (I mostly blanked out of the physics explanations, so cannot say I understand the model at all.) While the authors (from the University of Nottingham, hence Robin's statue above…) do not consider the recent corpus of work by Ajay Jasra and coauthors (some of which was discussed on the 'Og), they get interesting findings for an equally interesting model. First, when comparing the Fisher informations on the sole parameter of the model, the "Rabi angle" φ, for two different sets of statistics, one goes to zero at a certain value of the parameter, while the (fully informative) other is at its maximum (Figure 6). This is quite intriguing, esp. given the shape of the information in the former case, which reminds me of (my) inverse normal distributions. Second, the authors compare different collections of summary statistics in terms of ABC distributions against the likelihood function. While most bring much more uncertainty into the analysis, the whole collection recovers the range and shape of the likelihood function, which is nice. Third, they also use a Kolmogorov-Smirnov distance to run their ABC, which is enticing, except that I cannot fathom from the paper when one would have enough of a sample (conditional on a parameter value) to rely on what is essentially an estimate of the sampling distribution. This seems to contradict the fact that they only use seven summary statistics. Or it may be that the "statistic" of waiting times happens to be a vector, in which case a Kolmogorov-Smirnov distance can indeed be adopted… The fact that the grouped seven-dimensional summary statistic provides the best ABC fit is somewhat of a surprise, considering that the problem enjoys a single parameter.
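Since the Kolmogorov-Smirnov variant is easy to mock up, here is a toy ABC-rejection sketch in R using a two-sample K-S distance, on a made-up exponential waiting-time model (nothing like the paper's atom maser):

set.seed(1)
y_obs <- rexp(100, rate = 2)               # pretend observed waiting times
M <- 1e4
theta <- runif(M, 0, 5)                    # prior draws for the rate
dist <- sapply(theta, function(th)
  ks.test(y_obs, rexp(100, th))$statistic) # K-S distance to a simulated sample
keep <- dist <= quantile(dist, .01)        # keep the 1% closest simulations
hist(theta[keep], main = "ABC posterior", xlab = expression(theta))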
“However, in practice, it is often difficult to find an s(.) which is sufficient.”
A point that irks me in most ABC papers is finding quotes like the above, since in most models it is easy to show that there cannot be a non-trivial sufficient statistic (of fixed dimension)! As soon as one leaves the exponential family cocoon, one is doomed in this respect!!!
Filed under: Books, Statistics, University life Tagged: ABC, ABC approximation error, Kolmogorov-Smirnov distance, likelihood-free methods, maser, quantum mechanics, summary statistics
I had bought the first book in Courtney Schafer's The Shattered Sigil trilogy because of its nice cover that hinted at mountains and climbing. The second volume in the series, The tainted city, is much less involved with those two interests of mine, except for a page or two on mixed climbing, with ice-axes, crampons, and the skills needed to use them on both ice and rock. (And it has a terrible cover.) Most of the book takes place within cities and is about power struggles in both parts of the book's universe. Previous characters are back, with an interesting balance between good and evil for one of them, plus a few new ones, including of course a new arch-evil one…
“Prediction is always challenging with currents as complex as those…” (p.226)
While I found the first volume somewhat clumsy in its psychology, The tainted city is much better balanced in this respect, as characters have gained in depth and consistency. They are more ambivalent too, including even the frightening magician Ruslan. The universe behind the books also gets better described, with the way(s) magic acts growing more complex and linked with the past. The plot is however a bit shallow, from the reason for bringing the main characters back to their own town, Ninavel, despite the obvious dangers to them and their captors, to the occurrence of a global, all-threatening danger, (spoiler!) to the final battle temporarily eliminating the vector, if not the source, of the danger, to the convenient reappearance of Dev's former love, with a final twist… It nonetheless reads well and paves the way for the third volume in the series, currently being written. Looking around at others' reviews of The tainted city, I could not find any negative one, with every critic praising the second volume above the first. A point with which I obviously concur.
Filed under: Statistics
Filed under: pictures, Running, Travel, University life Tagged: Coventry, England, Kenilworth, sunrise, University of Warwick, Warwickshire
An easily phrased (and solved?) Le Monde mathematical puzzle that does not [really] require an R code:
The five triplets A,B,C,D,E are such that
find the five triplets.
Adding up both sets of equations shows everything solely depends upon E1… So running an R code that checks for all possible values of E1 is a brute-force solution. However, one must first find what to check. Given that the sums of the triplets are of the form (16s, 4s, s), the possible choices for E1 are necessarily restricted to

> S0 = 193+187+185+175
> ceiling(S0/16)
[1] 47
> floor((S0+175)/16)
[1] 57
> (47:57)*16 - S0   # E1 = S1 - S0
[1]  12  28  44  60  76  92 108 124 140 156 172
The first two values correspond to a second sum S2 equal to 188 and 192, respectively, which is incompatible with A1 being 193. Furthermore, the corresponding values for E2 and E3 are then given by

> S2 = (49:57)*4
> E1 = (49:57)*16 - S0
> E2 = S2 - E1
> S3 = S2/4
> S3 - E2
[1] -103  -90  -77  -64  -51  -38  -25  -12    1
which excludes all values but E1=172, since only the last case returns a positive value. No brute-force in the end…
Filed under: Books, Kids, R Tagged: Le Monde, mathematical puzzle, R, system of equations