Bayesian News Feeds
Over the past weekend, several men were killed in Spain. By running bulls. Most in the streets of Pamplona. And one in a bullring. The argument behind those events is to invoke “tradition”. Which means it happened for several centuries. Starting (?) with the Minoan bull leapers. But that does not make an argument when lives are at stake and animals are turned into killing machines (and often injured in the process). A lot of other absurd and cruel traditions have rightly disappeared over the years… The European Union does not yet prohibit this animal mistreatment, although it prohibits the use of EU funds in its support, but it should, as it is high time bull running and bull fighting were prohibited everywhere in Spain (as bull fighting currently is in both Catalunya and the Canary Islands). [I realise this is not the number one problem in the world and that many more people died that same weekend of causes that could have been avoided; it is just that it would be so simple to eliminate this one (dis)reason.]
Filed under: Running, Travel Tagged: bull fighting, Canary Islands, Catalunya, European Union, running bulls, Spain
In the plane to Warwick on Monday, I was reading my latest issue of Nature and found an interesting editorial on the financial plight of many graduates and post-docs in both the US and the UK (and certainly elsewhere). Who, despite having a fellowship, cannot make ends meet. This is particularly true in expensive cities like London, Oxford, or even Paris, where rents force those new researchers into long commuting hours. The editorial suggests taking extra jobs to make up for financial difficulties, but this does not sound to me like a particularly pertinent recommendation if it means taking time off one’s research, at the period in a researcher’s career when one’s energy should be mostly directed at the production of papers towards securing a (more) permanent job. Even teaching can prove too time consuming for finishing PhD students. A match between the needs of those young researchers and the institutional support they receive would sound like a natural requirement, while graduates looking for fellowships should truly assess this match in detail before accepting an offer. Which of course is not always easy. In countries where post-doctoral contracts are not negotiable and are set at a national level (like, e.g., France), checking with earlier fellows is a must. (As it happens or happened, I was quite lucky to spend my post-doctoral years in cheap places with decent support from the local universities, but this is not relevant in today’s environment!)
Filed under: Kids, Travel, University life Tagged: fellowships, graduate course, Nature, postdocs, postdoctoral position, University of Warwick
As I figured out while working with astronomer colleagues last week, a strange if understandable difficulty proceeds from the simplest and most studied statistical model, namely the Normal model x~N(μ,1).
Indeed, if one reparametrises this model as x~N(υ⁻¹,1) with υ>0, a single observation x brings very little information about υ! (This is not a toy problem as it corresponds to estimating distances from observations of parallaxes.) If x gets large, υ is very likely to be small, but if x is small or negative, υ is certainly large, with no power to discriminate between highly different values. For instance, Fisher’s information for this model and parametrisation is υ⁻⁴ and thus vanishes as υ grows.
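For the record, the information computation is a one-liner under this parametrisation, using ∂(υ⁻¹)/∂υ = -υ⁻²:

\log f(x\mid\upsilon) = -\tfrac{1}{2}\,(x-\upsilon^{-1})^2 + \mathrm{const},
\qquad
\frac{\partial}{\partial\upsilon}\log f(x\mid\upsilon) = -(x-\upsilon^{-1})\,\upsilon^{-2},

so that

I(\upsilon) = \mathbb{E}\big[(x-\upsilon^{-1})^2\,\upsilon^{-4}\big] = \upsilon^{-4},

since the observation has unit variance.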
While one can always hope for Bayesian miracles, they do not automatically occur. For instance, working with a Gamma prior Ga(3,10³) on υ [as informed by a large astronomy dataset] leads to a posterior expectation hardly impacted by the value of the observation x.
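As a numerical sanity check (my own minimal sketch, not the code behind the post; the grid bounds and the shape/rate reading of Ga(3,10³) are my assumptions), one can integrate the posterior on a grid and watch how little E[υ|x] moves with x:

```python
import numpy as np

def post_mean(x, shape=3.0, rate=1e3):
    """Posterior mean of v under v ~ Ga(shape, rate) and x ~ N(1/v, 1),
    by deterministic integration on a grid (log-scale for stability)."""
    v = np.linspace(1e-4, 1.0, 200_000)
    logpost = (shape - 1) * np.log(v) - rate * v - 0.5 * (x - 1 / v) ** 2
    w = np.exp(logpost - logpost.max())  # unnormalised posterior weights
    return np.sum(v * w) / np.sum(w)

means = {x: post_mean(x) for x in (-2.0, 0.0, 2.0)}
```

Observations as different as x = -2 and x = 2 return posterior expectations of υ in the same narrow range, dictated by the prior.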
Filed under: R, Statistics, University life Tagged: astronomy, Bayesian inference, inverse problems, parallaxes
Last week, at the same time as the workshop on retrospective Monte Carlo in Warwick, there was a Monte Carlo conference in Paris, closing a Monte Carlo cycle run by Institut Louis Bachelier from October 2015 till June 2016. It took place in the convent of Les Cordeliers, downtown Paris [hence the title] and I alas could not attend the talks. As I organised a session on Bayesian (approximate) computations, with Richard Everitt, Jere Koskela, and Chris Sherlock as speakers (and Robin Ryder as chair), here are the slides of the speakers (actually, Jere most kindly agreed to give Chris’ talk as Chris was too sick to travel to Paris).
Filed under: pictures, Statistics, Travel, University life Tagged: coalescent, delayed acceptance, Dirichlet mixture priors, Institut Louis Bachelier, Monte Carlo, noisy Monte Carlo, Paris, pseudo-marginal MCMC, retrospective Monte Carlo, University of Warwick
As I was previously unaware of this book coming up, my surprise and excitement were both extreme when I received it from CRC Press a few weeks ago! John Chambers, one of the fathers of S, precursor of R, had just published a book about extending R. It covers some reflections of the author on programming and the story of R (Parts 2 and 1), and then focuses on object-oriented programming (Part 3) and the interfaces from R to other languages (Part 4). While this is “only” a programming book, and thus not strictly appealing to statisticians, reading one of the original actors’ thoughts on the past, present, and future of R is simply fantastic!!! And John Chambers is definitely not calling to simply start over and build something better, as Ross Ihaka did in this [most read] post a few years ago. (It is also great to see the names of friends appearing at times, like Julie, Luke, and Duncan!)
“I wrote most of the original software for S3 methods, which were useful for their application, in the early 1990s.”
In the (hi)story part, Chambers delves into the details of the evolution of S at Bell Labs, as described in his [first] “blue book” (which I kept on my shelf until very recently, next to the “white book”!) and of the emergence of R in the mid-1990s. I find those sections fascinating, maybe all the more because I am somewhat of a contemporary, having first learned Fortran (and Pascal) in the mid-1980s, before moving in the early 1990s to C (that I mostly coded as translated Pascal!), S-plus, and eventually R, in conjunction with a (forced) migration from Unix to Linux, as my local computer managers abandoned Unix and mainframe in favour of some virtual Windows machines. And I started running R on laptops with the help of friends more skilled than I (again keeping some of the early R manuals on my shelf until recently). Maybe one of the most surprising things about those reminiscences is that the very first version of R was dated Feb 29, 2000! Not because of Feb 29, 2000 (which, as Chambers points out, is the first use of the third-order correction to the Gregorian calendar, although I would have thought 1600 was the first one), but because I would have thought it appeared earlier, in conjunction with my first Linux laptop, but this memory is alas getting too vague!
As indicated above, the book is mostly about programming, which means in my case that some sections are definitely beyond my reach! For instance, reading “the onus is on the person writing the calling function to avoid using a reference object as the argument to an existing function that expects a named list” is not immediately clear… Nonetheless, most sections are readable [at my level] and enlightening about the mottoes “everything that exists is an object” and “everything that happens is a function” repeated throughout. (And about my psycho-rigid ways of translating Pascal into every other language!) I obviously learned about new commands and notions, like the difference between x <- 3 and x <<- 3 (but I was disappointed to learn that the number of <‘s was not related to the depth or height of the assignment!) In particular, I found the part about replacement functions fascinating, explaining how a command like diag(x)[i] = 3 could modify x directly. (While definitely worth reading, the chapter on R packages could have benefited from more details. But as Chambers points out there are whole books about this.) Overall, I am afraid the book will not improve my (limited) way of programming in R but I definitely recommend it to anyone even moderately skilled in the language.
Filed under: Books, Kids, R, Statistics Tagged: Bell Labs, book review, C, CRAN, extending R, Fortran, John Chambers, laptop, Linux, Luke Tierney, object-oriented programming, packages, Pascal, R, Ross Ihaka, S, S-plus, unix
The past week I spent in Warwick ended up with a workshop on retrospective Monte Carlo, which covered exact sampling, debiasing, Bernoulli factory problems and multi-level Monte Carlo, a definitely exciting package! (Not to mention opportunities to go climbing with some participants.) In particular, several talks focussed on the debiasing technique of Rhee and Glynn (2012) [inspired from von Neumann and Ulam, and already discussed in several posts here]. Including results in functional spaces, as demonstrated by a multifaceted talk by Sergios Agapiou who merged debiasing, deburning, and perfect sampling.
From a general perspective on unbiasing, while there exist sufficient conditions to ensure finite variance and aim at an optimal version, I feel a broader perspective should be adopted towards comparing those estimators with biased versions that take less time to compute. In a diffusion context, Chang-han Rhee presented a detailed argument as to why his debiasing solution achieves an O(√n) convergence rate as opposed to the regular discretised diffusion, but multi-level Monte Carlo also achieves this convergence speed. We had a nice discussion about this point at the break, with my slow understanding that continuous time processes had much, much stronger reasons for sticking to unbiasedness. At the poster session, I had the nice surprise of reading a poster on the penalty method I discussed the same morning! Used for subsampling when scaling MCMC.
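To make the debiasing idea concrete (an illustrative toy of mine, not Rhee and Glynn's diffusion setting: the target series, the geometric truncation, and the constant r=0.7 are all my choices), randomised truncation of a convergent series gives an unbiased estimator of its limit, here Σₖ 1/k! = e:

```python
import numpy as np

def single_series_estimate(rng, r=0.7):
    """Unbiased estimate of e = sum_k 1/k! via randomised truncation:
    draw N with P(N >= k) = r^k, return sum_{k <= N} (1/k!) / P(N >= k)."""
    n = rng.geometric(1 - r) - 1          # N in {0,1,...} with P(N >= k) = r^k
    ks = np.arange(n + 1)
    facts = np.cumprod(np.concatenate(([1.0], ks[1:])))  # 0!, 1!, ..., n!
    return np.sum((1 / facts) / r ** ks)

rng = np.random.default_rng(0)
est = np.mean([single_series_estimate(rng) for _ in range(50_000)])
```

Unbiasedness holds because each increment 1/k! is weighted by 1/P(N ≥ k), exactly compensating the random truncation; the price is an inflated (here still finite) variance.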
On the second day, Gareth Roberts talked about the Zig-Zag algorithm (which reminded me of the cigarette paper brand). This method has connections with slice sampling but it is a continuous time method which, in dimension one, means running a constant velocity particle that starts at a uniform value between 0 and the maximum density value and proceeds horizontally until it hits the boundary, at which time it moves to another uniform. Roughly. More specifically, this approach uses piecewise deterministic Markov processes, with a radically new approach to simulating complex targets based on continuous time simulation. With computing times that [counter-intuitively] do not increase with the sample size.
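In dimension one the construction can be sketched for a standard normal target (my own toy implementation, not the talk's: unit speeds, potential U(x)=x²/2, switching rate max(0, v·x(t)), and event times drawn by exact inversion of the integrated rate):

```python
import numpy as np

def zigzag_normal(T=10_000.0, rng=None):
    """1-d Zig-Zag sampler for N(0,1): a unit-speed particle x(t) = x + v t
    flips its velocity at events of rate max(0, v * x(t)), with event times
    obtained by inverting the integrated rate against an Exp(1) draw."""
    rng = rng or np.random.default_rng(0)
    x, v, t = 0.0, 1.0, 0.0
    ts, xs = [t], [x]
    while t < T:
        e = rng.exponential()
        u = v * x  # signed rate at the current position
        # integrated rate: u*tau + tau^2/2 if u >= 0, else (tau + u)^2/2 past -u
        tau = np.sqrt(u * u + 2 * e) - u if u >= 0 else -u + np.sqrt(2 * e)
        x, t, v = x + v * tau, t + tau, -v
        ts.append(t); xs.append(x)
    return np.array(ts), np.array(xs)

ts, xs = zigzag_normal()
grid = np.linspace(0.0, ts[-1], 200_000)
traj = np.interp(grid, ts, xs)  # exact: the trajectory is piecewise linear in t
```

Time averages along the continuous trajectory then recover the moments of N(0,1), with a number of events that depends on the trajectory, not on a pre-set number of iterations.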
Mark Huber gave another exciting talk around the Bernoulli factory problem, connecting with perfect simulation and demonstrating this is not solely a formal Monte Carlo problem! Some earlier posts here have discussed papers on that problem, but I was unaware of the results bounding [from below] the expected number of steps to simulate B(f(p)) from a (p,1-p) coin. If not of the open questions surrounding B(2p). The talk was also great in that it centred on recursion and included a fundamental theorem of perfect sampling! Not that surprising given Mark’s recent book on the topic, but exhilarating nonetheless!!!
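The grandfather of all such constructions is von Neumann's trick for extracting an exactly fair coin from a p-coin (a minimal warm-up of my own; Huber's talk and book go far beyond this):

```python
import numpy as np

def fair_coin(coin, rng):
    """von Neumann extractor: flip the p-coin in pairs and return the first
    flip of the first discordant pair; the output is Bernoulli(1/2) exactly,
    whatever the unknown p, since P(1,0) = P(0,1) = p(1-p)."""
    while True:
        a, b = coin(rng), coin(rng)
        if a != b:
            return a

rng = np.random.default_rng(2)
coin = lambda r: r.random() < 0.3  # a biased coin whose p we pretend not to know
freq = np.mean([fair_coin(coin, rng) for _ in range(20_000)])
```

The expected number of flips per output, 1/p(1-p), already hints at the lower bounds on expected running times discussed in the talk.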
The final talk of the second day was given by Peter Glynn, with connections with Chang-han Rhee’s talk the previous day, but with a different twist. In particular, Peter showed how to achieve perfect or exact estimation rather than perfect or exact simulation by a fabulous trick: perfect sampling is better understood through the construction of random functions φ¹, φ², … such that X²=φ¹(X¹), X³=φ²(X²), … Hence,

Xᵗ⁺¹ = φᵗ(φᵗ⁻¹(⋯φ¹(X¹)⋯)),

which helps in constructing coupling strategies. However, since the φ’s are usually iid, the above is generally distributed like

Yᵗ⁺¹ = φ¹(φ²(⋯φᵗ(X¹)⋯)),

which seems pretty similar but offers a much better concentration as t grows. Cutting the function composition is then feasible towards producing unbiased and more efficient estimators. (I realise this is not a particularly clear explanation of the idea, detailed in an arXival I somehow missed. When seen this way, Y would seem much more expensive to compute [than X].)
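The forward/backward distinction can be seen with iid random affine maps φᵢ(x) = ρx + εᵢ (a toy sketch; the choice ρ=0.5 and the Gaussian shocks are mine): for each fixed t the two compositions have the same distribution, yet the forward one keeps fluctuating in t while the backward one converges pointwise.

```python
import numpy as np

rho = 0.5
eps = np.random.default_rng(0).normal(size=60)  # shared shocks: phi_i(x) = rho*x + eps[i]

def forward(x0, t):
    """X^{t+1} = phi_t(phi_{t-1}(... phi_1(x0) ...)): newest map applied last."""
    x = x0
    for i in range(t):
        x = rho * x + eps[i]
    return x

def backward(x0, t):
    """Y^{t+1} = phi_1(phi_2(... phi_t(x0) ...)): newest map applied first."""
    x = x0
    for i in range(t - 1, -1, -1):
        x = rho * x + eps[i]
    return x
```

The backward value freezes geometrically fast (new shocks enter damped by ρᵗ), which is exactly what makes truncating the composition attractive for estimation.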
Filed under: pictures, Running, Statistics, Travel, University life Tagged: cigarettes, CRiSM, debiasing, exact estimation, exact sampling, Monte Carlo Statistical Methods, multi-level Monte Carlo, perfect sampling, retrospective Monte Carlo, University of Warwick, Zig-Zag
Pierre Jacob, Fredrik Lindsten, and Thomas Schön recently arXived a paper on coupled particle filters. A coupling problem that proves to be much more complicated than expected, due to the discrete nature of particle filters. The starting point of the paper is the use of common (e.g., uniform) random numbers for the generation of each entry in the particle system at each time t, whose maximal correlation gets damaged by the resampling steps (even when using the same uniforms). One suggestion for improving the correlation between entries at each time made in the paper is to resort to optimal transport, using the distance between particles as the criterion. A cheaper alternative is inspired by multi-level Monte Carlo. It builds a joint multinomial distribution by optimising the coupling probability. [Is there any way to iterate this construct instead of considering only the extreme cases of identical values versus independent values?] The authors also recall a “sorted sampling” method proposed by Mike Pitt in 2002, which is to rely on the empirical cdfs derived from the particle systems and on the inverse cdf technique, which is the approach I would have first considered. Possibly with a smooth transform of both ecdfs in order to optimise the inverse cdf move. Actually, I have trouble with the notion that the ancestors of a pair of particles should matter. Unless one envisions a correlation of the entire path, but I am unsure how one can make paths correlated (besides coupling). And how this impacts likelihood estimation. As shown in an excerpt from the paper, the coupled approximations produce regular versions and, despite the negative bias, fairly accurate evaluations of likelihood ratios, which is all that matters in an MCMC implementation. The paper also proposes a smoothing algorithm based on the Rhee and Glynn (2012) debiasing technique, which operates on expectations against the smoothing distribution (conditional on a value of the parameter θ).
Which may connect with the notion of simulating correlated paths. The interesting part is that, due to the coupling, the Rhee and Glynn unbiased estimator has a finite (if random) stopping time.
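The joint multinomial construction can be sketched as a maximal coupling of the two categorical resampling distributions (a generic sketch under my own weights p and q, not necessarily the paper's exact scheme): with probability Σₖ min(pₖ,qₖ) draw a common index from the normalised overlap, otherwise draw the two indices independently from the normalised residuals.

```python
import numpy as np

def coupled_categorical(p, q, rng):
    """Maximal coupling of two categorical (resampling) distributions:
    returns (i, j) with i ~ p, j ~ q and P(i == j) = sum(min(p, q))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    overlap = np.minimum(p, q)
    alpha = overlap.sum()
    if rng.random() < alpha:  # if p == q then alpha = 1: always a common draw
        i = rng.choice(len(p), p=overlap / alpha)
        return i, i
    i = rng.choice(len(p), p=(p - overlap) / (1 - alpha))
    j = rng.choice(len(q), p=(q - overlap) / (1 - alpha))
    return i, j

rng = np.random.default_rng(3)
p, q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
draws = [coupled_categorical(p, q, rng) for _ in range(20_000)]
```

Both marginals are exact (overlap plus residual reconstitutes p and q), and no coupling of the two resampling steps can make the indices agree more often than Σₖ min(pₖ,qₖ).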
Filed under: Kids, Statistics, University life Tagged: bootstrap filter, debiasing, ecdf, filtering, multi-level Monte Carlo, optimal transport, particle system, smoothing, sorted sampling, unbiased estimation
Hundreds of Iraqis have been killed in the Karrada shopping centre bombing in Baghdad. They should be remembered as individuals, each with his or her own story and interrupted life, not just as numbers.
Filed under: Statistics Tagged: Baghdad, Irak, notjustanumber, terrorism, victims
Moreno et al. [including Ted Meeds and Max Welling] recently arXived a paper merging variational inference and ABC. The argument for turning variational is computational speedup. The traditional (in variational inference) divergence decomposition of the log-marginal likelihood is replaced by an ABC version, parameterised in terms of intrinsic generators (i.e., generators that do not depend on cyber-parameters, like the U(0,1) or the N(0,1) generators). Or simulation code in the authors’ terms. Which leads to the automatic aspect of the approach. In the paper the derivation of the gradient is indeed automated.
“One issue is that even assuming that the ABC likelihood is an unbiased estimator of the true likelihood (which it is not), taking the log introduces a bias, so that we now have a biased estimate of the lower bound and thus biased gradients.”
I wonder how much of an issue this is, since we consider the variational lower bound. To be optimised in terms of the parameters of the variational posterior. Indeed, the endpoint of the analysis is to provide an optimal variational approximation, which remains an approximation whether or not the likelihood estimator is unbiased. A more “severe” limitation may be in the inversion constraint, since it seems to eliminate Beta or Gamma distributions. (Even though calling qbeta(runif(1),a,b) definitely is achievable… And not rejected by a Kolmogorov-Smirnov test.)
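The qbeta(runif(1),a,b) remark is just the inverse-cdf transform. In Python terms (scipy's beta.ppf standing in for R's qbeta, with my own choice of a and b), the transformed uniforms are indistinguishable from direct Beta draws:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b = 2.0, 5.0
u = rng.uniform(size=5_000)
x = stats.beta.ppf(u, a, b)  # equivalent of R's qbeta(runif(n), a, b)
# Kolmogorov-Smirnov distance between the transformed sample and Beta(a,b)
ks_stat = stats.kstest(x, stats.beta(a, b).cdf).statistic
```

Since the quantile transform of a uniform has exactly the target distribution, the KS statistic only reflects sampling noise, of order n^(-1/2).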
Incidentally, I discovered through the paper the existence of the Kumaraswamy distribution, whose main appeal seems to be the ability to produce a closed-form quantile function, while bearing some resemblance to the Beta distribution. (Another arXival by Baltasar Trancón y Widemann studies some connections between the two, but does not say how to select the parameters to optimise the similarity.)
Filed under: pictures, Statistics Tagged: ABC, Amsterdam, beta distribution, bias, Kumaraswamy distribution, likelihood function estimator, likelihood-free methods, pseudo-random generator, qbeta, R, variational Bayes methods
Source: Bayesian Analysis, Volume 11, Number 3, 627--648.
Analyses of array-valued datasets often involve reduced-rank array approximations, typically obtained via least-squares or truncations of array decompositions. However, least-squares approximations tend to be noisy in high-dimensional settings, and may not be appropriate for arrays that include discrete or ordinal measurements. This article develops methodology to obtain low-rank model-based representations of continuous, discrete and ordinal data arrays. The model is based on a parameterization of the mean array as a multilinear product of a reduced-rank core array and a set of index-specific orthogonal eigenvector matrices. It is shown how orthogonally equivariant parameter estimates can be obtained from Bayesian procedures under invariant prior distributions. Additionally, priors on the core array are developed that act as regularizers, leading to improved inference over the standard least-squares estimator, and providing robustness to misspecification of the array rank. This model-based approach is extended to accommodate discrete or ordinal data arrays using a semiparametric transformation model. The resulting low-rank representation is scale-free, in the sense that it is invariant to monotonic transformations of the data array. In an example analysis of a multivariate discrete network dataset, this scale-free approach provides a more complete description of data patterns.
Source: Bayesian Analysis, Volume 11, Number 3, 649--670.
Functional data, with basic observational units being functions (e.g., curves, surfaces) varying over a continuum, are frequently encountered in various applications. While many statistical tools have been developed for functional data analysis, the issue of smoothing all functional observations simultaneously is less studied. Existing methods often focus on smoothing each individual function separately, at the risk of removing important systematic patterns common across functions. We propose a nonparametric Bayesian approach to smooth all functional observations simultaneously and nonparametrically. In the proposed approach, we assume that the functional observations are independent Gaussian processes subject to a common level of measurement errors, enabling the borrowing of strength across all observations. Unlike most Gaussian process regression models that rely on pre-specified structures for the covariance kernel, we adopt a hierarchical framework by assuming a Gaussian process prior for the mean function and an Inverse-Wishart process prior for the covariance function. These prior assumptions induce an automatic mean–covariance estimation in the posterior inference in addition to the simultaneous smoothing of all observations. Such a hierarchical framework is flexible enough to incorporate functional data with different characteristics, including data measured on either common or uncommon grids, and data with either stationary or nonstationary covariance structures. Simulations and real data analysis demonstrate that, in comparison with alternative methods, the proposed Bayesian approach achieves better smoothing accuracy and comparable mean–covariance estimation results. Furthermore, it can successfully retain the systematic patterns in the functional observations that are usually neglected by the existing functional data analyses based on individual-curve smoothing.
Source: Bayesian Analysis, Volume 11, Number 3, 671--695.
We consider Bayesian linear inverse problems in infinite-dimensional separable Hilbert spaces, with a Gaussian prior measure and additive Gaussian noise model, and provide an extension of the concept of Bayesian D-optimality to the infinite-dimensional case. To this end, we derive the infinite-dimensional version of the expression for the Kullback–Leibler divergence from the posterior measure to the prior measure, which is subsequently used to derive the expression for the expected information gain. We also study the notion of Bayesian A-optimality in the infinite-dimensional setting, and extend the well known (in the finite-dimensional case) equivalence of the Bayes risk of the MAP estimator with the trace of the posterior covariance, for the Gaussian linear case, to the infinite-dimensional Hilbert space case.
Source: Bayesian Analysis, Volume 11, Number 3, 697--724.
In this paper, we consider homogeneous normalized random measures with independent increments (hNRMI), a class of nonparametric priors recently introduced in the literature. Many of their distributional properties are known by now but their stick-breaking representation is missing. Here we display such a representation, which will feature dependent stick-breaking weights, and then derive explicit versions for noteworthy special cases of hNRMI. Posterior characterizations are also discussed. Finally, we devise an algorithm for slice sampling mixture models based on hNRMIs, which relies on the representation we have obtained, and implement it to analyze real data.
Source: Bayesian Analysis, Volume 11, Number 3, 725--752.
While statisticians and quantitative social scientists typically study the “effects of causes” (EoC), Lawyers and the Courts are more concerned with understanding the “causes of effects” (CoE). EoC can be addressed using experimental design and statistical analysis, but it is less clear how to incorporate statistical or epidemiological evidence into CoE reasoning, as might be required for a case at Law. Some form of counterfactual reasoning, such as the “potential outcomes” approach championed by Rubin, appears unavoidable, but this typically yields “answers” that are sensitive to arbitrary and untestable assumptions. We must therefore recognise that a CoE question simply might not have a well-determined answer. It is nevertheless possible to use statistical data to set bounds within which any answer must lie. With less than perfect data these bounds will themselves be uncertain, leading to a compounding of different kinds of uncertainty. Still further care is required in the presence of possible confounding factors. In addition, even identifying the relevant “counterfactual contrast” may be a matter of Policy as much as of Science. Defining the question is as non-trivial a task as finding a route towards an answer. This paper develops some technical elaborations of these philosophical points from a personalist Bayesian perspective, and illustrates them with a Bayesian analysis of a case study in child protection.
Asymptotic Properties of Bayes Risk of a General Class of Shrinkage Priors in Multiple Hypothesis Testing Under Sparsity
Source: Bayesian Analysis, Volume 11, Number 3, 753--796.
Consider the problem of simultaneous testing for the means of independent normal observations. In this paper, we study some asymptotic optimality properties of certain multiple testing rules induced by a general class of one-group shrinkage priors in a Bayesian decision theoretic framework, where the overall loss is taken as the number of misclassified hypotheses. We assume a two-groups normal mixture model for the data and consider the asymptotic framework adopted in Bogdan et al. (2011) who introduced the notion of asymptotic Bayes optimality under sparsity in the context of multiple testing. The general class of one-group priors under study is rich enough to include, among others, the families of three parameter beta, generalized double Pareto priors, and in particular the horseshoe, the normal–exponential–gamma and the Strawderman–Berger priors. We establish that within our chosen asymptotic framework, the multiple testing rules under study asymptotically attain the risk of the Bayes Oracle up to a multiplicative factor, with the constant in the risk close to the constant in the Oracle risk. This is similar to a result obtained in Datta and Ghosh (2013) for the multiple testing rule based on the horseshoe estimator introduced in Carvalho et al. (2009, 2010). We further show that under very mild assumption on the underlying sparsity parameter, the induced decisions using an empirical Bayes estimate of the corresponding global shrinkage parameter proposed by van der Pas et al. (2014), asymptotically attain the optimal Bayes risk up to the same multiplicative factor. We provide a unifying argument applicable for the general class of priors under study. In the process, we settle a conjecture regarding optimality property of the generalized double Pareto priors made in Datta and Ghosh (2013). Our work also shows that the result in Datta and Ghosh (2013) can be improved further.
Source: Bayesian Analysis, Volume 11, Number 3, 797--820.
We propose a new model for correlated outputs of mixed type, such as continuous and binary outputs, with a particular focus on joint regression and classification, motivated by an application in constrained optimization for computer simulation modeling. Our framework is based upon multivariate stochastic processes, extending Gaussian process methodology for modeling of continuous multivariate spatial outputs by adding a latent process structure that allows for joint modeling of a variety of types of correlated outputs. In addition, we implement fully Bayesian inference using particle learning, which allows us to conduct fast sequential inference. We demonstrate the effectiveness of our proposed methods on both synthetic examples and a real world hydrology computer experiment optimization problem where it is helpful to model the black box objective function as correlated with satisfaction of the constraint.
Peter D. Hoff. Equivariant and Scale-Free Tucker Decomposition Models. 627--648.
Jingjing Yang, Hongxiao Zhu, Taeryon Choi, Dennis D. Cox. Smoothing and Mean–Covariance Estimation of Functional Data with a Bayesian Hierarchical Model. 649--670.
Alen Alexanderian, Philip J. Gloor, Omar Ghattas. On Bayesian A- and D-Optimal Experimental Designs in Infinite Dimensions. 671--695.
S. Favaro, A. Lijoi, C. Nava, B. Nipoti, I. Prünster, Y. W. Teh. On the Stick-Breaking Representation for Homogeneous NRMIs. 697--724.
A. Philip Dawid, Monica Musio, Stephen E. Fienberg. From Statistical Evidence to Evidence of Causality. 725--752.
Prasenjit Ghosh, Xueying Tang, Malay Ghosh, Arijit Chakrabarti. Asymptotic Properties of Bayes Risk of a General Class of Shrinkage Priors in Multiple Hypothesis Testing Under Sparsity. 753--796.
Tony Pourmohamad, Herbert K. H. Lee. Multivariate Stochastic Process Models for Correlated Responses of Mixed Type. 797--820.