## Bayesian Bloggers

### stupid traditions are simply stupid

**O**ver the past weekend, several men were killed in Spain. By running bulls. Most in the streets of Pamplona. And one in a bullring. The argument behind those events is to invoke “tradition“. Which means it happened for several centuries. Starting (?) with the Minoan bull leapers. But that does not make an argument when live are at stake and animals turned into killing machines (and often injured in the process). A lot of other absurd and cruel traditions have rightly disappeared over the years… The European Union does not yet prohibit this animal mistreatment although it prohibits EU funds to be used as support, but it should as it is high time bull running and bull fighting are prohibited everywhere in Spain (as bull fighting currently is in both Catalunya and the Canary Islands). [I realise this is not the number one problem in the World and that many more people died the same weekend of deaths that could have been avoided, it is just that it would be so simple to eliminate this one (dis)reason.]

Filed under: Running, Travel Tagged: bull fighting, Canary Islands, Catalunya, European Union, running bulls, Spain

### difficult times for postdocs

**I**n the plane to Warwick on Monday, I was reading my latest issue of Nature and found an interesting editorial on the financial plight of many graduates and post-docs in both the US and the UK (and certainly elsewhere). Who, despite having a fellowship, cannot make ends meet. This is particularly true in expensive cities like London, Oxford or even Paris, where rents force those new researchers to face long commuting hours. The editorial suggests taking extra-jobs to make up for financial difficulties, but this does not sound to me like a particularly pertinent recommendation if it means taking time off one’s research, at the period in a researcher’s career where one’s energy should be mostly directed at the production of papers towards securing a (more) permanent job. Even teaching can prove too time consuming for finishing PhD students. An adequation between the needs of those young researchers and the institutional support they receive would sound like a natural requirement, while graduates looking for fellowship should truly assess the adequation in detail before accepting an offer.Which of course is not always easy. In countries where post-doctoral contracts are not negotiable and are set at a national level (like, e.g., France), checking with earlier fellows is a must. (As it happens or happened, I was quite lucky to spend my post-doctoral years in cheap places with decent support from the local universities, but this is not relevant in today’s environment!)

Filed under: Kids, Travel, University life Tagged: fellowships, graduate course, Nature, postdocs, postdoctoral position, University of Warwick

### the curious incident of the inverse of the mean

**A** s I figured out while working with astronomer colleagues last week, a strange if understandable difficulty proceeds from the simplest and most studied statistical model, namely the Normal model

x~N(θ,1)

Indeed, if one reparametrises this model as x~N(υ⁻¹,1) with υ>0, a *single* observation x brings very little information about υ! (This is not a toy problem as it corresponds to estimating distances from observations of parallaxes.) If x gets large, υ is very likely to be small, but if x is small or negative, υ is certainly large, with no power to discriminate between highly different values. For instance, Fisher’s information for this model and parametrisation is υ⁻² and thus collapses at zero.

While one can always hope for Bayesian miracles, they do not automatically occur. For instance, working with a Gamma prior Ga(3,10³) on υ [as informed by a large astronomy dataset] leads to a posterior expectation hardly impacted by the value of the observation x:

And using an alternative estimate like the harmonic posterior mean that is associated with the relative squared error loss does not see much more impact from the observation:

There is simply not enough information contained in one datapoint (or even several datapoints for all that matters) to infer about υ.

Filed under: R, Statistics, University life Tagged: astronomy, Bayesian inference, inverse problems, parallaxes

### Monte Carlo in the convent

**L**ast week, at the same time as the workshop on retrospective Monte Carlo in Warwick, there was a Monte Carlo conference in Paris, closing a Monte Carlo cycle run by Institut Louis Bachelier from October 2015 till June 2016. It took place in the convent of Les Cordeliers, downtown Paris [hence the title] and I alas could not attend the talks. As I organised a session on Bayesian (approximate) computations, with Richard Everitt, Jere Koskela, and Chris Sherlock as speakers (and Robin Ryder as chair), here are the slides of the speakers (actually, Jere most kindly agreed to give Chris’ talk as Chris was to sick to travel to Paris):

Filed under: pictures, Statistics, Travel, University life Tagged: coalescent, delayed acceptance, Dirichlet mixture priors, Institut Louis Bachelier, Monte Carlo, noisy Monte Carlo, Paris, pseudo-marginal MCMC, retrospective Monte Carlo, University of Warwick

### Extending R

**A**s I was previously unaware of this book coming up, my surprise and excitement were both extreme when I received it from CRC Press a few weeks ago! John Chambers, one of the fathers of S, precursor of R, had just published a book about extending R. It covers some reflections of the author on programming and the story of R (Parts 2 and 1), and then focus on object-oriented programming (Part 3) and the interfaces from R to other languages (Part 4). While this is “only” a programming book, and thus not strictly appealing to statisticians, reading one of the original actors’ thoughts on the past, present, and future of R is simply fantastic!!! And John Chambers is definitely not calling to simply start over and build something better, as Ross Ihaka did in this [most read] post a few years ago. (It is also great to see the names of friends appearing at times, like Julie, Luke, and Duncan!)

*“I wrote most of the original software for S3 methods, which were useful for their application, in the early 1990s.”*

In the (hi)story part, Chambers delves into the details of the evolution of S at Bells Labs, as described in his [first] “blue book” (which I kept on my shelf until very recently, next to the “white book“!) and of the occurrence of R in the mid-1990s. I find those sections fascinating maybe the more because I am somewhat of a contemporary, having first learned Fortran (and Pascal) in the mid-1980’s, before moving in the early 1990s to C (that I mostly coded as translated Pascal!), S-plus and eventually R, in conjunction with a (forced) migration from Unix to Linux, as my local computer managers abandoned Unix and mainframe in favour of some virtual Windows machines. And as I started running R on laptops with the help of friends more skilled than I (again keeping some of the early R manuals on my shelf until recently). Maybe one of the most surprising things about those reminiscences is that the very first version of R was dated Feb 29, 2000! Not because of Feb 29, 2000 (which, as Chambers points out, is the first use of the third-order correction to the Gregorian calendar, although I would have thought 1600 was the first one), but because I would have thought it appeared earlier, in conjunction with my first Linux laptop, but this memory is alas getting too vague!

As indicated above, the book is mostly about programming, which means in my case that some sections are definitely beyond my reach! For instance, reading “*the onus is on the person writing the calling function to avoid using a reference object as the argument to an existing function that expects a named list*” is not immediately clear… Nonetheless, most sections are readable [at my level] and enlightening about the mottoes “*everything that exists is an object*” and “*everything that happens is a function*” repeated throughout. (And about my psycho-rigid ways of translating Pascal into every other language!) I obviously learned about new commands and notions, like the difference between

and

x <<- 3(but I was disappointed to learn that the number of <‘s was not related with the depth or height of the allocation!) In particular, I found the part about replacement fascinating, explaining how a command like

diag(x)[i] = 3could modify x directly. (While definitely worth reading, the chapter on R packages could have benefited from more details. But as Chambers points out there are whole books about this.) Overall, I am afraid the book will not improve *my* (limited) way of programming in R but I definitely recommend it to anyone even moderately skilled in the language.

Filed under: Books, Kids, R, Statistics Tagged: Bell Labs, book review, C, CRAN, extending R, Fortran, John Chambers, laptop, Linux, Luke Tierney, object-oriented programming, packages, Pascal, R, Ross Ihaka, S, S-plus, unix

### retrospective Monte Carlo

**T**he past week I spent in Warwick ended up with a workshop on retrospective Monte Carlo, which covered exact sampling, debiasing, Bernoulli factory problems and multi-level Monte Carlo, a definitely exciting package! (Not to mention opportunities to go climbing with some participants.) In particular, several talks focussed on the debiasing technique of Rhee and Glynn (2012) [inspired from von Neumann and Ulam, and already discussed in several posts here]. Including results in functional spaces, as demonstrated by a multifaceted talk by Sergios Agapiou who merged debiasing, deburning, and perfect sampling.

From a general perspective on unbiasing, while there exist sufficient conditions to ensure finite variance and aim at an optimal version, I feel a broader perspective should be adopted towards comparing those estimators with biased versions that take less time to compute. In a diffusion context, Chang-han Rhee presented a detailed argument as to why his debiasing solution achieves a O(√n) convergence rate in opposition the regular discretised diffusion, but multi-level Monte Carlo also achieves this convergence speed. We had a nice discussion about this point at the break, with my slow understanding that continuous time processes had much much stronger reasons for sticking to unbiasedness. At the poster session, I had the nice surprise of reading a poster on the penalty method I discussed the same morning! Used for subsampling when scaling MCMC.

On the second day, Gareth Roberts talked about the Zig-Zag algorithm (which reminded me of the cigarette paper brand). This method has connections with slice sampling but it is a continuous time method which, in dimension one, means running a constant velocity particle that starts at a uniform value between 0 and the maximum density value and proceeds horizontally until it hits the boundary, at which time it moves to another uniform. Roughly. More specifically, this approach uses piecewise deterministic Markov processes, with a radically new approach to simulating complex targets based on continuous time simulation. With computing times that [counter-intuitively] do not increase with the sample size.

Mark Huber gave another exciting talk around the Bernoulli factory problem, connecting with perfect simulation and demonstrating this is not solely a formal Monte Carlo problem! Some earlier posts here have discussed papers on that problem, but I was unaware of the results bounding [from below] the expected number of steps to simulate B(f(p)) from a (p,1-p) coin. If not of the open questions surrounding B(2p). The talk was also great in that it centred on recursion and included a fundamental theorem of perfect sampling! Not that surprising given Mark’s recent book on the topic, but exhilarating nonetheless!!!

The final talk of the second day was given by Peter Glynn, with connections with Chang-han Rhee’s talk the previous day, but with a different twist. In particular, Peter showed out to achieve perfect or exact estimation rather than perfect or exact simulation by a fabulous trick: perfect sampling is better understood through the construction of random functions φ¹, φ², … such that X²=φ¹(X¹), X³=φ²(X²), … Hence,

which helps in constructing coupling strategies. However, since the φ’s are usually iid, the above is generally distributed like

which seems pretty similar but offers a much better concentration as t grows. Cutting the function composition is then feasible towards producing unbiased estimators and more efficient. (I realise this is not a particularly clear explanation of the idea, detailed in an arXival I somewhat missed. When seen this way, Y would seem much more expensive to compute [than X].)

Filed under: pictures, Running, Statistics, Travel, University life Tagged: cigarettes, CRiSM, debiasing, exact estimation, exact sampling, Monte Carlo Statistical Methods, multi-level Monte Carlo, perfect sampling, retrospective Monte Carlo, University of Warwick, Zig-Zag

### coupled filters

**P**ierre Jacob, Fredrik Lindsten, and Thomas Schön recently arXived a paper on coupled particle filters. A coupling problem that proves to be much more complicated than expected, due to the discrete nature of particle filters. The starting point of the paper is the use of common (e.g., uniform) random numbers for the generation of each entry in the particle system at each time t, which maximal correlation gets damaged by the resampling steps (even when using the same uniforms). One suggestion for improving the correlation between entries at each time made in the paper is to resort to optimal transport, using the distance between particles as the criterion. A cheaper alternative is inspired from multi-level Monte Carlo. It builds a joint multinomial distribution by optimising the coupling probability. *[Is there any way to iterate this construct instead of considering only the extreme cases of identical values versus independent values?]* The authors also recall a “sorted sampling” method proposed by Mike Pitt in 2002, which is to rely on the empirical cdfs derived from the particle systems and on the inverse cdf technique, which is the approach I would have first considered. Possibly with a smooth transform of both ecdf’s in order to optimise the inverse cdf move. Actually, I have trouble with the notion that the ancestors of a pair of particles should matter. Unless one envisions a correlation of the entire path, but I am ensure how one can make paths correlated (besides coupling). And how this impacts likelihood estimation. As shown in the above excerpt, the coupled approximations produce regular versions and, despite the negative bias, fairly accurate evaluations of likelihood ratios, which is all that matters in an MCMC implementation. The paper also proposes a smoothing algorithm based on Rhee and Glynn (2012) debiasing technique, which operates on expectations against the smoothing distribution (conditional on a value of the parameter θ). Which may connect with the notion of simulating correlated paths. The interesting part is that, due to the coupling, the Rhee and Glynn unbiased estimator has a finite (if random) stopping time.

Filed under: Kids, Statistics, University life Tagged: bootstrap filter, debiasing, ecdf, filtering, multi-level Monte Carlo, optimal transport, particle system, smoothing, sorted sampling, unbiased estimation

### in memoriam

**H**undreds of Irakis have been killed in the Karrada shopping centre bombing in Baghdad. They should be remembered as individuals, with each his or her own story and interrupted life, not just as numbers.

Filed under: Statistics Tagged: Baghdad, Irak, notjustanumber, terrorism, victims

### automatic variational ABC

*“Stochastic Variational inference is an appealing alternative to the inefficient sampling approaches commonly used in ABC.”*

Moreno et al. [including Ted Meeds and Max Welling] recently arXived a paper merging variational inference and ABC. The argument for turning variational is computational speedup. The traditional (in variational inference) divergence decomposition of the log-marginal likelihood is replaced by an ABC version, parameterised in terms of intrinsic generators (i.e., generators that do not depend on cyber-parameters, like the U(0,1) or the N(0,1) generators). Or simulation code in the authors’ terms. Which leads to the automatic aspect of the approach. In the paper the derivation of the gradient is indeed automated.

*“One issue is that even assuming that the ABC likelihood is an unbiased estimator of the true likelihood (which it is not), taking the log introduces a bias, so that we now have a biased estimate of the lower bound and thus biased gradients.”*

I wonder how much of an issue this is, since we consider the variational lower bound. To be optimised in terms of the parameters of the variational posterior. Indeed, the endpoint of the analysis is to provide an optimal variational approximation, which remains an approximation whether or not the likelihood estimator is unbiased. A more “severe” limitation may be in the inversion constraint, since it seems to eliminate Beta or Gamma distributions. (Even though calling qbeta(runif(1),a,b) definitely is achievable… And not rejected by a Kolmogorov-Smirnov test.)

Incidentally, I discovered through the paper the existence of the Kumaraswamy distribution, which main appeal seems to be the ability to produce a closed-form quantile function, while bearing some resemblance with the Beta distribution. (Another arXival by Baltasar Trancón y Widemann studies some connections between those, but does not tell how to select the parameters to optimise the similarity.)

Filed under: pictures, Statistics Tagged: ABC, Amsterdam, beta distribution, bias, Kumaraswamy distribution, likelihood function estimator, likelihood-free methods, pseudo-random generator, qbeta, R, variational Bayes methods