## Bayesian Bloggers

### Le Louvre snapshot

### comparison of Bayesian predictive methods for model selection

*“Dupuis and Robert (2003) proposed choosing the simplest model with enough explanatory power, for example 90%, but did not discuss the effect of this threshold for the predictive performance of the selected models. We note that, in general, the relative explanatory power is an unreliable indicator of the predictive performance of the submodel,”*

**J**uho Piironen and Aki Vehtari arXived a survey on Bayesian model selection methods that is a sequel to the extensive survey of Vehtari and Ojanen (2012). Because most of the methods described in this survey stem from Kullback-Leibler proximity calculations, it includes some description of our posterior projection method with Costas Goutis and Jérôme Dupuis. We indeed did not consider prediction in our papers and even failed to include a consistency result, as was pointed out to me by my discussant at a model choice meeting in Cagliari, in… 1999! Still, I remain fond of the notion of defining a prior on the embedding model and of deducing priors on the parameters of the submodels by Kullback-Leibler projections. It obviously relies on the notion that the embedding model is “true” and that the submodels are only approximations. In the simulation experiments included in this survey, the projection method “performs best in terms of the predictive ability” (p.15) and “is much less vulnerable to the selection induced bias” (p.16).
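
As a reminder of what the projection amounts to (sketched from memory, with f denoting the embedding density and g a submodel density), the submodel parameter associated with a full-model parameter θ is its Kullback-Leibler projection,

```latex
\lambda^{\perp}(\theta)
  \;=\; \arg\min_{\lambda}\,
    \mathrm{KL}\bigl(f(\cdot\mid\theta)\,\big\|\,g(\cdot\mid\lambda)\bigr)
  \;=\; \arg\min_{\lambda}
    \int f(y\mid\theta)\,\log\frac{f(y\mid\theta)}{g(y\mid\lambda)}\,\mathrm{d}y\,,
```

and the prior on λ is then the image of the prior on θ through this projection, which is how the embedding model drives the priors of all submodels.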

Reading the other parts of the survey, I also came to the perspective that model averaging makes much more sense than model choice in predictive terms. Sounds obvious stated that way but it took me a while to come to this conclusion. Now, with our mixture representation, model averaging also comes as a natural consequence of the modelling, a point presumably not stressed enough in the current version of the paper. On the other hand, the MAP model now strikes me as artificial and linked to a very rudimentary loss function. A loss that does not account for the final purpose(s) of the model. And does not connect to the “all models are wrong” theorem.

Filed under: Books, Statistics, University life Tagged: all models are wrong, Bayesian model averaging, Bayesian model choice, Bayesian model selection, Cagliari, Kullback-Leibler divergence, MAP estimators, prior projection, Sardinia, The Bayesian Choice

### meet the president [of Tunisia]

**M**y trip to work was somewhat more eventful than usual this morning: as the queue to switch to the A train was too long for my taste, I exited the Châtelet station to grab a Vélib rental bike near Le Louvre and followed the Louvre palace for a few hundred meters, until reaching a police barricade that left the remainder of the Rue de Rivoli empty, a surreal sight on a weekday! As it happened, Beji Caid Essebsi, the president of Tunisia, was on a state visit to Paris, staying at the 5-star Hotel Meurice, and just about to leave the hotel. So I hung around there for a few minutes and watched a caravan of official dark cars leave the place, preceded by police bikes in formal dress! The ride to Dauphine was not yet straightforward, as the Champs-Elysées had been closed as well, since the president was attending a commemoration (for Tunisian soldiers who died in French wars?) at the Arc de Triomphe. This created a mess for traffic in the surrounding streets, especially with pedestrians escaping from stuck buses and crowding the sidewalks! And yet another surreal sight of the Place de l’Étoile with no car. (In the end, this initiative of mine added an extra half hour to my average transit time…)

Filed under: Kids, pictures, Running, Travel Tagged: Arc de Triomphe, Champs-Elysées, Hotel Meurice, Le Louvre, Paris, RER A, Rue de Rivoli, traffic, Tunisia, Vélib

### an email exchange about integral representations

**I** had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R over the past few days, as he had difficulties with the validation of the accept-reject algorithm via the integral

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

which seemed to be missing a density term or, in other words, be different from

What is surprising for me is that the integral

has a clear meaning as a Riemann integral, hence should be more intuitive…

Filed under: Books, R, Statistics, University life Tagged: accept-reject algorithm, George Casella, Introducing Monte Carlo Methods with R, Lebesgue integration, Riemann integration

### scalable Bayesian inference for the inverse temperature of a hidden Potts model

**M**att Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a., thermal integration), and ABC. A massive simulation experiment with individual simulation times up to 400 hours leads to selecting path sampling (what else?!) as the (XL) method of choice, thanks to a pre-computation of the expectation of the sufficient statistic E[S(Z)|β]. I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday at Columbia potential, if huge, improvements in processing Ising and Potts models by first approximating the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desideratum for all ABC methods, in that simulating the summary statistics directly, if approximately, would bring huge gains in computing time, and thus possibly in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e., without reverting to simulating the entire grid X…
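
To make the pre-computation idea concrete, here is a small self-contained sketch of mine (not code from the paper, with arbitrary grid size, number of sweeps, and β grid): estimating E[S(X)|β] for a toy Ising model by single-site Gibbs sampling over a range of β values, producing a surrogate curve that could then be interpolated inside ABC or the exchange algorithm.

```r
# Sketch (mine, not the paper's): pre-computing E[S(X)|beta] for a toy
# Ising model on an n x n torus, by single-site Gibbs sampling, over a
# grid of beta values; grid size, sweeps, and replications are arbitrary
ising_stat <- function(beta, n = 8, sweeps = 50) {
  x <- matrix(sample(c(-1, 1), n * n, replace = TRUE), n, n)
  for (s in 1:sweeps)
    for (i in 1:n) for (j in 1:n) {
      nb <- x[(i %% n) + 1, j] + x[((i - 2) %% n) + 1, j] +
            x[i, (j %% n) + 1] + x[i, ((j - 2) %% n) + 1]
      # full conditional of x[i,j] given its four torus neighbours
      x[i, j] <- ifelse(runif(1) < 1 / (1 + exp(-2 * beta * nb)), 1, -1)
    }
  sum(x * (x[c(2:n, 1), ] + x[, c(2:n, 1)]))  # S(X): neighbour agreements
}
betas <- seq(0, 1, by = 0.1)
Esuff <- sapply(betas, function(b) mean(replicate(5, ising_stat(b))))
# Esuff now approximates E[S(X)|beta] and can be interpolated, e.g. by
# splinefun(betas, Esuff), as a cheap surrogate within ABC
```

The pre-computed curve then stands in for repeated full-grid simulations at every ABC iteration, which is where the computing gains discussed above would come from.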

Filed under: Books, R, Statistics, University life Tagged: ABC, Approximate Bayesian computation, Australia, Brisbane, exchange algorithm, Ising model, JCGS, path sampling, Potts model, pseudo-likelihood, QUT, Statistics and Computing

### hot X buns

**S**ince this is Easter weekend, and given my unreasonable fondness for hot-cross buns all year long, I tried to cook my own buns tonight, with a reasonable amount of success (!) given that it was my first attempt. I found an on-line recipe, mostly followed it, except that I added the yolk mixed with sugar to make the buns brown and shiny et voilà. If I ever try again to make those buns, I will look for an alternate way to make the [St. Andrew’s] crosses!

Filed under: Kids, pictures Tagged: cooking, Easter, Good Friday, hot-cross buns, Scotland, St. Andrew's cross

### back from New York

**A** greatly enjoyable [if a wee bit tight] visit to Columbia University for my seminar last Monday! (And a reasonably smooth trip, if I forget about the screaming kids on both planes…!) Besides discussing respective research interests with several faculty members, and explaining our views on replacing Bayes factors and posterior probabilities, views that were not strongly challenged by the seminar audience, maybe because it sounded too Bayesiano-Bayesian!, I had a great time catching up (well, almost!) with Andrew, running for one hour by the river both mornings, and even biking—does not feel worse than downtown Paris!—with Andrew a few miles to a terrific tiny Mexican restaurant in the South Bronx, El Atoradero, where I had a home-made tortilla (or *pupusa*) filled with beans and covered with hot chorizo! (The restaurant was selected as the 2014 best Mexican restaurant in New York City by The Village Voice, whatever that means. And also got a very supportive review in The New York Times.) It was so good I (very exceptionally) ordered a second serving of spicy pork *huarache*, which was almost as good. And kept me well-fed till the next day, when I arrived in Paris. And with enough calories to fight the cold melting snow that fell when I biked back to the office at Columbia. I also had an interesting morning in a common room at Columbia, working next to graduate students and hearing their conversations about homework and advisors (nothing to gossip about, as their comments were invariably laudatory!, maybe because they suspected me of being a mole!)

Filed under: Kids, pictures, Statistics, Travel, University life Tagged: Bronx, carnitas, Columbia University, El Atoradero, huarache, Hudson river, Manhattan, pupusa, The New York Times, The Village Voice

### New York skyline

Filed under: Kids, pictures, Travel, University life Tagged: Broadway, Columbia University, New York city, sunset

### True Detective [review]

**E**ven though I wrote before that I do not watch TV series, I made a second exception this year with *True Detective*. This series was recommended to me by Judith and this was truly a good recommendation!

Contrary to my old-fashioned idea of TV series, where the same group of caricatural characters repeatedly faces new cases that get solved within the 50 minutes each episode lasts, the whole season of *True Detective* is a single story, much more like a very long movie with a unified plot that smoothly unfolds and gets mostly solved in the last episode. It obviously brings more strength and depth to the characters, the two investigators Rust and Marty, with the side drawback that most of the other characters, except maybe Marty’s wife, get little space. The opposition between the two investigators is central to the coherence of the story, with Rust being the more intriguing one, very intellectual, almost otherworldly, with a nihilistic discourse and a self-destructive bent, while Marty sounds more down-to-earth, although he also caters to his own self-destructive demons… Both actors are very impressive in giving a life and a history to their characters. The story takes place in Louisiana, with great landscapes and oppressive swamps where everything seems doomed to vanish, eventually, making detective work almost useless. And where clamminess applies to moral values as much as to the weather. The core of the plot is the search for a serial killer, whose murders of women are incorporated within a pagan cult. Although this sounds rather standard for a US murder story (!), and while there are unnecessary sub-plots and unconvincing developments, the overall storyboard is quite coherent, with a literary feel, even though its writer, Nic Pizzolatto, never completed the corresponding novel, and the unfolding of the plot is anything but conventional, with well-done flashbacks and multi-layered takes on the same events. (Though with none of the subtlety of Rashômon, where one ends up mistrusting every point of view.) Most of the series takes place in the present, when the two former detectives are interrogated by detectives reopening an unsolved murder case.
The transformation of Rust over 15 years is an impressive piece of acting, by itself worth watching the show for! The final episode, while impressive from an aesthetic perspective as a descent into darkness, is somewhat disappointing at the story level, for not exploring the killer’s perspective much further and for resorting to a fairly conventional (in the Psycho sense!) fighting scene.

Filed under: Books, pictures Tagged: HBO, Louisiana, movie review, Nick Pizzolatto, Psycho, Rashomon, serial killer, True Detective, TV series

### by the Hudson river [#4]

### stability of noisy Metropolis-Hastings

**F**elipe Medina-Aguayo, Anthony Lee and Gareth Roberts, all from Warwick, arXived last Thursday a paper on the stability properties of noisy Metropolis-Hastings algorithms. The validation of unbiased estimators of the target à la Andrieu and Roberts (2009, AoS)—often discussed here—is in fact obvious when following the auxiliary variable representation of Andrieu and Vihola (2015, AoAP), assuming the unbiased estimator of the target is generated conditionally on the proposed value in the original Markov chain. The noisy version of the above means refreshing the unbiased estimator at each iteration; it also goes under the name of Monte Carlo within Metropolis. The difficulty with this noisy version is that it is not exact, i.e., it does not enjoy the true target as its marginal stationary distribution. The paper by Medina-Aguayo, Lee and Roberts focusses on its validation or invalidation (with examples of transient noisy versions). Under geometric ergodicity of the marginal chain, plus some stability in the weights, the noisy version is also geometrically ergodic. A drift condition on the proposal kernel is also sufficient. Under (much?) harder conditions, the limiting distribution of the noisy chain is, asymptotically in the number of unbiased estimators, the true target. The result is thus quite interesting in that it provides sufficient convergence conditions, albeit ones not always easy to check in realistic settings.
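
To fix ideas, here is a toy sketch of mine (not from the paper) of the Monte Carlo within Metropolis scheme: the target density, a standard Normal, is only available through a positive unbiased estimator, here corrupted by a unit-mean log-normal weight, and the estimators of both the current and the proposed values are refreshed at every iteration.

```r
# Toy noisy Metropolis-Hastings (a.k.a. Monte Carlo within Metropolis):
# the N(0,1) target is replaced by a positive unbiased estimator, and
# the estimate at the current state is refreshed at each iteration
noisy_mh <- function(niter = 1e4, sd_prop = 1, sd_noise = 0.5) {
  # unbiased since E[exp(N(-s^2/2, s^2))] = 1
  pihat <- function(x) dnorm(x) * exp(rnorm(1, -sd_noise^2 / 2, sd_noise))
  x <- numeric(niter)
  for (t in 2:niter) {
    prop <- x[t - 1] + rnorm(1, 0, sd_prop)
    # refreshing both estimates is what makes the chain noisy, and inexact
    x[t] <- if (runif(1) < pihat(prop) / pihat(x[t - 1])) prop else x[t - 1]
  }
  x
}
out <- noisy_mh()
c(mean(out), var(out))  # roughly centred at zero, variance inflated by the noise
```

Setting sd_noise to zero recovers a plain random-walk Metropolis-Hastings chain with the exact N(0,1) marginal, which gives a direct way of visualising the bias induced by the refreshed weights.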

Filed under: Books, Statistics, Travel, University life Tagged: geometric ergodicity, Monte Carlo within Metropolis, noisy Metropolis-Hastings algorithm, pseudo-marginal MCMC

### by the Hudson river [#3]

### the unbounded likelihood problem

**F**ollowing my maths of the Lindley-Jeffreys paradox post, Javier (from Warwick) pointed out a recent American Statistician paper by Liu, Wu and Meeker about Understanding and addressing the unbounded likelihood problem. (I remember meeting some of the authors when visiting Ames three years ago.) As often when reading articles in The American Statistician, I easily find reasons to disagree with the authors. Here are some.

*“Fisher (1912) suggest that a likelihood defined by a product of densities should be proportional to the probability of the data.”*

First, I fail to understand why an unbounded likelihood is an issue. (I also fail to understand the above quote: in a continuous setting, there is no such thing as *the probability of the data*. Only its density.) Especially when avoiding maximum likelihood estimation. The paper is quite vague as to why this is a statistical problem. They take discrete mixture models as one category. While the likelihood explodes around each observation (in the mean direction), this does not prevent the existence of convergent solutions to the likelihood equations. Or of Bayes estimators. Nested sampling itself manages this difficulty.
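
As a concrete instance of the phenomenon (my own sketch, not taken from the paper): for a two-component Gaussian mixture with one component mean pinned at an observation, the likelihood diverges as that component's standard deviation shrinks, even though the data are perfectly ordinary and Bayes estimators remain well-defined.

```r
# likelihood blow-up in a two-component Normal mixture: set mu1 = x[1]
# and let sigma1 shrink to zero (second component kept fixed at N(0,1))
set.seed(42)
x <- rnorm(50)
loglik <- function(sigma)
  sum(log(0.5 * dnorm(x, x[1], sigma) + 0.5 * dnorm(x, 0, 1)))
sapply(c(1e-3, 1e-6, 1e-9, 1e-12), loglik)
# the log-likelihood keeps growing (by about log 10^3 per step) as
# sigma decreases: it is unbounded over the parameter space
```

The divergence is entirely driven by the single term at the pinned observation, which behaves like −log σ, while the remaining terms stay bounded.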

Second, I deeply dislike the baseline that everything is discrete, or even finite, including measurements, and hence that continuous densities should be replaced with probabilities, called *correct likelihood* in the paper. Of course, using probabilities removes any danger of hitting an infinite likelihood. But it also introduces many layers of arbitrary calibration, including the scale of the discretisation. For instance, I do not think there is any stability of the solution when the discretisation range Δ goes to zero, if the limiting theorem of the authors holds. But they do not seem to see this as an issue. I think it would make more sense to treat Δ as another parameter.

As an aside, I also find surprising the classification of the unbounded likelihood models in three categories, one being those “with three or four parameters, including a threshold parameter”. Why on Earth 3 or 4?! As if it was not possible to find infinite likelihoods with more than four parameters…

Filed under: Books, Statistics, Travel, University life

### by the Hudson river [#2]

### Le Monde puzzle [#905]

**A** recursive programming Le Monde mathematical puzzle:

*Given n tokens, with 10≤n≤25, Alice and Bob play the following game: the first player draws an integer 1≤m≤6 at random. This player can then take 1≤r≤min(2m,n) tokens. The next player is then free to take 1≤s≤min(2r,n−r) tokens, and so on. The player taking the last tokens is the winner. There is a winning strategy for Alice if she starts with m=3, and one for Bob if she starts with m=2. Deduce the value of n.*

Although I first wrote a brute-force version of the following code, a moderate amount of thinking leads to the conclusion that the player facing n remaining tokens, after the adversary took m tokens with 2m≥n, always wins by taking all n remaining tokens:

```r
optim <- function(n, m){
  # does the player facing n remaining tokens, after the
  # adversary took m, have a winning strategy?
  outcome <- (n < 2 * m + 1)
  if (n > 2 * m){
    for (i in 1:(2 * m))
      outcome <- max(outcome, 1 - optim(n - i, i))
  }
  return(outcome)
}
```

which leads to the output

```r
> subs <- rep(0, 16)
> for (n in 10:25) subs[n - 9] <- optim(n, 3)
> for (n in 10:25) if (subs[n - 9] == 1) subs[n - 9] <- 1 - optim(n, 2)
> subs
 [1] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> (10:25)[subs == 1]
[1] 18
```

Ergo, the number of tokens is 18!

Filed under: Books, Kids, R, Statistics, University life Tagged: Le Monde, mathematical puzzle, R, recursive function

### by the Hudson river

Filed under: pictures, Running, Travel, University life Tagged: Columbia University, Hudson river, New York city

### MCMskv, Lenzerheide, Jan. 5-7, 2016

**F**ollowing the highly successful* [authorised opinion!, from objective sources]* MCMski IV, in Chamonix last year, the BayesComp section of ISBA has decided in favour of a two-year period, which means the great item of news that next year we will meet again for MCMski V [or MCMskv for short], this time on the snowy slopes of the Swiss town of Lenzerheide, south of Zürich. The committees are headed by the indefatigable Antonietta Mira and Mark Girolami. The plenary speakers have already been contacted and Steve Scott (Google), Steve Fienberg (CMU), David Dunson (Duke), Krys Latuszynski (Warwick), and Tony Lelièvre (Mines, Paris) have agreed to talk. Similarly, the nine invited sessions have been selected and will include Hamiltonian Monte Carlo, Algorithms for Intractable Problems (ABC included!), Theory of (Ultra)High-Dimensional Bayesian Computation, Bayesian NonParametrics, Bayesian Econometrics, Quasi Monte Carlo, Statistics of Deep Learning, Uncertainty Quantification in Mathematical Models, and Biostatistics. There will be afternoon tutorials, including a practical session from the Stan team, tutorials for which the call is open, poster sessions, and a conference dinner at which we will be entertained by the unstoppable Imposteriors. The Richard Tweedie ski race is back as well, with a pair of Blossom skis for the winner!

Filed under: Kids, Mountains, pictures, R, Statistics, Travel, University life Tagged: ABC, BayesComp, Bayesian computation, Blossom skis, Chamonix, Glenlivet, Hamiltonian Monte Carlo, intractable likelihood, ISBA, MCMSki, MCMskv, Monte Carlo Statistical Methods, Richard Tweedie, ski town, STAN, Switzerland, Zurich

### also sprach Nietzsche

Filed under: Books, Kids, pictures Tagged: Andrew Gelman, atheism, Friedrich Nietzsche, graphical novel, Maximilien Le Roy, Michel Onfray, Philosophenweg

### intuition beyond a Beta property

**A** self-study question on X validated exposed an interesting property of the Beta distribution:

*If x is B(n,m) and y is B(n+½,m), then √(xy) is B(2n,2m)*

While this can presumably be established by a mere change of variables, I could not carry the derivation to the end and used instead the moments E[(xy)^(s/2)], since they naturally lead to ratios of B(a,b) functions and to nice cancellations, thanks to the ½ in some Gamma functions [and this was the solution proposed on X validated]. However, I wonder about a more fundamental derivation of the property that would stem from a statistical reasoning… Trying with ratios of Gamma random variables did not work. And the connection with order statistics does not apply because of the ½. Any idea?
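
A quick simulation check of the property (a sketch of mine, with arbitrary values of n and m):

```r
# empirical check that sqrt(xy) ~ Be(2n, 2m) when x ~ Be(n, m) and
# y ~ Be(n + 1/2, m) are independent
set.seed(1)
n <- 3; m <- 2; N <- 1e5
z <- sqrt(rbeta(N, n, m) * rbeta(N, n + 0.5, m))
mean(z)                          # should be close to 2n/(2n+2m) = 0.6
ks.test(z, pbeta, 2 * n, 2 * m)  # Kolmogorov-Smirnov test against Be(6,4)
```

Unsurprisingly, the empirical distribution of √(xy) is indistinguishable from the claimed Beta, which of course checks the property without explaining it.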

Filed under: Books, Kids, R, Statistics, University life Tagged: beta distribution, cross validated, moment generating function, Stack Exchange

### off to New York

**I** am off to New York City for two days, giving a seminar at Columbia tomorrow and visiting Andrew Gelman there. My talk will be about testing as mixture estimation, with slides similar to the Nice ones below if slightly upgraded and augmented during the flight to JFK. Looking at the past seminar speakers, I noticed we were three speakers from Paris in the last fortnight, with Ismael Castillo and Paul Doukhan (in the Applied Probability seminar) preceding me. Is there a significant bias there?!

Filed under: Books, pictures, Statistics, Travel, University life Tagged: Andrew Gelman, Bayesian hypothesis testing, Columbia University, finite mixtures, New York city, Nice, Paris, slides, SMILE seminar