## Bayesian News Feeds

### True Detective [review]

**E**ven though I wrote before that I do not watch TV series, I made a second exception this year with *True Detective*. This series was recommended to me by Judith and this was truly a good recommendation!

Contrary to my old-fashioned idea of TV series, where the same group of caricaturesque characters repeatedly meet new settings that are solved within the 50 minutes each show lasts, the whole season of *True Detective* is a single story, much more like a very long movie with a unified plot that smoothly unfolds and gets mostly solved in the last episode. It obviously brings more strength and depth to the characters, the two investigators Rust and Marty, with the side drawback that most of the other characters, except maybe Marty’s wife, get little space. The opposition between those two investigators is central to the coherence of the story, with Rust being the most intriguing one, very intellectual, almost otherworldly, with a nihilistic discourse, and a self-destructive bent, while Marty sounds more down-to-earth, although he also caters to his own self-destructive demons… Both actors are very impressive in giving a life and a history to their characters. The story takes place in Louisiana, with great landscapes and oppressive swamps where everything seems doomed to vanish, eventually, making detective work almost useless. And where clamminess applies to moral values as much as to the weather. The core of the plot is the search for a serial killer, whose murders of women are incorporated within a pagan cult. Although this sounds rather standard for a US murder story (!), and while there are unnecessary sub-plots and unconvincing developments, the overall storyboard is quite coherent, with a literary feel, even though its writer, Nic Pizzolatto, never completed the corresponding novel, and the unfolding of the plot is anything but conventional, with well-done flashbacks and multi-layered takes on the same events. (With none of the subtlety of Rashômon, where one ends up mistrusting every POV.) Most of the series takes place in current time, when the two former detectives are interrogated by detectives reopening an unsolved murder case.
The transformation of Rust over 15 years is an impressive piece of acting, worth by itself watching the show! The final episode, while impressive from an aesthetic perspective as a descent into darkness, is somewhat disappointing at the story level for not exploring the killer’s perspective much further and for resorting to a fairly conventional (in the Psycho sense!) fighting scene.

Filed under: Books, pictures Tagged: HBO, Louisiana, movie review, Nick Pizzolatto, Psycho, Rashomon, serial killer, True Detective, TV series

### by the Hudson river [#4]

### stability of noisy Metropolis-Hastings

**F**elipe Medina-Aguayo, Anthony Lee and Gareth Roberts, all from Warwick, arXived last Thursday a paper on the stability properties of noisy Metropolis-Hastings algorithms. The validation of unbiased estimators of the target à la Andrieu and Roberts (2009, AoS), often discussed here, is in fact obvious when following the auxiliary variable representation of Andrieu and Vihola (2015, AoAP), assuming the unbiased estimator of the target is generated conditional on the proposed value in the original Markov chain. The noisy version of the above means refreshing the unbiased estimator at each iteration. It also goes under the name of Monte Carlo within Metropolis. The difficulty with this noisy version is that it is not exact, i.e., does not enjoy the true target as its marginal stationary distribution. The paper by Medina-Aguayo, Lee and Roberts focusses on its validation or invalidation (with examples of transient noisy versions). Under geometric ergodicity of the marginal chain, plus some stability in the weights, the noisy version is also geometrically ergodic. A drift condition on the proposal kernel is also sufficient. Under (much?) harder conditions, the limiting distribution of the noisy chain converges to the true target as the number of unbiased estimators grows to infinity. The result is thus quite interesting in that it provides sufficient convergence conditions, albeit not always easy to check in realistic settings.
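To make the distinction between the exact (pseudo-marginal) and noisy versions concrete, here is a minimal R sketch. Everything in it is an assumption made for illustration and not taken from the paper: the toy standard normal target, the log-normal noise with mean one, and the function names `estimate` and `run_chain`. The only difference between the two chains is whether the estimate attached to the current state is recycled or refreshed.

```r
# Toy target: standard normal density (assumption, for illustration only)
target <- function(x) dnorm(x)

# Unbiased noisy estimate of target(x): average of N log-normal weights with mean 1
estimate <- function(x, N, s = 1) target(x) * mean(exp(rnorm(N, -s^2 / 2, s)))

run_chain <- function(niter, N, noisy, step = 1) {
  x <- numeric(niter)                         # chain started at 0
  est <- estimate(x[1], N)
  for (t in 2:niter) {
    prop <- x[t - 1] + rnorm(1, sd = step)    # random walk proposal
    est_prop <- estimate(prop, N)
    if (noisy) est <- estimate(x[t - 1], N)   # noisy version: refresh the current estimate
    if (runif(1) < est_prop / est) {
      x[t] <- prop
      est <- est_prop                         # pseudo-marginal: recycle the accepted estimate
    } else {
      x[t] <- x[t - 1]
    }
  }
  x
}

set.seed(1)
pm <- run_chain(1e4, N = 5, noisy = FALSE)
nz <- run_chain(1e4, N = 5, noisy = TRUE)
c(pseudo_marginal = var(pm), noisy = var(nz))
```

Comparing the empirical moments of both chains for increasing `N` gives a rough feeling for how far the noisy chain sits from the target, without of course proving anything about its stability.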

Filed under: Books, Statistics, Travel, University life Tagged: geometric ergodicity, Monte Carlo within Metropolis, noisy Metropolis-Hastings algorithm, pseudo-marginal MCMC

### by the Hudson river [#3]

### the unbounded likelihood problem

**F**ollowing my maths of the Lindley-Jeffreys paradox post, Javier (from Warwick) pointed out a recent American Statistician paper by Liu, Wu and Meeker about *Understanding and addressing the unbounded likelihood problem*. (I remember meeting some of the authors when visiting Ames three years ago.) As often when reading articles in The American Statistician, I easily find reasons to disagree with the authors. Here are some.

*“Fisher (1912) suggest that a likelihood defined by a product of densities should be proportional to the probability of the data.”*

First, I fail to understand why an unbounded likelihood is an issue. (I also fail to understand the above quote: in a continuous setting, there is no such thing as *the probability of the data*. Only its density.) Especially when avoiding maximum likelihood estimation. The paper is quite vague as to why this is a statistical problem. They take as one category discrete mixture models. While the likelihood explodes around each observation (in the mean direction) this does not prevent the existence of convergent solutions to the likelihood equations. Or of Bayes estimators. Nested sampling itself manages this difficulty.
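For readers who have never seen the explosion, the following R sketch evaluates a two-component normal mixture log-likelihood when one component is centred on an observation and its scale shrinks. The specific mixture (equal weights, second component fixed at N(0,1)), the deterministic pseudo-sample of normal quantiles, and the function name `loglik` are all choices made here for illustration.

```r
# Deterministic pseudo-sample: 50 standard normal quantiles (illustrative assumption)
x <- qnorm(ppoints(50))

# Log-likelihood of the mixture 0.5 N(mu, sigma^2) + 0.5 N(0, 1)
loglik <- function(mu, sigma, x)
  sum(log(0.5 * dnorm(x, mu, sigma) + 0.5 * dnorm(x, 0, 1)))

# Centre the first component on one observation and let its scale go to zero
sigmas <- 10^(-(0:8))
ll <- sapply(sigmas, function(s) loglik(mu = x[25], sigma = s, x = x))
cbind(sigma = sigmas, loglik = ll)
```

Once sigma is small enough for the other observations to be ignored by the degenerating component, the log-likelihood grows like -log(sigma), without bound, while remaining finite for any positive sigma.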

Second, I deeply dislike the baseline that everything is discrete or even finite, including measurement and hence continuous densities should be replaced with probabilities, called *correct likelihood* in the paper. Of course, using probabilities removes any danger of hitting an infinite likelihood. But it also introduces many layers of arbitrary calibration, incl. the scale of the discretisation. Like, I do not think there is any stability of the solution when the discretisation range Δ goes to zero, if the limiting theorem of the authors holds. But they do not seem to see this as an issue. I think it would make more sense to treat Δ as another parameter.

As an aside, I also find surprising the classification of the unbounded likelihood models in three categories, one being those “with three or four parameters, including a threshold parameter”. Why on Earth 3 or 4?! As if it was not possible to find infinite likelihoods with more than four parameters…

Filed under: Books, Statistics, Travel, University life

### by the Hudson river [#2]

### Le Monde puzzle [#905]

**A** recursive programming Le Monde mathematical puzzle:

*Given n tokens with 10≤n≤25, Alice and Bob play the following game: the first player draws an integer 1≤m≤6 at random. This player can then take 1≤r≤min(2m,n) tokens. The next player is then free to take 1≤s≤min(2r,n-r) tokens. The player taking the last tokens is the winner. There is a winning strategy for Alice if she starts with m=3 and if Bob starts with m=2. Deduce the value of n.*

Although I first wrote a brute force version of the following code, a moderate amount of thinking leads to the conclusion that a player facing n remaining tokens, after the adversary chose m tokens such that 2m≥n, always wins by taking the n remaining tokens:

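```r
# optim(n,m) is 1 when the player facing n tokens can force a win,
# m being the drawn integer at the first move, then the adversary's last take
optim=function(n,m){
  outcome=(n<2*m+1) # all remaining tokens can be taken at once
  if (n>2*m){
    for (i in 1:(2*m))
      outcome=max(outcome,1-optim(n-i,i))
  }
  return(outcome)
}
```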

which leads to the output

```
> subs=rep(0,16)
> for (n in 10:25) subs[n-9]=optim(n,3)
> for (n in 10:25) if (subs[n-9]==1) subs[n-9]=1-optim(n,2)
> subs
 [1] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> (10:25)[subs==1]
[1] 18
```

Ergo, the number of tokens is 18!

Filed under: Books, Kids, R, Statistics, University life Tagged: Le Monde, mathematical puzzle, R, recursive function

### by the Hudson river

Filed under: pictures, Running, Travel, University life Tagged: Columbia University, Hudson river, New York city

### MCMskv, Lenzerheide, Jan. 5-7, 2016

**F**ollowing the highly successful *[authorised opinion!, from objective sources]* MCMski IV, in Chamonix last year, the BayesComp section of ISBA has decided in favour of a two-year period, which means the great item of news that next year we will meet again for MCMski V [or MCMskv for short], this time on the snowy slopes of the Swiss town of Lenzerheide, south of Zürich. The committees are headed by the indefatigable Antonietta Mira and Mark Girolami. The plenary speakers have already been contacted and Steve Scott (Google), Steve Fienberg (CMU), David Dunson (Duke), Krys Latuszynski (Warwick), and Tony Lelièvre (Mines, Paris) have agreed to talk. Similarly, the nine invited sessions have been selected and will include Hamiltonian Monte Carlo, Algorithms for Intractable Problems (ABC included!), Theory of (Ultra)High-Dimensional Bayesian Computation, Bayesian NonParametrics, Bayesian Econometrics, Quasi Monte Carlo, Statistics of Deep Learning, Uncertainty Quantification in Mathematical Models, and Biostatistics. There will be afternoon tutorials, including a practical session from the Stan team, tutorials for which the call is open, poster sessions, and a conference dinner at which we will be entertained by the unstoppable Imposteriors. The Richard Tweedie ski race is back as well, with a pair of Blossom skis for the winner!

Filed under: Kids, Mountains, pictures, R, Statistics, Travel, University life Tagged: ABC, BayesComp, Bayesian computation, Blossom skis, Chamonix, Glenlivet, Hamiltonian Monte Carlo, intractable likelihood, ISBA, MCMSki, MCMskv, Monte Carlo Statistical Methods, Richard Tweedie, ski town, STAN, Switzerland, Zurich

### also sprach Nietzsche

Filed under: Books, Kids, pictures Tagged: Andrew Gelman, atheism, Friedrich Nietzsche, graphical novel, Maximilien Le Roy, Michel Onfray, Philosophenweg

### intuition beyond a Beta property

**A** self-study question on X validated exposed an interesting property of the Beta distribution:

*If x is B(n,m) and y is B(n+½,m) then √(xy) is B(2n,2m)*

While this can presumably be established by a mere change of variables, I could not carry the derivation through to the end and used instead the moment generating function E[(XY)^{s/2}] since it naturally leads to ratios of B(a,b) functions and to nice cancellations thanks to the ½ in some Gamma functions [and this was the solution proposed on X validated]. However, I wonder at a more fundamental derivation of the property that would stem from a statistical reasoning… Trying with the ratio of Gamma random variables did not work. And the connection with order statistics does not apply because of the ½. Any idea?
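While it brings no intuition, a quick simulation at least confirms the property numerically. The R sketch below assumes that x and y are drawn independently (which the moment argument requires) and picks n=2 and m=3 arbitrarily; the variable names are mine.

```r
set.seed(123)
n <- 2; m <- 3; N <- 1e5
x <- rbeta(N, n, m)
y <- rbeta(N, n + 0.5, m)      # drawn independently of x
z <- sqrt(x * y)

# Compare the first two empirical moments with those of a Beta(2n, 2m)
a <- 2 * n; b <- 2 * m
rbind(mean = c(empirical = mean(z), beta = a / (a + b)),
      var  = c(empirical = var(z),  beta = a * b / ((a + b)^2 * (a + b + 1))))

# Kolmogorov-Smirnov distance between the simulated sample and the Beta(2n, 2m) cdf
ks_dist <- unname(ks.test(z, "pbeta", a, b)$statistic)
ks_dist
```

Replacing `n + 0.5` with `n` in the second draw makes the agreement disappear, which shows how much the result hinges on the ½.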

Filed under: Books, Kids, R, Statistics, University life Tagged: beta distribution, cross validated, moment generating function, Stack Echange

### off to New York

**I** am off to New York City for two days, giving a seminar at Columbia tomorrow and visiting Andrew Gelman there. My talk will be about testing as mixture estimation, with slides similar to the Nice ones below if slightly upgraded and augmented during the flight to JFK. Looking at the past seminar speakers, I noticed we were three speakers from Paris in the last fortnight, with Ismael Castillo and Paul Doukhan (in the Applied Probability seminar) preceding me. Is there a significant bias there?!

Filed under: Books, pictures, Statistics, Travel, University life Tagged: Andrew Gelman, Bayesian hypothesis testing, Columbia University, finite mixtures, New York city, Nice, Paris, slides, SMILE seminar

### a most curious case of misaddressed mail

**T**oday, I got two FedEx envelopes in the mail, both apparently from the same origin, namely the UF Statistics department reimbursing my travel expenses. However, once both envelopes were opened, I discovered that, while one did indeed contain my reimbursement cheque, the other one contained several huge cheques addressed to… a famous Nova Scotia fiddler, Natalie MacMaster, for concerts she gave recently in the South East US, and with no possible connection with either me or the stats department! So I have no idea how those cheques came to me (before I returned them to their rightful recipient in Nova Scotia!). Complete mystery! The only possible link is that I just found out that Natalie MacMaster and her band played in Gainesville two weeks ago. Hence a potential scenario: at the local FedEx sorting centre, the envelope intended for Natalie MacMaster lost its label and someone took the second label from my then nearby envelope to avoid dealing with the issue… In any case, this gave me the opportunity to listen to pretty enticing Scottish music!

Filed under: Books, Travel, University life Tagged: Cape Breton, FedEx, fiddle, Gainesville, Irish music, Naatalie MacMaster, Nova Scotia, Scotland, University of Florida

### likelihood-free model choice

**J**ean-Michel Marin, Pierre Pudlo and I just arXived a short review on ABC model choice, first version of a chapter for the forthcoming *Handbook of Approximate Bayesian computation* edited by Scott Sisson, Yannan Fan, and Mark Beaumont. Except for a new analysis of a human evolution scenario, this survey mostly argues for the proposal made in our recent paper on the use of random forests and [also argues] about the lack of reliable approximations to posterior probabilities. (Paper that was rejected by PNAS and that is about to be resubmitted. Hopefully with a more positive outcome.) The conclusion of the survey is that

*The presumably most pessimistic conclusion of this study is that the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, and (iii) the random forest version of the above, are at best very loose. This leaves open queries for acceptable approximations of (i), since the posterior predictive error is instead an error assessment for the ABC RF model choice procedure. While a Bayesian quantity that can be computed at little extra cost, it does not necessarily compete with the posterior probability of a model.*

reflecting my hope that we can eventually come up with a proper approximation to the “true” posterior probability…

Filed under: Books, pictures, Statistics, University life, Wines Tagged: ABC, ABC model choice, Handbook of Approximate Bayesian computation, likelihood-free methods, Montpellier, PNAS, random forests, survey

### importance weighting without importance weights [ABC for bandits?!]

**I** did not read very far in the recent arXival by Neu and Bartók, but I got the impression that it was a version of ABC for bandit problems where the probabilities behind the bandit arms are not available but can be generated, since the stopping rule found in the “Recurrence weighting for multi-armed bandits” is the generation of an arm equal to the learner’s draw (p.5). Since there is no tolerance there, the method is exact (“unbiased”). As no reference is made to the ABC literature, this may be after all a mere analogy…
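The unbiasedness can be illustrated in a few lines of R: if arms are regenerated until one equals a given arm, the number of draws is geometric with mean one over the (unavailable) probability of that arm, hence an unbiased substitute for the importance weight. This is only my reading of the stopping rule described above, not the algorithm of the paper; the probabilities `p` and the function `draws_until_match` are hypothetical.

```r
set.seed(7)
p <- c(0.5, 0.3, 0.2)          # arm probabilities: unavailable in practice, only simulated from
K <- length(p)

# Number of fresh draws needed to reproduce a given arm: geometric with mean 1 / p[arm]
draws_until_match <- function(arm) {
  k <- 1
  while (sample(K, 1, prob = p) != arm) k <- k + 1
  k
}

arm <- 3
waits <- replicate(1e4, draws_until_match(arm))
c(estimate = mean(waits), truth = 1 / p[arm])
```

Exactness comes from requiring a perfect match, which is only feasible because the set of arms is finite, in contrast with the tolerance needed by ABC in continuous settings.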

Filed under: Books, Statistics, University life Tagged: ABC, machine learning, multi-armed bandits, tolerance, Zurich

### the maths of Jeffreys-Lindley paradox

**C**ristiano Villa and Stephen Walker arXived last Friday a paper entitled *On the mathematics of the Jeffreys-Lindley paradox*. Following the philosophical papers of last year, by Ari Spanos, Jan Sprenger, Guillaume Rochefort-Maranda, and myself, this provides a more statistical view on the paradox. Or “paradox”… Even though I strongly disagree with the conclusion, namely that a finite (prior) variance σ² should be used in the Gaussian prior, and with the fall back on classical Type I and Type II errors. So, in that sense, the authors avoid the Jeffreys-Lindley paradox altogether!

The argument against considering a limiting value for the posterior probability is that it converges to 0, 1, or an intermediate value. In the first two cases it is useless. The intermediate case is achieved when the prior probabilities of the null and alternative hypotheses depend on the variance σ². While I do not want to argue in favour of my 1993 solution

since it is ill-defined in measure theoretic terms, I do not buy the coherence argument that, since this prior probability converges to zero when σ² goes to infinity, the posterior probability should also go to zero. In the limit, probabilistic reasoning fails since the prior under the alternative is a measure not a probability distribution… We should thus abstain from over-interpreting improper priors. (A sin sometimes committed by Jeffreys himself in his book!)
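The limiting behaviours discussed above can be visualised in the simplest normal setting. The following R sketch assumes one observation x from N(θ,1), the null θ=0, and a N(0,σ²) prior under the alternative; the function name `post_prob_null` is mine, and the σ-dependent prior weight used in the second call is an arbitrary choice for illustration, not the 1993 solution mentioned above.

```r
# Posterior probability of H0: theta = 0 for one observation x ~ N(theta, 1),
# with prior weight rho on H0 and a N(0, sigma2) prior on theta under H1
post_prob_null <- function(x, sigma2, rho = 0.5) {
  # Bayes factor of H0 against H1: N(0, 1) over N(0, 1 + sigma2) marginal density at x
  b01 <- dnorm(x, 0, 1) / dnorm(x, 0, sqrt(1 + sigma2))
  1 / (1 + (1 - rho) / (rho * b01))
}

sigma2 <- 10^(0:8)

# Fixed prior weight: the posterior probability of the null goes to 1
fixed <- post_prob_null(x = 1.96, sigma2 = sigma2)

# Prior weight decreasing like 1 / sigma: an intermediate limit is reached
moving <- post_prob_null(x = 1.96, sigma2 = sigma2, rho = 1 / (1 + sqrt(sigma2)))

round(cbind(sigma2, fixed, moving), 4)
```

With the fixed weight, even an observation at the 5% significance boundary ends up overwhelmingly supporting the null when σ² grows, which is the paradox; with the moving weight, the limit is 1/(1+exp(x²/2)), which only depends on the data.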

Filed under: Books, Kids, Statistics Tagged: Bayesian tests of hypotheses, Capitaine Haddock, Dennis Lindley, Harold Jeffreys, improper priors, Jeffreys-Lindley paradox, model posterior probabilities, Tintin

### Le Monde puzzle [#904.5]

**A**bout this #904 arithmetics Le Monde mathematical puzzle:

*Find all plural integers, namely positive integers such that (a) none of their digits is zero and (b) removing their leftmost digit produces a dividing plural integer (with the convention that one digit integers are all plural).*

a slight modification in the R code allows for a faster exploration, based on the fact that solutions add one extra digit to solutions with one less digit:


```r
# digin() is the digit-splitting function given in the original #904 post
pluri=plura=NULL
#solutions with two digits
for (i in 11:99){
  dive=rev(digin(i)[-1])
  if (min(dive)>0){
    dive=sum(dive*10^(0:(length(dive)-1)))
    if (i==((i%/%dive)*dive))
      pluri=c(pluri,i)}}
for (n in 2:6){ #number of digits
  plura=c(plura,pluri)
  pluro=NULL
  for (j in pluri){
    for (k in (1:9)*10^n){
      x=k+j
      if (x==(x%/%j)*j)
        pluro=c(pluro,x)}
  }
  pluri=pluro}
```

which leads to the same output

```
> sort(plura)
 [1]    11    12    15    21    22    24    25    31    32    33    35    36
[13]    41    42    44    45    48    51    52    55    61    62    63    64
[25]    65    66    71    72    75    77    81    82    84    85    88    91
[37]    92    93    95    96    99   125   225   312   315   325   375   425
[49]   525   612   615   624   625   675   725   735   825   832   912   915
[61]   925   936   945   975  1125  2125  3125  3375  4125  5125  5625  6125
[73]  6375  7125  8125  9125  9225  9375 53125 91125 95625
```

Filed under: Books, Kids, R, Statistics, University life Tagged: arithmetics, Le Monde, mathematical puzzle, strsplit()

### Le Monde puzzle [#904]

**A**n arithmetics Le Monde mathematical puzzle:

*Find all plural integers, namely positive integers such that (a) none of their digits is zero and (b) removing their leftmost digit produces a dividing plural integer (with the convention that one digit integers are all plural).*

An easy arithmetic puzzle, with no real need for an R code since it is straightforward to deduce the solutions. Still, to keep up with tradition, here it is!

First, I found this function on Stack Overflow to turn an integer into its digits:

```r
digin=function(n){
  as.numeric(strsplit(as.character(n),"")[[1]])}
```

then I simply checked all integers up to 10⁶:

```r
plura=NULL
for (i in 11:10^6){
  dive=rev(digin(i)[-1]) # digits left after removing the leftmost one
  if (min(dive)>0){
    dive=sum(dive*10^(0:(length(dive)-1)))
    if (i==((i%/%dive)*dive))
      plura=c(plura,i)}}
```

eliminating solutions whose dividers are not solutions themselves:

```r
sol=lowa=plura[plura<100]
for (i in 3:6){
  sli=plura[(plura>10^(i-1))&(plura<10^i)]
  ace=sli-10^(i-1)*(sli%/%10^(i-1)) # integer left after removing the leftmost digit
  lowa=sli[apply(outer(ace,lowa,FUN="=="),
    1,max)==1]
  lowa=sort(unique(lowa))
  sol=c(sol,lowa)}
```

which leads to the output

```
> sol
 [1]    11    12    15    21    22    24    25    31    32    33    35    36
[13]    41    42    44    45    48    51    52    55    61    62    63    64
[25]    65    66    71    72    75    77    81    82    84    85    88    91
[37]    92    93    95    96    99   125   225   312   315   325   375   425
[49]   525   612   615   624   625   675   725   735   825   832   912   915
[61]   925   936   945   975  1125  2125  3125  3375  4125  5125  5625  6125
[73]  6375  7125  8125  9125  9225  9375 53125 91125 95625
```

leading to the conclusion there is no solution beyond 95625.

Filed under: Books, Kids, Statistics, University life Tagged: Le Monde, mathematical puzzle, strsplit()

### light and widely applicable MCMC: approximate Bayesian inference for large datasets

**F**lorian Maire (whose thesis was discussed in this post), Nial Friel, and Pierre Alquier (all in Dublin at some point) have arXived today a paper with the above title, aimed at quickly analysing large datasets. As reviewed in the early pages of the paper, this proposal follows a growing number of techniques advanced in the past years, like pseudo-marginals, Russian roulette, unbiased likelihood estimators, firefly Monte Carlo, adaptive subsampling, sub-likelihoods, telescoping debiased likelihood version, and even our very own delayed acceptance algorithm. (Which is incorrectly described as restricted to iid data, by the way!)

The lightweight approach is based on an ABC idea of working through a summary statistic that plays the role of a pseudo-sufficient statistic. The main theoretical result in the paper is indeed that, when subsampling in an exponential family, subsamples preserving the sufficient statistics (modulo a rescaling) are optimal in terms of distance to the true posterior. Subsamples are thus weighted in terms of the (transformed) difference between the full data statistic and the subsample statistic, assuming they are both normalised to be comparable. I am quite (positively) intrigued by this idea in that it allows for somewhat comparing inference based on two different samples. The weights of the subsets are then used in a pseudo-posterior that treats the subset as an auxiliary variable (and the weight as a substitute for the “missing” likelihood). This may sound a wee bit convoluted (!) but the algorithm description is not yet complete: simulating jointly from this pseudo-target is impossible because of the huge number of possible subsets. The authors thus suggest running an MCMC scheme targeting this joint distribution, with a proposed move on the set of subsets and a proposed move on the parameter set conditional on whether or not the proposed subset has been accepted.
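As a rough illustration of the weighting step only, here is an R sketch where subsets of a normal sample are weighted by the distance between their rescaled summary and the full-data summary. The exponential form of the weight, the value of `eps`, the simulated data, and the helper `S` are all assumptions on my part; the sketch does not implement the MCMC scheme over subsets.

```r
set.seed(2015)
y <- rnorm(1e4, mean = 1)      # full dataset (simulated, for illustration)
n <- 100                       # subsample size
eps <- 1e3                     # tolerance-like tuning constant (hypothetical value)

# Rescaled summary statistic: the sample mean is sufficient for a N(theta, 1) model
S <- function(z) mean(z)

# Draw candidate subsets and weight them by closeness to the full-data statistic
subsets <- replicate(200, sample(length(y), n), simplify = FALSE)
gap <- sapply(subsets, function(u) S(y[u]) - S(y))
w <- exp(-eps * gap^2)
w <- w / sum(w)

# Subsets whose statistic best matches the full data receive most of the mass
head(sort(w, decreasing = TRUE))
```

Even in this toy setting, the concentration of the weights depends heavily on both `eps` and `n`, which are two tuning parameters to calibrate instead of one.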

From an ABC perspective, the difficulty in calibrating the tolerance ε sounds more acute than usual, as the size of the subset comes as an additional computing parameter. Bootstrapping options seem impossible to implement in a large size setting.

An MCMC issue with this proposal is that designing the move across the subset space is both paramount for its convergence properties and lacking in geometric intuition. Indeed, two subsets with similar summary statistics may be very far apart… Funnily enough, in the representation of the joint Markov chain, the parameter subchain is secondary if crucial to avoid intractable normalising constants. It is also unclear to me, from reading the paper maybe too quickly, whether or not the separate moves when switching and when not switching subsets retain the proper balance condition for the pseudo-joint to still be the stationary distribution. The stationarity for the subset Markov chain is straightforward by design, but it is not so for the parameter. In case of a switched subset, simulating from the true full conditional given the subset would work, but simulating through a fixed number L of MCMC steps would not.

The lightweight technology therein shows its muscles on a handwritten digit recognition example where it beats regular MCMC by a factor of 10 to 20, using only 100 datapoints instead of the 10⁴ original datapoints. While very nice and realistic, this example may be misleading in that 100 digit realisations may be enough to find a tolerable approximation to the true MAP. I was also intrigued by the processing of the probit example, until I realised the authors had integrated the covariate out and inferred about the mean of that covariate, which means it is not a genuine probit model.

Filed under: Books, Statistics, University life, Wines Tagged: ABC, big data, character recognition, delayed acceptance, Dublin, Ireland, Markov chains, MCMC algorithm, reversible jump MCMC, Russian roulette, subsampling

### ABC for copula estimation

Clara Grazian and Brunero Liseo (di Roma) have just arXived a note on a method merging copulas, ABC, and empirical likelihood. The approach is rather hybrid and thus not completely Bayesian, but this must be seen as a consequence of an ill-posed problem. Indeed, as in many econometric models, the model there is not fully defined: the marginals of iid observations are represented as being from well-known parametric families (and are thus well-estimated by Bayesian tools), while the joint distribution remains uncertain and hence so does the associated copula. The approach in the paper is to proceed stepwise, i.e., to estimate correctly each marginal, well, correctly enough to transform the data by an estimated cdf, and then only to estimate the copula or some aspect of it based on this transformed data. Like Spearman’s ρ. For which an empirical likelihood is computed and aggregated to a prior to make a BCel weight. (If this sounds unclear, each BCel evaluation is based on a random draw from the posterior samples, which transfers some uncertainty in the parameter evaluation into the copula domain. Thanks to Brunero and Clara for clarifying this point for me!)
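The two-step logic can be sketched in R on simulated data, with plug-in estimates of the marginals standing in for posterior draws and a moment estimate of Spearman’s ρ, 12 E[UV] − 3, standing in for the empirical likelihood step. None of this reproduces the BCel procedure of the note; the Gaussian dependence, the marginal families, and all variable names are assumptions made for illustration.

```r
set.seed(11)
n <- 2000
r <- 0.6                                     # correlation of the underlying Gaussian pair

# Simulated pair with Gaussian dependence, exponential and normal marginals
z1 <- rnorm(n)
z2 <- r * z1 + sqrt(1 - r^2) * rnorm(n)
x1 <- qexp(pnorm(z1), rate = 2)
x2 <- 3 + 2 * z2

# Step 1: estimate each marginal separately and transform by the estimated cdf
u1 <- pexp(x1, rate = 1 / mean(x1))
u2 <- pnorm(x2, mean = mean(x2), sd = sd(x2))

# Step 2: moment estimate of Spearman's rho based on the transformed data
rho_hat <- 12 * mean(u1 * u2) - 3
c(estimate = rho_hat, gaussian_copula_value = 6 / pi * asin(r / 2))
```

Misspecifying a marginal in step 1, e.g., fitting a normal cdf to `x1`, changes the transformed data and hence the estimate, which is one way to explore how much arbitrariness the first step injects into the second.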

At this stage of the note, there are two illustrations revolving around Spearman’s ρ. One on simulated data, with better performances than a nonparametric frequentist solution. And another one on a GARCH(1,1) model for two financial time-series.

I am quite glad to see an application of our BCel approach in another domain although I feel a tiny bit uncertain about the degree of arbitrariness in the approach, from the estimated cdf transforms of the marginals to the choice of the moment equations identifying the parameter of interest like Spearman’s ρ. Especially if one uses a parametric copula whose moments are equally well-known. While I see the practical gain in analysing each component separately, the object created by the estimated cdf transforms may have a very different correlation structure from the true cdf transforms. Maybe there exist consistency conditions on the estimated cdfs… Maybe other notions of orthogonality or independence could be brought into the picture to validate further the two-step solution…

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: ABC, copula, empirical likelihood, GARCH model, Italia, La Sapienza, Roma, Spearman's ρ