## Bayesian Bloggers

### thick disc formation scenario of the Milky Way evaluated by ABC

*“The facts that the thick-disc episode lasted for several billion years, that a contraction is observed during the collapse phase, and that the main thick disc has a constant scale height with no flare argue against the formation of the thick disc through radial migration. The most probable scenario for the thick disc is that it formed while the Galaxy was gravitationally collapsing from well-mixed gas-rich giant clumps that were sustained by high turbulence, which prevented a thin disc from forming for a time, as proposed previously.”*

**F**ollowing discussions with astronomers from Besançon on the use of ABC methods to approximate posteriors, I was associated with their paper on assessing a formation scenario of the Milky Way, which was accepted a few weeks ago in Astronomy & Astrophysics. The central problem (*was there a thin-then-thick disk?*) somewhat escapes me. This collaboration started when some of the astronomers leading the study contacted me about convergence issues with their MCMC algorithms, and I realised they were using ABC-MCMC without knowing that it was called ABC-MCMC or that it had been studied in another corner of the literature… The scale in the kernel was chosen to achieve an average acceptance rate of 5%-10%. Models are then compared by combining a log-likelihood approximation derived from the ABC modelling with a BIC ranking of the models. (Incidentally, I was impressed by the number of papers published in Astronomy & Astrophysics. The monthly issue contains dozens of papers!)
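For readers who, like the astronomers, meet ABC-MCMC without the name, here is a minimal R sketch of the generic algorithm of Marjoram et al. (2003) on a toy Normal-mean model. The model, the prior, the tolerance `eps`, the proposal `scale` and the function name `abc_mcmc` are all illustrative assumptions and do not come from the Milky Way study. The returned acceptance rate is the quantity one would tune towards a target such as 5%-10%.

```r
# Toy ABC-MCMC: y_1..y_n ~ N(theta, 1), prior theta ~ N(0, 10^2),
# summary statistic = sample mean. All settings are illustrative.
abc_mcmc <- function(yobs, n_iter = 5000, eps = 0.1, scale = 0.5) {
  n <- length(yobs)
  sobs <- mean(yobs)                        # observed summary statistic
  theta <- numeric(n_iter)
  theta[1] <- sobs                          # start where the data point
  accepted <- 0
  for (t in 2:n_iter) {
    prop <- rnorm(1, theta[t - 1], scale)   # symmetric random-walk proposal
    spseudo <- mean(rnorm(n, prop, 1))      # summary of pseudo-data at prop
    # accept only if the pseudo-summary falls within eps of the observed one
    # and the Metropolis-Hastings prior ratio test passes
    if (abs(spseudo - sobs) < eps &&
        log(runif(1)) < dnorm(prop, 0, 10, log = TRUE) -
                        dnorm(theta[t - 1], 0, 10, log = TRUE)) {
      theta[t] <- prop
      accepted <- accepted + 1
    } else {
      theta[t] <- theta[t - 1]
    }
  }
  list(theta = theta, acc_rate = accepted / (n_iter - 1))
}

set.seed(1)
yobs <- rnorm(50, 2, 1)
out <- abc_mcmc(yobs)
out$acc_rate
```

Shrinking `eps` brings the ABC posterior closer to the true posterior, but it also lowers the acceptance rate. This is the usual trade-off behind the tuning mentioned above.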

Filed under: Statistics, University life Tagged: ABC, ABC-MCMC, astronomy, astrostatistics, BIC, Milky Way, thick disk, thin disk

### posterior predictive checks for admixture models

**I**n a posting coincidence, just a few days after we arXived our paper on ABC model choice with random forests, where we use posterior predictive errors for assessing the variability of the random forest procedure, David Mimno, David Blei, and Barbara Engelhardt arXived a paper on posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure, which deals with similar data and models, while also using the posterior predictive as a central tool. (*Marginalia:* the paper is a wee bit difficult to read [esp. with France-Germany playing in the airport bar!] as the modelling is only clearly described at the very end. I suspect this arXived version was put together out of a submission to a journal like Nature or PNAS, with mentions of a Methods section that does not appear here and of Supplementary Material that turned into subsections of the Discussion section.)

**T**he data are genomic datasets made of SNPs (single nucleotide polymorphisms). For instance, the first (HapMap) dataset corresponds to 1,043 individuals and 468,167 SNPs. The model is simpler than Kingman’s coalescent, so its likelihood does not require ABC steps to run inference. The admixture model in the paper is essentially a mixture model over ancestry indices, with individual-dependent weights and Bernoulli observations, which results in a completed likelihood of the form

(which looks more formidable than it truly is!). Regular Bayesian inference is thus possible in this setting, implementing e.g. Gibbs sampling. The authors chose instead to rely on EM and thus derived the maximum likelihood estimators of the (many) parameters of the admixture. And of the latent variables **z**. Their posterior predictive check is based on the simulation of pseudo-observations (as in ABC!) from the above likelihood, with parameters and latent variables replaced with their EM estimates (unlike ABC). There is obviously some computational reason in doing this instead of simulating from the posterior, albeit implicit in the paper. I am however slightly puzzled by the conditioning on the latent variable estimate **ẑ**, as its simulation is straightforward and as a latent variable is more a missing observation than a parameter. Given those 30 to 100 replications of the data, an empirical distribution of a discrepancy function is used to assess whether or not the equivalent discrepancy for the observation is an outlier. If so, the model is not appropriate for the data. (Interestingly, the discrepancy is measured via the Bayes factor of z-scores.)
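Below is a minimal R sketch of such a plug-in predictive check. It assumes a binary data matrix and point estimates `beta_hat` (allele frequencies, K × L) and `z_hat` (ancestry labels, n × L), both held fixed as in the paper. The discrepancy used here, the variance of individual row sums, is a hypothetical choice and not the paper's z-score based measure. The functions `simulate_plugin` and `ppc_plugin` are our own names.

```r
# Simulate one replicated dataset from the fitted model, conditional on z_hat
simulate_plugin <- function(beta_hat, z_hat) {
  n <- nrow(z_hat); L <- ncol(z_hat)
  # success probability of entry (i, l) is beta_hat[z_hat[i, l], l]
  idx <- cbind(as.vector(z_hat), rep(seq_len(L), each = n))
  prob <- matrix(beta_hat[idx], nrow = n)
  matrix(rbinom(n * L, 1, prob), nrow = n)
}

# Compare the observed discrepancy with its replicated distribution
ppc_plugin <- function(x, beta_hat, z_hat, disc, n_rep = 100) {
  d_rep <- replicate(n_rep, disc(simulate_plugin(beta_hat, z_hat)))
  d_obs <- disc(x)
  # proportion of replicated discrepancies at least as large as the observed one
  list(d_obs = d_obs, d_rep = d_rep, p_value = mean(d_rep >= d_obs))
}

set.seed(2)
n <- 40; L <- 200; K <- 2
beta_hat <- matrix(runif(K * L, 0.05, 0.95), nrow = K)
z_hat <- matrix(sample(1:K, n * L, replace = TRUE), nrow = n)
x <- simulate_plugin(beta_hat, z_hat)   # toy "observed" data from the fitted model
disc <- function(x) var(rowSums(x))     # hypothetical discrepancy
res <- ppc_plugin(x, beta_hat, z_hat, disc)
res$p_value
```

Conditioning on ẑ is what the sketch does. If ẑ were instead resampled from the estimated ancestry proportions, `z_hat` would be replaced with a fresh draw inside the replicate loop.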

**T**he connection with our own work is that the discrepancy measures proposed in this paper could be added to our already large collection of summary statistics, to check their potential impact in model comparison, i.e., whether they contribute significantly to the random forest nodes. Conversely, the most significant summary statistics could then be tested as discrepancy measures. Or, more in tune with our Series B paper on the proper selection of summary variables, the distribution of those discrepancy measures could be compared across potential models. Assuming this does not take too much computing power…

Filed under: pictures, Statistics, Travel, University life Tagged: ABC model choice, arXiv, Bernoulli, goodness of fit, Human Genomics, Kingman's coalescent, mixture estimation, SNPs

### ABC [almost] in the front news

**M**y friend and Warwick colleague Gareth Roberts just published a paper in Nature with Ellen Brooks-Pollock and Matt Keeling from the University of Warwick on the modelling of bovine tuberculosis dynamics in Britain and on the impact of control measures. The data comes from the Cattle Tracing System and the VetNet national testing database. The mathematical model is based on a stochastic process and its six parameters are estimated by sequential ABC (SMC-ABC). The summary statistics chosen in the model are the number of infected farms per county per year and the number of reactors (cattle failing a test) per county per year.
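As a reminder of how sequential ABC operates, here is a minimal ABC-PMC sketch in R, in the spirit of Beaumont et al. (2009). It uses a toy Poisson model with a single rate parameter, and the mean count serves as the summary statistic. The model, the Exp(0.1) prior, the tolerance sequence and the function name `abc_pmc` are illustrative assumptions. The bovine TB model has six parameters and county-by-year summaries, and the SMC-ABC variant used in the paper may well differ.

```r
# Toy model: y_1..y_m ~ Poisson(lambda), prior lambda ~ Exp(rate = 0.1)
abc_pmc <- function(sobs, m, eps_seq, N = 500) {
  sim_summary <- function(lambda) mean(rpois(m, lambda))
  prior_d <- function(lambda) dexp(lambda, rate = 0.1)
  theta <- numeric(N)
  for (i in 1:N) {                      # first tolerance: rejection from the prior
    repeat {
      cand <- rexp(1, rate = 0.1)
      if (abs(sim_summary(cand) - sobs) < eps_seq[1]) break
    }
    theta[i] <- cand
  }
  w <- rep(1 / N, N)
  for (eps in eps_seq[-1]) {            # decreasing tolerances
    # Gaussian perturbation kernel with twice the weighted variance
    tau <- sqrt(2 * sum(w * (theta - sum(w * theta))^2))
    new_theta <- numeric(N)
    for (i in 1:N) {
      repeat {
        cand <- rnorm(1, theta[sample.int(N, 1, prob = w)], tau)
        if (cand > 0 && abs(sim_summary(cand) - sobs) < eps) break
      }
      new_theta[i] <- cand
    }
    # importance weights: prior density over the mixture proposal density
    new_w <- sapply(new_theta, function(th)
      prior_d(th) / sum(w * dnorm(th, theta, tau)))
    theta <- new_theta
    w <- new_w / sum(new_w)
  }
  list(theta = theta, w = w)
}

set.seed(3)
y <- rpois(20, 3)
out <- abc_pmc(mean(y), m = length(y), eps_seq = c(2, 1, 0.5, 0.2))
# exact posterior is Gamma(1 + sum(y), 0.1 + m), used here only as a check
post_mean_exact <- (1 + sum(y)) / (0.1 + length(y))
c(abc = sum(out$w * out$theta), exact = post_mean_exact)
```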

*“Therefore, we predict that control of local badger populations and hence control of environmental transmission will have a relatively limited effect on all measures of bovine TB incidence.”*

**T**his advanced modelling of a comprehensive dataset on TB in Britain quickly gained a high profile, as it addresses the highly controversial (not to say plain stupid) culling of badgers (which also carry TB) advocated by the government. The study concludes that “only generic measures such as more national testing, whole herd culling or vaccination that affect all routes of transmission are effective at controlling the spread of bovine TB”, while the elimination of badgers from the English countryside would have a limited effect. Good news for badgers! And for the Badger Trust. Unsurprisingly, the UK farming minister immediately rejected the study! Not only does he object to the herd culling solution for economic reasons, but he “cannot accept the paper’s findings”. Maybe he does not like ABC… More seriously, the media oversimplified the findings of the study, “as usual”, with e.g. The Guardian headline of “tuberculosis threat requires mass cull of cattle”.

Filed under: pictures, Statistics, University life Tagged: ABC, Badger Trust, badgers, Britain, cattle, cows, England, epidemiology, media, SMC-ABC, summary statistics, TB, The Guardian, tuberculosis, University of Warwick

### how to translate evidence into French?

**I** got this email a few minutes ago from Gauvain, who is writing a PhD in philosophy of science:

The author of the text I have to translate refers to Bayes factors as a “Bayesian measure of evidence”, and to p-value tests as a “frequentist measure of evidence”. I was wondering whether there is a recognised and established French translation for this expression, “measure of evidence”. I have sometimes come across “mesure d’évidence”, which looks very much like an anglicism, and sometimes “estimateur de preuve”, which seems to me likely to cause confusion with other uses of the term “estimateur”.

which (pardon my French!) wonders how to translate the term *evidence* into French. It would sound natural for the French *évidence* to be the answer, but this is not the case. Despite sharing the same Latin root (*evidentia*), since the English version comes from medieval French, the two words have different meanings. In English, it means a collection of facts supporting an assumption or a theory, while in French it means something obvious, whose truth is immediately perceived. Surprisingly, English kept the adjective *evident* with the same [obvious] meaning as the French *évident*. But the noun moved towards a much less definitive meaning, both in Law and in Science. I had never thought of the huge gap between the two meanings, but I must have been surprised by its use the first time I heard it in English. I no longer think about it, and did not even when I reviewed Sober’s Evidence and Evolution.

**O**ne may wonder about the best possible translation of evidence into French, even though marginal likelihood (*vraisemblance marginale*) is just fine for statistical purposes. I would suggest *faisceau de présomptions* (a body of presumptions), *degré de soutien* (degree of support) or yet *intensité de soupçon* (strength of suspicion) as (lengthy) solutions. *Soupçon* (suspicion) could work on its own, but it has a fairly negative ring…

Filed under: Books, Statistics, University life Tagged: Bayes factors, evidence, Evidence and Evolution, French vs. English, obvious, translation

### voices – strange shores [book reviews]

**F**ollowing my recent trip to Iceland, I read two more books by Arnaldur Indriðason, *Voices* (*Röddin*, 2003) and *Strange Shores* (*Furðustrandir*, 2010).

**A**s usual, Indriðason’s books are more about the past (of the characters as well as of the whole country) than about current times. *Voices* does not depart from this pattern, all the more because it is one of the earliest Inspector Erlendur books. Beyond the murder of a hotel employee on the fringe of homelessness lies the questioning, almost constant in Indriðason’s books, of the difficult or even impossible relations between parents and children and/or between siblings, and of the long-lasting consequences of this generation gap. The murder itself is but a pretext for investigating that theme, and its resolution is far from the central point of the book. The story is thus less compelling than others I have read, maybe because the main character spends so much time closeted in his hotel room. But it nonetheless fits well within the Erlendur series. And although it is unrelated to the story, the cover reminded me very much of the Gullfoss waterfalls.

**T**he second book, *Strange Shores*, is the farthest from a detective story in the whole series. Indeed, Erlendur is back at his childhood cottage in Eastern Iceland, looking to resolve his childhood trauma: losing his younger brother during a snowstorm. He also investigates another snowstorm disappearance, interrogating the few survivors and reluctant witnesses from that time. Outside any legal mandate. Sometimes very much outside! While the story is not completely plausible, both in the present and in the past, it remains a striking novel, even on its own. (Although it could read better after the earlier novels in the series.) Not only does the resolution of the additional disappearance bring more pain and no comfort to those involved, but the ending of Erlendur’s own quest is quite ambiguous. As the book reached its final pages, I could not decide whether he had reached redemption and deliverance and the potential to save his own children, or whether he was beyond redemption, reaching another circle of Hell. As explained by the author in an interview, this is intentional and not the consequence of my poor understanding: “Readers of *Strange Shores* are not quite certain what to make of the ending regarding Erlendur, and I’m quite happy to leave them in the dark!”. With the main character of this series focussing more on missing persons than on detective work, what’s next?!

Filed under: Books, Mountains, Travel Tagged: Arnaldur Indriðason, Furðustrandir, Iceland, Iceland noir, mountain storm, Röddin, Reykjavik

### sunrise over Warwickshire (#2)

Filed under: pictures, Running, Travel, University life Tagged: England, pond, summer, sunrise, University of Warwick, Warw

### vector quantile regression

**M**y Paris-Dauphine colleague Guillaume Carlier recently arXived a statistics paper entitled *Vector quantile regression*, co-written with Chernozhukov and Galichon. I was most curious to read the paper, as Guillaume is primarily a mathematical analyst working on optimisation problems like optimal transport. And also because I find quantile regression difficult to fathom as a statistical problem. (As it happens, both his co-authors are from econometrics.) The results in the paper are (i) to show that a d-dimensional (Lebesgue) absolutely continuous random variable Y can always be represented as the deterministic transform Y=Q(U), where U is uniform over the unit cube [0,1]^d (the paper expresses this transform as conditional on a set of regressors Z, but those essentially play no role) and Q is monotone in the sense of being the gradient of a convex function,

and

(ii) to deduce from this representation a unique notion of multivariate quantile function; and (iii) to consider the special case when the quantile function Q can be written in the linear form

$$Q(U\mid Z)=\beta(U)^\top Z,$$

where β(U) is a matrix. Hence leading to an estimation problem.

**W**hile unsurprising from a measure-theoretic viewpoint, the representation theorem (i) is most interesting for both statistical and simulation reasons, provided the function Q can be easily estimated and derived, respectively. The paper however does not provide a constructive tool for this derivation, besides indicating several characterisations as solutions of optimisation problems. From a statistical perspective, a non-parametric estimation of β(·) would have useful implications in multivariate regression, although the paper only considers the specific linear case above. Its solution is obtained by discretising all variables and running linear programming.
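To make the representation concrete, here is a toy discrete analogue in R. It is our own illustration and not the paper's procedure. Take n reference points u_i drawn uniformly on [0,1]^2 and n sample points y_j. The permutation maximising Σ_i⟨u_i, y_σ(i)⟩ (equivalently, minimising Σ_i‖u_i - y_σ(i)‖²) gives the discrete optimal transport map. It is monotone in the sense that ⟨u_i - u_j, Q(u_i) - Q(u_j)⟩ ≥ 0 for every pair, which is the discrete trace of Q being the gradient of a convex function. Brute-force enumeration through the hypothetical helper `perms` only works for tiny n. A linear programming or assignment solver would be needed beyond that.

```r
# all permutations of v, one per row (n! rows, so tiny n only)
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

set.seed(4)
n <- 6
u <- matrix(runif(2 * n), ncol = 2)                      # uniform reference points
y <- matrix(rnorm(2 * n), ncol = 2) %*% matrix(c(1, 0.8, 0, 0.6), 2)  # sample points
P <- perms(1:n)
# total inner product sum_i <u_i, y_sigma(i)> for every permutation
score <- apply(P, 1, function(s) sum(u * y[s, ]))
sigma <- P[which.max(score), ]
Q_hat <- y[sigma, ]                 # Q_hat[i, ] is the image of u[i, ]

# pairwise monotonicity <u_i - u_j, Q(u_i) - Q(u_j)> for all pairs
mono <- outer(1:n, 1:n, Vectorize(function(i, j)
  sum((u[i, ] - u[j, ]) * (Q_hat[i, ] - Q_hat[j, ]))))
min(mono)
```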

Filed under: pictures, Statistics, University life Tagged: CEREMADE, Monte Carlo methods, multivariate quantile, nonparametrics, Paris, quantile regression, simulation, Université Paris Dauphine

### sunset over Warwickshire

Filed under: pictures, Running, Travel, University life Tagged: countryside, England, road running, sunset, trees, Warwickshire

### straightforward statistics [book review]

*“I took two different statistics courses as an undergraduate psychology major [and] four different advanced statistics classes as a PhD student.”* G. Geher

*Straightforward Statistics: Understanding the Tools of Research* by Glenn Geher and Sara Hall is an introductory textbook for psychology and other social science students. (Oxford University Press sent it to me for review. Nice cover, by the way!) I can spot the purpose behind the title, a purpose stressed heavily again in the preface and the first chapter. It nonetheless irks me, as it conveys the message that one semester of reasonable diligence in class will suffice to lead any college student to *“not only understanding research findings from psychology, but also to uncovering new truths about the world and our place in it”* (p.9). Nothing less. Yet, in essence, the book covers the basics found in all introductory textbooks, from descriptive statistics to ANOVA models. The inclusion of “real research examples” in the chapters rather demonstrates how far from real research a reader of the book would stand…

*“However, as often happen in life, we can be wrong…” (p.66)*

**T**he book aims at teaching basic statistics to “undergraduate students who are afraid of math” (p.xiii). By using “an accessible, accurate, coherent, and engaging presentation of statistics” (p.xiv). And reducing the maths expressions to a bare minimum. Unfortunately the very first formula (p.19) is meaningless (skipping the individual indices in sums is the rule throughout the book)

and the second one (Table 2.7, p.22 and again Tables 2.19 and 2.20, p.43)

is (a) missing both the indices and the summation symbol and (b) dividing the sum of “squared deviation scores” by N rather than the customary N-1. I also fail to see the point of providing histograms for categorical variables with only two modalities, like “Hungry” and “Not Hungry” (Fig. 2.11, p.47)…
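For contrast, the customary formula with its indices and summation symbol restored is $s^2=\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar X)^2$. R's `var()` uses this N-1 divisor, as the following check on made-up scores shows:

```r
x <- c(4, 8, 6, 5, 3, 7)          # arbitrary illustrative scores
N <- length(x)
ss <- sum((x - mean(x))^2)        # sum of squared deviation scores, indices explicit
var_N  <- ss / N                  # divisor N, as in the book
var_N1 <- ss / (N - 1)            # customary divisor N - 1
c(var_N = var_N, var_N1 = var_N1, var = var(x))
```

Here the sum of squared deviations is 17.5, so the two divisors give 17.5/6 and 3.5. The second value matches `var(x)`.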

*“Statisticians never prove anything-thereby making prove something of a dirty word.” (p.116)*

**A**s I only teach math students, I cannot judge how adequate the textbook is for psychology or other social science students. It however sounds highly verbose to me, in its attempts to bypass maths formulas. For instance, the 15 pages of the chapter on standardised scores are about moving back and forth between the raw data and its standardised version

meaning

Or the two pages (pp.71-72) of motivations on the r coefficient before the (again meaningless) formula

which even skips the indices of the z-scores to avoid frightening the students. (The book also asserts that a correlation of zero “corresponds to no mathematical relationship between [the] two variables whatsoever”, p.70.) Or yet the formula for (raw-score) regression (p.97) given as

where and

without defining B. This is apparently a typo, as the standardised regression used β… I could keep going with such examples, but the point I want to make is that, if the authors want to reach students who have fundamental problems with a formula like

which does not appear in the book, they could train them to analyse and understand the output of statistical software, rather than spending a large part of the book on computing elementary quantities like the coefficients of a simple linear regression by hand. Instead, fundamental notions like multivariate regression are relegated to an appendix (E) as “Advanced statistics to be aware of”. Plus a two-page discussion (pp.104-105) of a study conducted by the first author on predicting preference for vaginal sex. (To put things into context, the first author also wrote *Mating Intelligence Unleashed*, which explains some unusual “real research examples”.)
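To illustrate the point about relying on software, here is an R sketch on simulated data (all numbers are made up). It shows that the hand computations the book dwells on reproduce what `cor()` and `lm()` return directly: z-scores, r as an average product of z-scores, and the raw-score slope B and intercept A. The standard definitions B = r·s_Y/s_X and A = Ȳ - B·X̄ are our assumption about what the book intends. The last lines check that a zero correlation can coexist with an exact (quadratic) relationship, contrary to the p.70 claim.

```r
set.seed(5)
x <- rnorm(30, mean = 50, sd = 10)        # made-up predictor scores
y <- 3 + 0.5 * x + rnorm(30, sd = 4)      # made-up outcome scores
N <- length(x)
# standardised scores and back again (sd() uses the N - 1 divisor)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
x_back <- mean(x) + zx * sd(x)
# r as the average product of z-scores (divide by N - 1 to match sd())
r <- sum(zx * zy) / (N - 1)
# raw-score regression Y' = A + B X computed by hand
B <- r * sd(y) / sd(x)
A <- mean(y) - B * mean(x)
# ... and what the software returns directly
fit <- coef(lm(y ~ x))
c(r = r, cor = cor(x, y))
rbind(by_hand = c(A, B), lm = unname(fit))
# a zero correlation despite an exact (non-linear) relationship
x2 <- -3:3
y2 <- x2^2
cor(x2, y2)
```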

**A**nother illustration of what I consider the wrong focus of the book is the introduction to (probability and) the normal distribution in Chapter 6. It dedicates most of its pages to reading the area under the normal density from a normal table, without even providing the formula of this density. (And with an interesting typo in Figure 6.4.) Indeed, as in last-century textbooks, the book includes probability tables for standard tests, rather than relying on software and pocket calculators to move on to the probabilistic interpretation of p-values. And to the multi-layered caution that is necessary when handling hypotheses labelled as *significant*. (A caution pushed to its paroxysm in *The Cult of Statistical Significance*, which I reviewed a while ago.) The book includes a chapter on power, but besides handling coordinate axes in a weird manner (check from Fig. 9.5 onwards) and repeating everything twice for left- and right-one-sided hypotheses!, it makes the computation of power appear to be the main difficulty, when it is its interpretation that is most delicate and fraught with danger. Were I to teach (classical) testing to math-averse undergrads, and I may actually have to next year!, I would skip the technicalities and pile up cases and counter-cases explaining why p-values and power are not the end of the analysis. (Using Andrew’s blog as a good reservoir for such cases, as illustrated by his talk in Chamonix last January!) But I did not see any warning in that book about the dangers of manipulating data, formulating hypotheses to test from the data, running multiple tests with no correction and so on.
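As a sketch of how little machinery the computation needs, the power of a one-sided one-sample z-test with known σ takes one line once `pnorm()` and `qnorm()` replace the printed table. The function name `power_z` and its default values are our own. The test and its notation are the textbook z-test, not taken from the book reviewed.

```r
# Power of the right-sided z-test of H0: mu = mu0 vs H1: mu > mu0, known sigma,
# when the true mean is mu0 + delta.
power_z <- function(delta, n, sigma = 1, alpha = 0.05) {
  crit <- qnorm(1 - alpha)                 # critical value, no table needed
  shift <- delta * sqrt(n) / sigma         # mean of the test statistic under H1
  pnorm(crit - shift, lower.tail = FALSE)  # P(Z > crit) when Z ~ N(shift, 1)
}
# By symmetry, the left-sided test of H1: mu < mu0 at true mean mu0 - delta
# has exactly the same power, so one function covers both cases.
power_z(0, 25)      # no effect: the rejection rate is the level alpha
power_z(0.5, 25)
power_z(0.5, 100)
```

The computation itself is trivial. The delicate part remains deciding which δ is meaningful and what a given power says about a non-significant result.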

**T**o conclude this extensive review: as an outsider, I fail to see redeeming features that would single out *Straightforward Statistics: Understanding the Tools of Research* as a particularly enticing textbook. The authors have clearly put a lot of effort into their book, adopted what they think is the most appropriate tone to reach the students, and added very detailed homework exercises with their solutions. Still, this approach makes statistics sound too straightforward and leads to the far too common apprehension of p-values as the ultimate assessment of statistical significance, without opening up to alternatives such as outliers and model misspecification.

Filed under: Books, Kids, Statistics, University life Tagged: hypothesis testing, introductory textbooks, multiple tests, Oxford University Press, p-values, power, psychology, tests

### back in Warwick

Filed under: pictures, Running, Travel, University life Tagged: England, summer, University of Warwick, Zeeman building

### recycling accept-reject rejections (#2)

**F**ollowing yesterday’s post on Rao’s, Lin’s, and Dunson’s paper on a new approach to intractable normalising constants, and taking advantage of being in Warwick, I tested the method on a toy model, namely the posterior associated with n Student’s t observations with unknown location parameter μ and a flat prior,

$$\pi(\mu\mid x_1,\ldots,x_n)\propto\prod_{i=1}^{n}\left(1+\frac{(x_i-\mu)^2}{\nu}\right)^{-(\nu+1)/2},$$

which is “naturally” bounded by a Cauchy density with scale √ν. The constant M is then easily derived and running the new algorithm follows from a normal random walk proposal targeting the augmented likelihood (R code below).
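A quick numerical check of the bound, assuming ν=4 as in the code. Write f for the unnormalised t density (equal to 1 at zero) and q for the Cauchy density with scale √ν. Then $f(y)=(1+y^2/\nu)^{-(\nu+1)/2}\le(1+y^2/\nu)^{-1}=\pi\sqrt{\nu}\,q(y)$ whenever ν≥1, hence M=π√ν. The names `f` and `q` below are ours. They mirror `mydt` and `mydcauchy` in the code.

```r
nu <- 4
M <- pi * sqrt(nu)
# unnormalised Student t density, equal to 1 at zero
f <- function(y) (1 + y^2 / nu)^(-(nu + 1) / 2)
# Cauchy density with scale sqrt(nu)
q <- function(y) dcauchy(y, scale = sqrt(nu))
grid <- seq(-50, 50, by = 0.01)
ratio <- f(grid) / (M * q(grid))   # accept-reject acceptance probability
max(ratio)                          # attained at y = 0
```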

**A**s shown by the above graph, the completion-by-rejection scheme produces an outcome (tomato) similar to the one based on the observations alone (steelblue), with a similar acceptance rate. However, the computing time is much, much worse:

when compared with the no-completion version. Here is the entire R code that produced both MCMC samples:

```r
library(compiler)  # provides cmpfun()

#Student t observations and flat prior
nu=4
n=25
M=pi*sqrt(nu)   # bound: unnormalised t <= M * Cauchy(scale sqrt(nu))
sqrtnu=sqrt(nu)
obs=rt(n,df=nu)
sdobs=sd(obs)
#unormalised t, equal to 1 at x=mu
mydt=function(x,mu){ return(dt(x-mu,df=nu)/dt(0,df=nu))}
mydtc=cmpfun(mydt)
#Cauchy density with location mu and scale sqrt(nu)
mydcauchy=function(x,mu){ y=(x-mu)/sqrtnu
  return(dcauchy(y)/sqrtnu)}
mydcauchyc=cmpfun(mydcauchy)
#augmented data: rejected proposals of a (hypothetical) accept-reject
#generation of each of the n observations
augmen=function(mu){
  y=NULL
  for (i in 1:n){
    prop=mu+rcauchy(1)*sqrtnu
    accept=(runif(1)<mydtc(prop,mu)/(M*mydcauchyc(prop,mu)))
    while (!accept){
      y=c(y,prop)   # store the rejected value
      prop=mu+rcauchy(1)*sqrtnu
      accept=(runif(1)<mydtc(prop,mu)/(M*mydcauchyc(prop,mu)))}
  }
  return(y)}
#Gibbs: rejections given mu, then Metropolis step on the completed likelihood
gibbsda=function(T=10^4){
  theta=rep(0,T)
  for (t in 2:T){
    rej=augmen(theta[t-1])
    theta[t]=prop=theta[t-1]+rnorm(1,sd=.1*sdobs)
    propdens=sum(dt(obs-prop,df=nu,log=TRUE))+
      sum(log(mydcauchyc(rej,prop)-mydtc(rej,mu=prop)/M))
    refdens=sum(dt(obs-theta[t-1],df=nu,log=TRUE))+
      sum(log(mydcauchyc(rej,theta[t-1])-mydtc(rej,mu=theta[t-1])/M))
    if (log(runif(1))>propdens-refdens) theta[t]=theta[t-1]
  }
  return(theta)}
g8=cmpfun(gibbsda)
#plain random-walk Metropolis on the observed likelihood, no completion
gibbs2=function(T=10^4){
  eta=rep(0,T)
  for (t in 2:T){
    eta[t]=prop=eta[t-1]+rnorm(1,sd=sdobs)
    propdens=sum(dt(obs-prop,df=nu,log=TRUE))
    refdens=sum(dt(obs-eta[t-1],df=nu,log=TRUE))
    if (log(runif(1))>propdens-refdens) eta[t]=eta[t-1]
  }
  return(eta)}
g9=cmpfun(gibbs2)
```

Filed under: R, Statistics, University life Tagged: accept-reject algorithm, compiler, Data augmentation, Gibbs sampling, MCMC, Monte Carlo Statistical Methods, Student's t distribution

### recycling accept-reject rejections

**V**inayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes a new technique to handle intractable normalising constants, and whose exact title is *Data augmentation for models based on rejection sampling*. (I read the paper on the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (*aka* likelihood) satisfies

$$p(x\mid\theta)\propto f(x\mid\theta)\le M\,q(x\mid\theta),$$

where all terms but p are known in closed form, then completion by the rejected values of a hypothetical accept-reject algorithm allows for a data augmentation scheme. The algorithm is hypothetical in the sense that the data do not have to be produced by an accept-reject scheme: the above domination condition simply has to hold. The scheme does not require the missing normalising constant, since the completed likelihood, for an observation x with rejected values y_1,…,y_k, is

$$\frac{f(x\mid\theta)}{M}\prod_{j=1}^{k}\left\{q(y_j\mid\theta)-\frac{f(y_j\mid\theta)}{M}\right\}$$

A closed-form, if not necessarily congenial, function.

**N**ow this is quite a different use of the “rejected values” from the accept-reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

**I**t is a neat idea in that it completely bypasses the approximation of the normalising constant. And it avoids the somewhat delicate tuning of the auxiliary variable solution of Møller et al. (2006). The difficulty with this algorithm, however, lies in finding an upper bound M on the unnormalised density f that is

- in closed form;
- with a manageable and tight enough “constant” M;
- compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent from the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validation of the formal algorithm. Such a constraint would pull M higher, hence reducing the efficiency of the method. Actually the matrix Langevin distribution considered in the first example involves a bound that depends on the parameter κ.

**T**he paper includes a result (Theorem 2) on the uniform ergodicity that relies on heavy assumptions on the proposal distribution. And a rather surprising one, namely that the probability of *rejection* is bounded from below, i.e. calling for a *less* efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of *acceptance* is bounded from below since, then, the event when no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. There therefore occurs a renewal each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, *Statistical Science*).

**N**ote also that, despite the opposition raised by the authors, the method *per se* does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.
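Here is one way to see the unbiasedness, under our own reading of the construction. The number K of (pseudo) rejections before acceptance is geometric with success probability Z(θ)/M, where Z(θ)=∫f(x|θ)dx. Hence E[K+1]=M/Z(θ), and f(x|θ)(K+1)/M is an unbiased estimator of the likelihood f(x|θ)/Z(θ). The R sketch below checks E[(K+1)/M]=1/Z by simulation on the Student t example (ν=4, Cauchy proposal). In that example Z is known exactly, and it is used only for the check. The helper `count_rejections` is a hypothetical name.

```r
set.seed(6)
nu <- 4
M <- pi * sqrt(nu)
f <- function(y) (1 + y^2 / nu)^(-(nu + 1) / 2)  # unnormalised t density
# number of rejections before the first acceptance, for accept-reject
# with a Cauchy(scale sqrt(nu)) proposal and the bound f <= M * q
count_rejections <- function() {
  k <- 0
  repeat {
    prop <- rcauchy(1, scale = sqrt(nu))
    if (runif(1) < f(prop) / (M * dcauchy(prop, scale = sqrt(nu)))) return(k)
    k <- k + 1
  }
}
K <- replicate(5e4, count_rejections())
est <- (K + 1) / M              # unbiased estimates of 1 / Z
inv_Z <- dt(0, df = nu)         # exact 1 / Z, since dt(y) = f(y) / Z and f(0) = 1
c(mean(est), inv_Z)
```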

*Maybe some further experimental evidence tomorrow…*

Filed under: Statistics, University life Tagged: accept-reject algorithm, arXiv, auxiliary variable, Data augmentation, George Casella, intractable likelihood, Monte Carlo Statistical Methods, Rao-Blackwellisation, recycling, untractable normalizing constant

### R/Rmetrics in Paris [alas!]

**T**oday I gave a talk on Bayesian model choice in a fabulous 13th-century former monastery in the Latin Quarter of Paris… It is the Collège des Bernardins, close to Jussieu and the Collège de France, so well hidden that I was unaware of its existence despite having studied and worked in Jussieu since 1982… I mixed my earlier San Antonio survey on importance sampling approximations to Bayes factors with an introduction to our most recent work on ABC with random forests. This was the first talk of the 8th R/Rmetrics workshop, taking place in Paris this year. (Rmetrics aims at aggregating R packages with econometrics and finance applications.) And I had a full hour and a half to deliver my lecture to the workshop audience. Nice place, nice people, new faces and topics (and even andouille de Vire for lunch!): why should I complain with an alas in the title?!

**W**hat happened is that, until this year, the R/Rmetrics meetings had been organised in Meielisalp, Switzerland, which stands above Thuner See and… right next to the most famous peaks of the Bernese Alps! And I had been invited last year but could not make it… Meaning I lost a genuine opportunity to climb one of my five dream routes, the Mittelegi ridge of the Eiger, as future R/Rmetrics meetings will not take place there.

**A** lunch discussion at the workshop led me to experiment with the compiler library in R, a library I was unaware of. The impact on the running time is obvious: recycling the fowler function from the last Le Monde puzzle,

shows a ten- to twenty-fold gain in system time, if not in elapsed time (re-alas!).
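The gain can be checked on any loop-heavy function. The R sketch below uses a hypothetical stand-in (`slow_sum`) rather than the fowler function. One caveat: since R 3.4.0 the JIT byte-code compiler is enabled by default, so closures are compiled after their first calls anyway, and an explicit `cmpfun()` then typically brings a much smaller gain than reported above. The sketch therefore switches the JIT off with `enableJIT(0)` for the comparison and restores the previous level at the end.

```r
library(compiler)
old_level <- enableJIT(0)     # switch off the default JIT for a fair comparison
# hypothetical loop-heavy function standing in for the puzzle code
slow_sum <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + (i %% 7) * sqrt(i)
  s
}
fast_sum <- cmpfun(slow_sum)  # explicitly byte-compiled copy
stopifnot(isTRUE(all.equal(slow_sum(1e5), fast_sum(1e5))))
system.time(slow_sum(1e6))
system.time(fast_sum(1e6))
enableJIT(old_level)          # restore the previous JIT level
```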

Filed under: Mountains, pictures, R, Statistics, Travel, University life Tagged: ABC, andouille de Vire, Bayesian econometrics, Bayesian model choice, Bernese Alps, cmpfun(), Collège des Bernardins, compiler, Eiger, importance sampling, Interlaken, Meielisalp, Mittelegi ridge, Paris, R, Rmetrics, San Antonio, Switzerland, system.time, Thun Lake

### …and otherwise Mr Cardoso succeeds where the have failed!

**A** similar flier, a few days later. With very precise (if incoherent) guarantees! And a fantastic use of capitals. Too bad Monsieur Cardoso could not predict the (occurrence of a) missing noun in the last sentence…

Filed under: Kids, pictures Tagged: advertising, charlatanism, mailbox, Pantin, Paris

### Prof. Ntayiya solves all your problems!

**O**ne of the numerous fliers for “occult expertise” that end up in my mailbox… It sheds an interesting light on the major woes of my neighbours, woes that can lead some of them to consult such charlatans. And a wonder: how can Professeur Ntayiya hope to get paid if the effects need to be proven?!

Filed under: pictures Tagged: black block, charlatanism, fliers, Fontenay-aux-Roses, mailbox

### Le Monde puzzle [#872]

**A** “mildly interesting” Le Monde mathematical puzzle that eventually had me running R code on a cluster:

*Within the set {1,…,56}, take 12 values at random, x1,…,x12. Is it always possible to pick two pairs from those 12 balls such that their sums are equal?*

**I**ndeed, while exhaustive search cannot reach the size of the set,

with N=35 being my upper limit (and n=9 the largest value inducing double sums), the (second) easiest verification goes by sampling as indicated and checking for duplicates.

```r
mindup=66 # 66 = number of pairs among 12 values
for (t in 1:10^7){
  #arguing that extremes should be included
  x=c(1,56,sample(2:55,10))
  A=outer(x,x,"+")
  # number of repeated values among the 66 pairwise sums
  mindup=min(mindup,sum(duplicated(A[lower.tri(A)])))
  if (mindup==0) break
}
```

**T**he values of mindup obtained by running this code a few times are around 5, which makes a positive answer to the above question quite likely…

**T**his problem raises a much more interesting question, namely how to force simulations of those 12-uplets towards the most extreme values of the target function, from simulated annealing to cross-entropy to you-name-it… Here is my simulated annealing attempt:
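Below is a minimal simulated annealing sketch in R along those lines. It is not the original code. The move (swap one value for an unused one), the starting temperature `temp0`, the geometric cooling rate `cool` and the function names are our own hypothetical choices. The energy to minimise is the number of duplicated pair sums.

```r
# energy: number of repeated values among the 66 pairwise sums
n_dup <- function(x) {
  A <- outer(x, x, "+")
  sum(duplicated(A[lower.tri(A)]))
}
# simulated annealing over k-subsets of {1, ..., N}
anneal <- function(n_iter = 2e4, k = 12, N = 56, temp0 = 2, cool = 0.9995) {
  x <- sample(1:N, k)
  e <- n_dup(x)
  best <- x; best_e <- e; temp <- temp0
  for (t in 1:n_iter) {
    cand <- x
    # move: replace one current value by a value not yet in the set
    cand[sample.int(k, 1)] <- sample(setdiff(1:N, x), 1)
    e_cand <- n_dup(cand)
    # Metropolis acceptance at the current temperature
    if (e_cand <= e || runif(1) < exp((e - e_cand) / temp)) {
      x <- cand; e <- e_cand
      if (e < best_e) { best <- x; best_e <- e }
    }
    temp <- temp * cool           # geometric cooling schedule
  }
  list(x = sort(best), energy = best_e)
}
set.seed(7)
res <- anneal()
res
```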

**A**part from this integer programming exercise, a few items of relevance in this Le Monde Science & Medicine leaflet. A portrait of Leslie Lamport for his Turing Award (yes, the very same Leslie Lamport who created **LaTeX**!, and wrote this book which stood on most mathematicians’ bookshelves for decades, with the marginally annoying lion comics at the head of each chapter!). An op-ed on an interesting book, *The Beginning and the End*, by Clément Vidal, discussing how to prepare for the end of the Universe by creating a collective mind. And the rise of biobanks…

Filed under: Books, Kids, Statistics, University life Tagged: LaTeX, Le Monde, Leslie Lamport, lions, mathematical puzzle, Turing Award

### thermodynamic Monte Carlo

**M**ichael Betancourt, my colleague from Warwick, arXived a month ago a paper about a differential geometry approach to relaxation. *(In the Monte Carlo rather than the siesta sense of the term relaxation!)* He is considering the best way to link a simple base measure ϖ to a measure of interest π by the sequence

where Z(β) is the normalising constant (or *partition function* in the thermodynamic translation). Most methods are highly dependent on how the sequence of β’s is chosen. A first nice result (for me) is that the Kullback-Leibler distance and the partition function are strongly related in that

which means that the variation in the normalising constant drives the variation in the Kullback-Leibler distance. The next section goes into differential geometry, and what remains of my Master course in differential geometry is, alas, much too scattered for me to even remember notions like that of a *bundle*… So, like Andrew, I have trouble making sense of the resulting algorithm, which updates the temperature β along with the position and speed. (It sounds as if an extra, corresponding energy term is added to the original Hamiltonian function.) Even the Beta-Binomial

example is somewhat too involved for me, so I tried to write down the algorithm step by step in this special case, which led to

- update β into β-εδp’²
- update p into p-εδp’
- update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
- compute the average log-likelihood, λ* under the tempered version of the target (at temperature β)
- update p’ into p’+2εβ{(1-a)/p+(b-1)/(1-p)}-ε[λ-λ*]p’
- update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
- update β into β-εδp’²
- update p into p-εδp’

where p’ denotes the auxiliary momentum variable associated with the kinetic energy, and λ is the current log-likelihood. (The parameter ε was equal to 0.005 and I could not find the value of δ.) The only costly step in the above list is the approximation of the average log-likelihood λ*. These details make the algorithm quite clear, but I am still missing the intuition behind it…
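To make the role of λ* a bit more concrete, here is a hedged sketch that is not taken from the paper. It assumes the geometric path π_β ∝ ϖ^{1-β}π^β with ϖ the prior and π the posterior, so that π_β ∝ ϖ L^β with L the likelihood and Z(0) = 1. Under that assumption, differentiating log Z(β) under the integral sign gives the standard thermodynamic identities

$$\frac{\mathrm{d}}{\mathrm{d}\beta}\log Z(\beta) = \mathbb{E}_{\pi_\beta}[\log L] = \lambda^*(\beta),\qquad \log Z(1) = \int_0^1 \lambda^*(\beta)\,\mathrm{d}\beta,\qquad \mathrm{KL}(\pi_\beta\,\|\,\varpi) = \beta\,\lambda^*(\beta) - \log Z(\beta).$$

The R code checks these identities in a conjugate Beta-Binomial case, where the tempered target stays a Beta distribution and λ* has a closed form. The values of a, b, n, x and the function names are made up for illustration.

```r
# Hypothetical Beta-Binomial setting: Beta(a, b) prior as base measure and
# x successes in n trials, with likelihood L(p) = p^x (1 - p)^(n - x)
a <- 2; b <- 3; n <- 20; x <- 7

# The tempered target pi_beta ∝ prior * L^beta is Beta(shape1, shape2)
shape1 <- function(beta) a + beta * x
shape2 <- function(beta) b + beta * (n - x)

# lambda*(beta): average log-likelihood under pi_beta, using
# E[log p] = digamma(s1) - digamma(s1 + s2) and its analogue for log(1 - p)
lambda_star <- function(beta) {
  s1 <- shape1(beta); s2 <- shape2(beta)
  x * (digamma(s1) - digamma(s1 + s2)) + (n - x) * (digamma(s2) - digamma(s1 + s2))
}

# Closed-form log partition function: log Z(beta) = log B(s1, s2) - log B(a, b)
log_Z <- function(beta) lbeta(shape1(beta), shape2(beta)) - lbeta(a, b)

# Thermodynamic integration: log Z(1) as the integral of lambda* over [0, 1]
log_Z_ti <- integrate(lambda_star, 0, 1, rel.tol = 1e-8)$value

# KL(pi_beta || prior), once by quadrature and once through the identity
kl_numeric <- function(beta) {
  s1 <- shape1(beta); s2 <- shape2(beta)
  integrand <- function(p) dbeta(p, s1, s2) *
    (dbeta(p, s1, s2, log = TRUE) - dbeta(p, a, b, log = TRUE))
  integrate(integrand, 0, 1, rel.tol = 1e-8)$value
}
kl_identity <- function(beta) beta * lambda_star(beta) - log_Z(beta)

c(exact = log_Z(1), thermodynamic = log_Z_ti)
c(numeric = kl_numeric(0.5), identity = kl_identity(0.5))
```

Outside conjugate cases, λ* has no closed form and must itself be approximated by sampling from π_β, which is consistent with it being the costly step in the list above.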

Filed under: Books, Statistics, University life Tagged: acceleration of MCMC algorithms, differential geometry, Hamiltonian Monte Carlo, Riemann manifold, University of Warwick

### did I mean endemic? [pardon my French!]

**D**eborah Mayo wrote a Saturday night special column on our Big Bayes Stories issue in *Statistical Science*. She (predictably?) focussed on the critical discussions, esp. David Hand’s most forceful arguments, where he essentially considers that, due to our (special issue editors’) selection of successful stories, we biased the debate by providing a “one-sided” story, and that we or the editor of *Statistical Science* should also have included frequentist stories. To which Deborah points out that demonstrating that “only” a frequentist solution is available may be beyond reach. Still, I can think of partial information and partial inference problems like the “paradox” raised by Jamie Robins and Larry Wasserman in the past years. (Not the normalising constant paradox but the one about censoring.) Anyway, the goal of this special issue was to provide a range of realistic illustrations where Bayesian analysis was a most reasonable approach, not to raise the Bayesian flag against other perspectives: in an ideal world it would have been more interesting to get discussants to produce alternative analyses bypassing the Bayesian modelling, but discussants obviously only have a limited amount of time to dedicate to their discussion(s), and the problems were complex enough to deter any attempt in this direction.

**A**s an aside and in explanation of the cryptic title of this post, Deborah wonders at my use of *endemic* in the preface and at a possible mistranslation from the French. I did mean *endemic* (and *endémique*), as a half-joking reference to a disease one cannot completely get rid of. At least in French, the term extends beyond diseases, but presumably *pervasive* would have been less confusing… Or *ubiquitous* (as in Ubiquitous Chip for those with Glaswegian ties!). She also expresses “surprise at the choice of name for the special issue. Incidentally, the ‘big’ refers to the bigness of the problem, not big data. Not sure about ‘stories’.” Maybe another occurrence of lost in translation… I indeed did not intend any connection with the “big” of “Big Data”, but wanted to convey the notion of a big, as in major, problem, and of a story explaining why the problem was considered and how the authors reached a satisfactory analysis. The story of the Air France Rio-Paris crash resolution is representative of that intent. (Hence the explanation for the above picture.)

Filed under: Books, Statistics, University life Tagged: Air France, Bayesian Analysis, censoring, endemic, Glasgow, guest editors, information theory, Larry Wasserman, Robins-Wasserman paradox, Statistical Science, translation, Ubiquitous Chip

### ABC model choice by random forests

**A**fter more than a year of collaboration, meetings, simulations, delays, switches, visits, more delays, more simulations, discussions, and a final marathon wrap-up day last Friday, Jean-Michel Marin, Pierre Pudlo, and I at last completed our latest joint work on ABC, with the central arguments that (a) random forests are a good tool for choosing the most appropriate model and (b) evaluating the posterior misclassification error rather than the posterior probability of a model is an appropriate paradigm shift. The paper has been co-signed with our population genetics colleagues, Jean-Marie Cornuet and Arnaud Estoup, as they provided helpful advice on the tools and on the genetic illustrations, and as they plan to include these new tools in their future analyses and in the DIYABC software. ABC model choice via random forests is now arXived and very soon to be submitted…

**O**ne scientific reason for this fairly long conception is that it took us several iterations to understand the intrinsic nature of the random forest tool and how it could be most naturally embedded in ABC schemes. We first imagined it as a filter from a set of summary statistics to a subset of significant statistics (hence the automated ABC advertised in some of my past or future talks!), with the additional appeal of an associated distance induced by the forest. However, we later realised that (a) further ABC steps were counterproductive once the model was selected by the random forest, (b) including more summary statistics always improved the performance of the forest, and (c) the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, and (iii) the random forest version of the above were at best very loose. The above picture is taken from the paper: it shows how the true and the ABC probabilities (do not) relate in the example of an MA(q) model… We thus had another round of discussions and experiments before deciding the unthinkable, namely to give up attempting to approximate the posterior probability in this setting and to come up with another assessment of the uncertainty associated with the decision. This led us to propose a posterior predictive error as the error assessment for ABC model choice. This is mostly a classification error, but (a) it is based on the ABC posterior distribution rather than on the prior and (b) it requires no extra computation compared with other empirical measures such as cross-validation, while avoiding the sin of using the data twice!
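As a rough illustration of the posterior predictive error described above, here is a sketch under made-up settings, not the code of the paper or of DIYABC. The toy compares MA(1) and MA(2) models through their first two autocorrelations and, since random forests are not part of base R, a k-nearest-neighbour vote is swapped in to play the role of the forest. The names `simulate_series`, `knn_classify` and `posterior_predictive_error` are hypothetical, and the uniform priors, sample sizes and k values are arbitrary assumptions.

```r
# Hypothetical toy: MA(1) versus MA(2) model choice from the first two
# autocorrelations; a k-nearest-neighbour vote stands in for the random forest
set.seed(42)
n_obs <- 100

simulate_series <- function(model) {
  theta <- runif(model, -1, 1)   # assumed uniform priors on the MA coefficients
  list(theta = theta, y = arima.sim(list(ma = theta), n = n_obs))
}
summaries <- function(y) acf(y, lag.max = 2, plot = FALSE)$acf[2:3]

# Reference table simulated from the prior, with equal prior weights on the models
n_ref <- 2000
models <- sample(1:2, n_ref, replace = TRUE)
sims <- lapply(models, simulate_series)
stats <- t(sapply(sims, function(s) summaries(s$y)))

# Stand-in classifier trained on the reference table (majority vote of k neighbours)
knn_classify <- function(s, k = 25) {
  d <- sqrt(colSums((t(stats) - s)^2))
  votes <- table(models[order(d)[1:k]])
  as.integer(names(which.max(votes)))
}

# Posterior predictive error: take (model, theta) pairs from an ABC rejection
# sample, simulate fresh pseudo-data from each, and count classifier mistakes
posterior_predictive_error <- function(y_obs, k_abc = 100) {
  d <- sqrt(colSums((t(stats) - summaries(y_obs))^2))
  accepted <- order(d)[1:k_abc]
  errors <- sapply(accepted, function(i) {
    y_new <- arima.sim(list(ma = sims[[i]]$theta), n = n_obs)
    knn_classify(summaries(y_new)) != models[i]
  })
  mean(errors)
}

y_obs <- arima.sim(list(ma = c(0.6, 0.3)), n = n_obs)  # made-up "observed" series
posterior_predictive_error(y_obs)
```

Replacing `knn_classify` with a random forest trained on the same reference table (for instance with the randomForest package) would be closer to the setting of the paper; the error computation itself does not depend on the classifier.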

Filed under: pictures, R, Statistics, Travel, University life Tagged: ABC, ABC model choice, arXiv, Asian lady beetle, CART, classification, DIYABC, machine learning, model posterior probabilities, Montpellier, posterior predictive, random forests, SNPs, using the data twice

### editor’s nightmare

Filed under: Books, Kids, pictures, University life Tagged: Fermat, Fermat's theorem, fonts, NIPS, proceedings, xkcd