## Bayesian News Feeds

### Bayesian indirect inference [a response]

*This Bayesian indirect inference paper by Chris Drovandi and Tony Pettitt was discussed on the ‘Og two weeks ago and Chris sent me the following comments.*

*…unsurprisingly, the performances of ABC comparing true data of size n with synthetic data of size m>n are not great. However, there exists another way of reducing the variance in the synthetic data, namely by repeating simulations of samples of size n and averaging the indicators for proximity, resulting in a frequency rather than a 0-1 estimator. See e.g. Del Moral et al. (2009). In this sense, increasing the computing power reduces the variability of the ABC approximation. (And I thus fail to see the full relevance of Result 1.)*

**T**aking the average of the indicators from multiple simulations will reduce the variability of the estimated ABC likelihood but, because it is still only an unbiased estimate, it will not alter the target and will not improve the ABC approximation (Andrieu and Roberts 2009). It will only have the effect of improving the mixing of MCMC ABC. Result 1 is used to contrast ABC II and BIL, as they behave quite differently as n is increased.
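Chris's point can be illustrated with a toy pseudo-marginal experiment: averaging M indicator kernels keeps the same mean (so the ABC target is unchanged) while the variance of the estimator shrinks with M. A minimal Python sketch, assuming a toy model y ~ N(θ,1) and a plain indicator kernel (none of this is from the paper):

```python
import random
import statistics

def abc_lik_hat(theta, y_obs, eps, M, rng):
    """Unbiased estimator of the ABC likelihood at theta: the average of M
    indicator kernels 1{|y_sim - y_obs| <= eps}, under a toy model y ~ N(theta, 1)."""
    hits = 0
    for _ in range(M):
        y_sim = rng.gauss(theta, 1.0)
        if abs(y_sim - y_obs) <= eps:
            hits += 1
    return hits / M

rng = random.Random(1)
for M in (1, 10, 100):
    draws = [abc_lik_hat(0.0, 0.0, 0.5, M, rng) for _ in range(2000)]
    # the mean stays put (unbiasedness) while the spread shrinks with M
    print(M, round(statistics.mean(draws), 2), round(statistics.stdev(draws), 2))
```

The mean column stays put while the spread shrinks roughly as 1/√M, which is exactly why averaging helps the mixing of MCMC-ABC without changing the approximation it targets.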

*The authors make several assumptions of unicity that I somewhat find unclear. While assuming that the MLE for the auxiliary model is unique could make sense (Assumption 2), I do not understand the corresponding indexing of this estimator (of the auxiliary parameter) on the generating (model) parameter θ. It should only depend on the generated/simulated data x. The notion of a noisy mapping is just confusing to me.*

The dependence on θ is a little confusing, I agree (especially in the context of ABC II methods). It becomes clearer in the context of BIL. As n goes to infinity, the effect of the simulated data is removed and we obtain the function φ(θ) (so we need to remember which θ simulated the data), which is referred to as the mapping or binding function in the II literature. If we somehow knew the binding function, BIL would proceed straightforwardly. But of course we don’t in practice, so we try to estimate it via simulated data (which, for computational reasons, must be a finite sample) from the true model based on θ. Thus we obtain a noisy estimate of the mapping. One way forward might be to fit some (non-parametric?) regression model to smooth out the noise, recover the true mapping (without ever taking n to infinity), and run a second BIL with this estimated mapping. I plan to investigate this in future work.
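To make the noisy-mapping point concrete, here is a hedged Python sketch in the simplest possible setting, where the true model is N(θ,1), the auxiliary model is N(μ,1), and the binding function is therefore φ(θ)=θ: the auxiliary MLE computed on simulated data is a noisy evaluation of φ, which a simple regression smooths out (the linear fit is an illustrative choice of mine, not the paper’s):

```python
import random

def aux_mle(theta, n, rng):
    """Auxiliary MLE (sample mean) fitted to n draws simulated from theta:
    a noisy evaluation of the binding function phi(theta) = theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def smooth_binding(thetas, phis):
    """Least-squares line through (theta, phi_hat) pairs, smoothing the noise."""
    m = len(thetas)
    tb = sum(thetas) / m
    pb = sum(phis) / m
    slope = sum((t - tb) * (p - pb) for t, p in zip(thetas, phis)) / \
            sum((t - tb) ** 2 for t in thetas)
    return slope, pb - slope * tb

rng = random.Random(2)
grid = [i / 10 for i in range(-20, 21)]
noisy = [aux_mle(t, 20, rng) for t in grid]       # noisy binding function
slope, intercept = smooth_binding(grid, noisy)    # smoothed estimate of phi
print(round(slope, 2), round(intercept, 2))       # close to 1 and 0
```

The recovered line is close to the true mapping φ(θ)=θ even though each individual evaluation is noisy, which is the spirit of the regression idea mentioned above.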

*The assumption (Assumption 3) that the auxiliary score function, evaluated at the auxiliary MLE for the observed data and for a simulated dataset, is unique proceeds from the same spirit. I however fail to see why it matters so much. If the auxiliary MLE is the result of a numerical optimisation algorithm, that algorithm may return local modes. This only adds to the approximative effect of the ABC-I schemes.*

The optimiser failing to find the MLE (local mode) is certainly an issue shared by all BII methods, apart from ABC IS (which only requires one optimisation, so more effort can be devoted to finding the MLE). Assuming the optimiser can obtain the MLE, I think the uniqueness assumption makes sense: it basically says that, for a particular simulated dataset, we would like a unique value of the ABC discrepancy function.

*Given that the paper does not produce convergence results for those schemes, unless the auxiliary model contains the genuine model, such theoretical assumptions do not feel that necessary.*

Actually, the ABC II methods will never converge to the true posterior (in general) due to lack of sufficiency. This is even the case if the true model is a special case of the auxiliary model! (in which case BIL can converge to the true posterior)

*The paper uses normal mixtures as an auxiliary model: the multimodality of this model should not be such a hindrance (and reordering is transparent, i.e. does not “reduce the flexibility of the auxiliary model”, and does not “increase the difficulty of implementation”, as stated p.16).*

*The paper concludes from a numerical study that the Bayesian indirect inference of Gallant and McCulloch (2009), which simply replaces the true likelihood with the maximal auxiliary model likelihood estimated from a simulated dataset, comes out on top. (This is somehow similar to our use of the empirical likelihood in the PNAS paper.) It is however moderated by the cautionary provision that “the auxiliary model [should] describe the data well”. As for empirical likelihood, I would suggest resorting to this Bayesian indirect inference as a benchmark, providing a quick if possibly dirty reference against which to test more elaborate ABC schemes. Or other approximations, like empirical likelihood or Wood’s synthetic likelihood.*

Unfortunately the methods are not quick (apart from ABC IS when the scores are analytic), but good approximations can be obtained. The majority of Bayesian methods that deal with intractable likelihoods do not target the true posterior (there are a couple of exceptions in special cases) and thus also suffer from some dirtiness, and BII does not escape from that. But, if a reasonable auxiliary model can be found, then I would suggest that (at least one of the) BII methods will be competitive.

On reflection, for BIL it is not necessary for the auxiliary model to fit the data, since the generative model being proposed may be mis-specified and also not fit the data well. BIL needs an auxiliary model that mimics well the likelihood of the generative model for values of θ in non-negligible posterior regions. For ABC II, we are simply looking for a good summarisation of the data, so it would seem useful if the auxiliary model did fit the data well. Note this process is independent of the generative model being proposed: the auxiliary model would be the same regardless of the chosen generative model. Very different considerations indeed.

Inspired by a discussion with Anthony Lee, it appears that the (Bayesian version of) synthetic likelihood you mentioned is actually also a BIL method, but where the auxiliary model is applied to the summary statistic likelihood rather than the full data likelihood. The synthetic likelihood is nice from a numerical/computational point of view as the MLE of the auxiliary model is analytic.
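As a hedged illustration of that last remark, here is a minimal Python sketch of a Wood-style synthetic log-likelihood with a single summary statistic: the Gaussian auxiliary model fitted to the simulated summaries has an analytic MLE (sample mean and variance), and its density evaluated at the observed summary serves as a likelihood substitute (the toy generative model is my own choice):

```python
import math
import random

def synthetic_loglik(theta, s_obs, n_sim, rng):
    """Synthetic log-likelihood with a univariate summary: fit a normal
    auxiliary model to simulated summaries (the MLE is analytic) and
    evaluate its log-density at the observed summary."""
    # toy generative model: summary = mean of 30 draws from N(theta, 1)
    sims = [sum(rng.gauss(theta, 1.0) for _ in range(30)) / 30
            for _ in range(n_sim)]
    mu = sum(sims) / n_sim                           # auxiliary MLE (mean)
    var = sum((s - mu) ** 2 for s in sims) / n_sim   # auxiliary MLE (variance)
    return -0.5 * math.log(2 * math.pi * var) - (s_obs - mu) ** 2 / (2 * var)

rng = random.Random(3)
s_obs = 0.1
# a theta near the observed summary beats a distant one
print(synthetic_loglik(0.1, s_obs, 200, rng) > synthetic_loglik(2.0, s_obs, 200, rng))  # prints True
```

Because the auxiliary MLE is closed-form, each likelihood evaluation costs only the simulations, which is the computational appeal noted above.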

Filed under: Books, Statistics, Travel, University life Tagged: ABC, auxiliary model, finite mixtures, indirect inference, score function, synthetic likelihood, University of Warwick

### Spring, already?!

### finite mixture models [book review]

**H**ere is a review of Finite Mixture Models (2000) by Geoff McLachlan & David Peel that I wrote aeons ago (circa 1999), supposedly for JASA, which lost first the files and second the will to publish it. As I was working with my student today, I mentioned the book to her and decided to publish it here, if only because I think the book deserved a positive review, even after all those years! (Since then, Sylvia Frühwirth-Schnatter published Finite Mixture and Markov Switching Models (2004), which is closer to my perspective on the topic and that I would more naturally recommend.)

Mixture modeling, that is, the use of weighted sums of standard distributions as in

$$f(y;\theta) = \sum_{i=1}^{k} p_i\, g(y;\theta_i), \qquad p_i \ge 0, \quad \sum_{i=1}^{k} p_i = 1,$$

is a widespread and increasingly used technique to overcome the rigidity of standard parametric distributions such as f(y;θ), while retaining a parametric nature, as exposed in the introduction of my JASA review of Böhning’s (1998) book on non-parametric mixture estimation (Robert, 2000). This review pointed out that, while there are many books available on the topic of mixture estimation, the unsurpassed reference remained the book by Titterington, Smith and Makov (1985) [hereafter TSM]. I also suggested that a new edition of TSM would be quite timely, given the methodological and computational advances that took place in the past 15 years: while it remains unclear whether or not this new edition will ever take place, the book by McLachlan and Peel gives an enjoyable and fairly exhaustive update on the topic, incorporating the most recent advances on mixtures and some related models.
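As a concrete companion to the weighted-sum representation above, a short Python sketch of evaluating and sampling a finite normal mixture, using the latent component indicator for simulation (the specific weights and component parameters are purely illustrative):

```python
import math
import random

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(y, weights, mus, sigmas):
    """Density of a finite normal mixture: sum_i p_i N(y; mu_i, sigma_i^2)."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in zip(weights, mus, sigmas))

def mixture_sample(n, weights, mus, sigmas, rng):
    """Sampling via the latent component indicator: pick component i with
    probability p_i, then draw from the i-th normal component."""
    out = []
    for _ in range(n):
        i = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(mus[i], sigmas[i]))
    return out

rng = random.Random(4)
w, mu, sd = [0.3, 0.7], [-2.0, 2.0], [1.0, 1.0]
ys = mixture_sample(10000, w, mu, sd, rng)
print(round(sum(ys) / len(ys), 2))   # population mean is 0.3*(-2) + 0.7*2 = 0.8
```

The latent indicator used for sampling is the same completion that drives both the EM algorithm and the Gibbs samplers reviewed later in the book.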

Geoff McLachlan has been a major actor in the field for at least 25 years, through papers, software—the book concludes with a review of existing software—and books: McLachlan (1992), McLachlan and Basford (1988), and McLachlan and Krishnan (1997). I refer the reader to Lindsay (1989) for a review of the second book, which is a forerunner of, and has much in common with, the present book.

A general introduction (Chapter 1) on mixture models includes a detailed survey of the book, which may appeal to the hurried reader, and a brief history of mixture estimation, as well as notations. A more in-depth treatment of identifiability issues can be found in TSM, as well as a catalogue of the applications of mixture modelling in various domains, which is missing here—although there are about 40 different datasets analysed.

Chapter 2 covers maximum likelihood [ML] estimation and, of course, the EM algorithm which revolutionised ML estimation for latent variable models and, in particular, mixtures. This chapter does not contain proofs or even theorem-like statements about the convergence of the ML estimators (or, more precisely, of some solutions to the ML equations); on the other hand, it gives a detailed study of the implementation of EM, of the role and choice of the starting value, and of the stochastic versions of EM (although simulated annealing techniques such as those of Celeux and Diebolt (1992) and Lavielle and Moulines (1997) are missing).
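For readers wanting the gist of the EM iteration discussed in this chapter, a minimal Python sketch for the simplest case, a balanced two-component normal mixture with unit variances where only the component means are unknown (a deliberately stripped-down setting, not the book’s general treatment):

```python
import math
import random

def em_two_means(data, iters=50):
    """EM for a 0.5/0.5 mixture of N(mu1,1) and N(mu2,1): the E-step computes
    responsibilities, the M-step replaces each mean by a weighted average."""
    mu1, mu2 = min(data), max(data)          # crude starting values
    for _ in range(iters):
        r = []                                # responsibility of component 1
        for y in data:
            a = math.exp(-0.5 * (y - mu1) ** 2)
            b = math.exp(-0.5 * (y - mu2) ** 2)
            r.append(a / (a + b))
        mu1 = sum(ri * y for ri, y in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * y for ri, y in zip(r, data)) / sum(1 - ri for ri in r)
    return mu1, mu2

rng = random.Random(5)
data = [rng.gauss(-3, 1) for _ in range(300)] + [rng.gauss(3, 1) for _ in range(300)]
m1, m2 = em_two_means(data)
print(round(m1, 1), round(m2, 1))   # close to -3 and 3
```

With well-separated components the iteration settles quickly; with overlapping components the sensitivity to starting values stressed in the chapter becomes very real.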

A remark that applies to the whole book is that the choice of mentioning almost every reference published on the topic—there are 44 pages of references, 40% of which are from after 1995!—somehow gets in the way of clarity and readability: this level of exhaustivity is fine for a reference book, but it makes the reading harder at the textbook level because sentences like “Celeux and Govaert (1995) have considered the equivalence of the classification ML approach to other clustering criteria under varying assumptions on the group densities” are far too elliptic to be useful.

Chapter 3 gets more detailed about the special case of normal mixtures, which are the number-one type of mixtures used in applications, with many examples, including some where *spurious* local maxima get in the way of EM. Chapter 12 presents some variations of the EM algorithm which may speed up the algorithm in large databases: IEM, lazy EM, sparse EM, scalable EM, and multiresolution EM.

To my eyes, Chapter 4 is what gives a strong appeal to the book as a current reference on mixtures. Indeed, it covers Bayesian inference for mixtures and obviously concentrates on MCMC techniques. These developments on MCMC methodology did occur in the late 80s with Tanner and Wong (1987) and Gelfand and Smith (1990), that is, after TSM conception: the Bayesian computation techniques available in pre-Gibbs days were, compared with their successors, often crude and not always reliable. (It is surprising that, given the impetus brought by EM to ML mixture estimation and the fact that EM is one half of a Gibbs sampler, especially in its stochastic versions, the idea of completing the other half by simulation did not impose itself earlier. But such insights are always much easier a posteriori!) As in earlier chapters, the focus of Chapter 4 is on implementation: some background on MCMC methods is thus assumed, because the half-page of presentation in §4.4.1 is not enough. The authors only present Gibbs sampling algorithms for conjugate priors, although Metropolis–Hastings alternatives can also be used (see, e.g., Celeux, Robert and Hurn, 2000). They also mention results on perfect sampling, improper priors, label switching and the estimation of the number of components by reversible jump techniques, but the interested reader needs to invest in the references provided in the text, as this chapter is not self-contained enough to allow for implementation.
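A hedged sketch of the data-augmentation Gibbs sampler alluded to here, again in the stripped-down case of a balanced two-component normal mixture with unit variances and conjugate normal priors on the means (an illustration of the general scheme, not the algorithm as given in the book):

```python
import math
import random

def gibbs_two_means(data, iters, rng, prior_var=100.0):
    """Gibbs sampler for a 0.5/0.5 mixture of N(mu1,1) and N(mu2,1) with
    N(0, prior_var) priors on the means: alternate (i) sampling the latent
    allocations given the means and (ii) the means given the allocations."""
    mu = [min(data), max(data)]
    trace = []
    for _ in range(iters):
        groups = ([], [])
        for y in data:
            p1 = math.exp(-0.5 * (y - mu[0]) ** 2)
            p2 = math.exp(-0.5 * (y - mu[1]) ** 2)
            z = 0 if rng.random() < p1 / (p1 + p2) else 1
            groups[z].append(y)
        for k in (0, 1):
            n_k = len(groups[k])
            post_var = 1.0 / (n_k + 1.0 / prior_var)       # conjugate update
            post_mean = post_var * sum(groups[k])
            mu[k] = rng.gauss(post_mean, math.sqrt(post_var))
        trace.append(tuple(mu))
    return trace

rng = random.Random(6)
data = [rng.gauss(-3, 1) for _ in range(200)] + [rng.gauss(3, 1) for _ in range(200)]
trace = gibbs_two_means(data, 200, rng)
post = trace[100:]   # discard burn-in
print(round(sum(m[0] for m in post) / len(post), 1),
      round(sum(m[1] for m in post) / len(post), 1))   # close to -3 and 3
```

Each sweep is exactly the “other half” of EM mentioned above: simulate the allocations instead of averaging over them, then simulate the means from their conditional posteriors.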

The following chapter covers non-normal mixtures, focusing on the important case of mixtures of generalized linear models [GLM], also called mixtures-of-experts and switching regressions in different literatures. The focus is on ML estimation and the steps of the EM algorithm are provided (see Hurn, Justel and Robert, 1999, and Viele and Tong, 2000, for Bayesian solutions). Similarly, Chapter 7 covers the case of multivariate *t* distributions as robust alternatives to normal mixtures, with EM and ECME estimation of the parameters (including the degrees of freedom). Chapters 8 to 11 deal with other specific cases such as factor analysers (Chapter 8), which generalize principal component analysis and are estimated via the AECM alternative of Meng and van Dyk (1997); binned data (Chapter 9); failure time data (Chapter 10); and directional data (Chapter 11), based on the Kent distribution.

Chapter 6 specializes on the very current and still open problem of assessing the number k of components in a mixture. Due to the weak identifiability of mixtures and to the complex geometry of the parameter space when considering several values of k at once, standard testing tools such as the likelihood ratio test [LRT] do not work as usual. The book recalls the recent works on the distribution of the LRT under the null hypothesis, both theoretical and simulation-based. The authors detail the use of bootstrapped LRTs, with words of caution, and also present Bayesian criteria such as the BIC and Laplace-based methods. This chapter is, unsurprisingly, inconclusive, because of the weak identifiability mentioned above: for arbitrarily large datasets, it is impossible to distinguish between

$$\sum_{i=1}^{k} p_i\, g(y;\theta_i) \quad\text{and}\quad \sum_{i=1}^{k+1} p_i\, g(y;\theta_i) \ \text{ with } p_{k+1}=0 \text{ or } \theta_{k+1}=\theta_k,$$

because of the contiguity between both representations. Unless some separating constraint is imposed, either in the Ghosh and Sen (1985) format or through a penalisation factor, it seems to me the testing problem about the number of components is fundamentally meaningless. (The Bayesian solution of the estimation of k is much more satisfactory in that it incorporates the above penalisation in the prior distribution.)
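The contiguity and penalisation points can be checked numerically: a one-component normal model and a two-component mixture with identical components have exactly the same likelihood, so only a penalty such as BIC’s d log n separates them. A small Python sketch (the parameter counts used are the usual ones for univariate normal mixtures):

```python
import math
import random

def mix_loglik(data, weights, mus, sigmas):
    """Log-likelihood of a finite normal mixture."""
    ll = 0.0
    for y in data:
        ll += math.log(sum(
            p * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for p, m, s in zip(weights, mus, sigmas)))
    return ll

rng = random.Random(7)
data = [rng.gauss(0, 1) for _ in range(500)]

# A one-component model and a two-component mixture with identical
# components fit the data equally well: the likelihoods coincide exactly...
ll1 = mix_loglik(data, [1.0], [0.0], [1.0])
ll2 = mix_loglik(data, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
print(abs(ll1 - ll2) < 1e-9)        # prints True

# ...so only the BIC penalty d*log(n) separates them.
n = len(data)
bic1 = -2 * ll1 + 2 * math.log(n)   # k=1: mean and variance
bic2 = -2 * ll2 + 5 * math.log(n)   # k=2: weight, two means, two variances
print(bic1 < bic2)                  # prints True: BIC favours one component
```

This is the penalisation at work: the data cannot separate the two representations, so the criterion has to.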

Chapter 13 deals with one of the many possible extensions of a mixture model, namely the setup of hidden Markov models. Such models are of interest in many areas; besides, they were one of the first models to use the EM algorithm (Baum and Petrie, 1966). The setting is in addition very contemporary: applications in signal processing, finance or genetics abound, while theoretical developments on the limiting properties of the ML estimate have been found only recently (Bickel, Ritov and Rydén, 1998; Douc and Matias, 2000; Douc, Moulines and Rydén, 2001). Given the scope of this field, which would call for a volume of its own (such as the future Andrieu and Doucet, 2001), the chapter only alludes to some possible areas, and misses others like stochastic volatility models (Kim, Shephard and Chib, 1998). But it is nonetheless a nice entry to this new domain.

To conclude, I hope it is clear I consider this book as a good monograph on the current trends in mixture estimation; from using it as support in graduate school, I can also add that it is only appropriate as a textbook for an advanced audience, since there are no exercises and readers are forced to get involved in the literature to get a clear picture of the finer details. Nonetheless, it is a welcome addition to the field that most people working on mixture analysis should consider buying.

Filed under: Books, Kids, Statistics, University life Tagged: Bayesian inference, David Peel, EM algorithm, finite mixtures, Geoff McLachlan, hidden Markov models, JASA, Markov switching models, MCMC, mixture estimation, Monte Carlo Statistical Methods, SAEM

### Le Monde puzzle [#852]

**A** number theory Le Monde mathematical puzzle:

*Integers n of type A are such that the set {1,…,3n} can be written as the union of n sets of three integers of the form {a,b,a+b}. Integers n of type B are such that the set {1,…,3n} can be written as the union of n sets of three integers of the form {a,b,c} with a+b+c constant. What are the integers of type B? The smallest integer of both type A and type B is 1. What are the next two integers? Is it true that for n of type A, 4n and 4n+1 are of type A?*

**A**gain a case when writing a light R code proves out of (my) reach. When n grows, a brute-force search of all partitions quickly gets impossible. So no R solution provided here! (Feel free to suggest one.)

**T**he Feb. 12, 2014, edition of the *Sciences&Medicine* leaflet is not that exciting either: mostly about medical topics. It however confirmed my solution for the #853 puzzle (44 and 42, esp.), with a tribune of Marco Zito who read the Edge 2014 annual question on *What scientific idea is ready for retirement?*: he was [surprisingly] disappointed that those entries of a few thousand signs lacked validation and were mere speculations… (One of the entries was written by Nassim Taleb on standard deviation, which should be replaced with MAD, which Taleb mistakenly called *mean absolute deviation* instead of *median absolute deviation*. Gerd Gigerenzer has another (short) entry against the blind use of p-values, poorly titled as “*Scientific Inference Via Statistical Rituals*”, as it sounds directed against the whole of statistics.) And then a short but interesting article on a large number of French universities cancelling their subscriptions to major journals like Science or Physical Review Letters, as they cannot follow the insane inflation in the prices of journals, thanks to the unrestricted greed of commercial editors (sounds like the Elsevier boycott did not have such an impact on the profession). The article contains this absurd quote from a Science editor who advances that all articles of the journal are freely available one year after their publication. One year is certainly better than two, ten or an infinity of years; nonetheless, edge research is about what is published now rather than last year. A final mention for a tribune (in the *Economics* leaflet) complaining about the impact of the arrival of continental “scientific” business students and mathematicians on the City, which deprived it of bright students from humanities and led to the financial crisis of 2008… Apparently, some are still looking for scapegoats!

Filed under: Books, Kids, Statistics Tagged: Le Monde, Lewis Carroll, mathematical puzzle, Nassim Taleb, p-values, The Edge

### Cross de Sceaux [7.2k, 4⁰C, 28:14, 42nd & 6th V2]

**T**his weekend, I ran another race *(yes, yet another running post!)* on my other “home turf” (since Malakoff is also my training ground!), Le Parc de Sceaux. This was the 30th Cross de la Ville de Sceaux (lagging one year behind Malakoff!) and there were many more runners on the starting line than last week (500 vs. 127), some of them clearly good. (For some unfathomable reason, there are women-only (3.1k) and men-only (7.2k) races in this event.) Thanks to a strong wind, it was deadly cold, if bright and sunny, before the start (afterwards it did not matter). I managed to stay with a V2 runner for most of the race, except at the very end when he pushed harder and gained a dozen meters. (It did not matter so much as we ranked 5th and 6th, almost two minutes behind the first V2…) This was not much of a cross-country race in that there was hardly any mud on the track and only moderate slopes, just a few narrow passages through which runners had to squeeze on the first lap, not so much on the second. My time is worse than last week, meaning I miss longer distance training (which is not compensated by longer bike rides!). But this was enjoyable nonetheless!

Filed under: Kids, Running Tagged: cross-country race, Parc de Sceaux, Sceaux, trail running

### Le Monde puzzle [#853]

**Y**et another one of those Le Monde mathematical puzzles whose wording is confusing (or at least annoying) to me:

*A political party has 11 commissions, to which belong some of the 13 members of the central committee. A token is given to each member for each commission to which he or she belongs. Two different members cannot share more than one common commission. How many tokens at most? Same question if the president belongs to five commissions.*

**I** just dislike the “story” around the combinatoric problem. Given 13 sets and 11 letters, how many letters can one allocate in total to the sets so that each pair of sets shares at most one letter? While waiting for my prosthesis this afternoon, I thought of a purely random search, using the following R code:
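The R snippet did not survive the page extraction, so as a stand-in here is a hedged Python sketch of a randomised search for the same combinatorial problem (13 members as sets, 11 commissions as letters, no two members sharing more than one commission); the greedy-with-restarts strategy is my own illustrative choice, not the post’s:

```python
import random

def random_greedy(n_members=13, n_commissions=11, seed=0):
    """One randomized greedy pass: scan (commission, member) slots in random
    order and add the member whenever no pair of members would end up
    sharing two commissions."""
    rng = random.Random(seed)
    pairs_used = set()
    commissions = [set() for _ in range(n_commissions)]
    slots = [(c, m) for c in range(n_commissions) for m in range(n_members)]
    rng.shuffle(slots)
    for c, m in slots:
        new_pairs = {frozenset((m, o)) for o in commissions[c]}
        if not new_pairs & pairs_used:   # adding m breaks no pair constraint
            commissions[c].add(m)
            pairs_used |= new_pairs
    return commissions

# restart many times and keep the best token count
best = max(sum(len(c) for c in random_greedy(seed=s)) for s in range(500))
print(best)
```

Any configuration produced this way is guaranteed valid, which already beats the accept/reject flavour of a purely uniform search; an annealing layer on top is what found the higher value reported below.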

which led to the value of 43 after that many iterations. Longer runs on faster machines at Paris-Dauphine did not change the output but, as usual with brute-force simulation, the true solution may be such an extreme that it is extremely unlikely to happen… I then tried a standard simulated annealing exploration, but could not find a higher value. Except once, leading to the value 44. Here is the corresponding allocation of letters (*commissions*) to sets (*individuals*) for that solution.

**I**n this Feb. 5, 2014, issue of the Le Monde *Science&Médecine* leaflet, a review of (my Warwick colleague) Ian Stewart’s *17 Equations that Changed the World*, which must have been recently translated into French (with no criticism of the less compelling chapter on Black-Scholes, and a confusion of the “bell curve” with statistics). Yet another tribune of Marco Zito, about the generalisation of Richard Feynman’s diagrams to a solid called the amplituhedron. (Not making much sense as exposed!)

Filed under: Books, Kids, Statistics Tagged: combinatorics, Le Monde, mathematical puzzle, simulated annealing

### ABC in Sydney, July 3-4, 2014!!!

**A**fter ABC in Paris in 2009, ABC in London in 2011, and ABC in Roma last year, things are accelerating since there will be—as I just learned—an ABC in Sydney next July **(not June as I originally typed, thanks Robin!)**. The workshop on the current developments of ABC methodology thus leaves Europe to go down-under and to take advantage of the IMS Meeting in Sydney on July 7-10, 2014. Hopefully, “ABC in…” will continue its tour of European capitals in 2015! To keep up with an unbroken sequence of free workshops, Scott Sisson has managed to find support so that attendance is free of charge *(free as in “no registration fee at all”!)* but *you do need to register as space is limited*. While I would love to visit UNSW and Sydney once again and attend the workshop, I will not, getting ready for Cancún and our ABC short course there.

Filed under: pictures, Statistics, Travel, University life, Wines Tagged: ABC, ABC in London, ABC in Paris, ABC in Rome, abc-in-sydney, Australia, Cancún, IMS, ISBA, simulation, Sydney, Sydney Harbour, Sydney Opera, UNSW, workshop

### insufficient statistics for ABC model choice

**J**ulien Stoehr, Pierre Pudlo, and Lionel Cucala (I3M, Montpellier) arXived yesterday a paper entitled “Geometric summary statistics for ABC model choice between hidden Gibbs random fields“. Julien had presented this work at the MCMski 4 poster session. The move to a *hidden* Markov random field means that our original approach with Aude Grelaud does not apply: there is no dimension-reduction sufficient statistic in that case… The authors introduce a small collection of (four!) focussed statistics to discriminate between Potts models. They further define a novel misclassification rate, conditional on the observed value and derived from the ABC reference table. It is the predictive error rate

$$\mathbb{P}\big(\widehat{m}(S(Y)) \neq m \,\big|\, S(y^{\text{obs}})\big),$$

integrating over both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation, or rather the transform of the observation by the summary statistic S. In a simulation experiment, the paper shows that the predictive error rate decreases quite a lot by including two or four geometric summary statistics on top of the no-longer-sufficient concordance statistics. (I did not find how the distance is constructed and how it adapts to a larger number of summary statistics.)
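The flavour of such a predictive error rate can be conveyed by a toy Monte Carlo sketch, with two one-dimensional “models” and a k-nearest-neighbour classifier standing in for the ABC posterior probabilities (everything here, summaries and classifier included, is an illustrative stand-in, not the construction of the paper):

```python
import random

def predictive_error_rate(n_ref=1000, n_test=500, k=50, seed=0):
    """Monte Carlo sketch of a prior predictive error rate for ABC model
    choice: simulate a reference table of (model index, summary) pairs from
    two toy models, classify fresh draws by the majority model index among
    the k nearest reference summaries, and return the misclassification rate."""
    rng = random.Random(seed)

    def draw():
        m = rng.randint(0, 1)
        # toy summaries: S ~ N(0,1) under model 0, S ~ N(1,1) under model 1
        return m, rng.gauss(float(m), 1.0)

    table = [draw() for _ in range(n_ref)]

    def predict(s):
        nn = sorted(table, key=lambda row: abs(row[1] - s))[:k]
        return round(sum(m for m, _ in nn) / k)   # majority model index

    test = [draw() for _ in range(n_test)]
    return sum(predict(s) != m for m, s in test) / n_test

print(round(predictive_error_rate(), 2))
```

Sharper summary statistics translate into better-separated reference tables and hence a lower error rate, which is the comparison the paper carries out with its geometric statistics.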

*“*[the ABC posterior probability of index m]* uses the data twice: a first one to calibrate the set of summary statistics, and a second one to compute the ABC posterior.*” (p.8)

**I**t took me a while to understand the above quote. If we consider ABC model choice as we did in our original paper, it only and correctly uses the data once. However, if we select the vector of summary statistics based on an empirical performance indicator resulting from the data then indeed the procedure does use the data twice! Is there a generic way or trick to compensate for that, apart from cross-validation?

Filed under: Books, Kids, Statistics, University life Tagged: ABC, arXiv, Gibbs random field, Markov random field, Monte Carlo Statistical Methods, predictive loss, simulation

### fast-forward bug

### a refutation of Johnson’s PNAS paper

**J**ean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours may sound to the author.)

*“We do not discuss the validity of this* [Bayesian]* hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”*

**T**he refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’s perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5%, and not 20% as argued in Johnson…! Just as simple as that. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models, as well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk.

*“These examples vividly illustrate that the choice of the a priori Bayesian hypothesis is not innocent. It needs to be carefully substantiated by evidence, instead of drawn from some blind (albeit “objective”) automatic procedure.”*

**T**he arguments in the supplementary material use the characters of Alice and Bob, which should be familiar to computer scientists and xkcd fans… More seriously, the author considers the asymptotics of false positives when the alternative prior is concentrated on a single value (the setting where Johnson defines his uniformly most powerful Bayesian test). Unsurprisingly, he recovers the original figure that about 20% of the rejected cases close to the standard 5% boundary are from the null (which is also the original figure from Berger and Sellke, 1985). The remainder of the section goes on criticising the Bayesian approach, but understood as Johnson’s non-standard representation! And it comes full circle to conclude that the frequentist approach to testing is the correct one.
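The 20% figure is easy to reproduce in spirit with a toy simulation: draw one-sided z-tests, half from the null and half from a fixed alternative, and look at the provenance of rejections landing just below the 0.05 cutoff (the effect size and the 50/50 split are arbitrary choices of mine, and the resulting fraction moves with both):

```python
import random

def null_fraction_near_boundary(n=200000, effect=2.5, seed=0):
    """Among one-sided z-tests rejected just below the 0.05 cutoff
    (z between 1.645 and 1.75), what fraction came from the null?
    Half the hypotheses are true nulls, half carry the given effect."""
    rng = random.Random(seed)
    null_hits = alt_hits = 0
    for _ in range(n):
        if rng.random() < 0.5:
            z = rng.gauss(0.0, 1.0)     # true null
            if 1.645 < z < 1.75:
                null_hits += 1
        else:
            z = rng.gauss(effect, 1.0)  # true effect
            if 1.645 < z < 1.75:
                alt_hits += 1
    return null_hits / (null_hits + alt_hits)

print(round(null_fraction_near_boundary(), 2))
```

With these particular choices the fraction comes out near a quarter; pushing the effect size down, or the proportion of true nulls up, drives it higher, which is why the boundary figure is so sensitive to the prior modelling being argued about.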

Filed under: Books, Statistics, University life Tagged: Alice and Bob, Bayes factor, Bayesian statistics, Bayesian tests, hypothesis testing, p-value, Valen Johnson, xkcd

### Cancun, register before Feb. 15!

**E**arly bird registration for the ISBA 2014 World Meeting in Cancún, México, ends on Feb. 14. Since the fees are already astronomical ($410 for ISBA students and $510 for other ISBA members), it is definitely worth registering now. (As debated in an earlier post, I am opposed to the steep inflation in conference fees and do not buy the argument of the all-included package. This is certainly most convenient for the conference organisers or treasurers, but I do not want to pay for meals I will not enjoy [and in some cases not even take]. Especially when the cost is multiplied by 5 since four of my PhD students will travel to Cancún as well. Whoever is building a proposal for ISBA 2016, please bear that in mind! As we will when making our proposal for ISBA 2018 in Edinburgh.)

Filed under: Statistics, Travel, University life Tagged: Cancún, ISBA 2014, ISBA 2016, Mexico, registration fees, University of Edinburgh

### MCMSki 5, where willst thou be?!

*[Here is a call from the BayesComp Board for proposals for MCMSki 5, renamed as below to fit the BayesComp section. The earlier poll on the ‘Og helped shape the proposal, with the year, 2016 vs. 2017, remaining open. I just added town to resort below as it did not sound from the poll people were terribly interested in resorts.]*

The Bayesian Computation Section of ISBA is soliciting proposals to host its flagship conference:

**Bayesian Computing at MCMSki**

The expectation is that the meeting will be held in January 2016, but the committee will consider proposals for other times through January 2017.

This meeting will be the next incarnation of the popular MCMSki series that addresses recent advances in the theory and application of Bayesian computational methods such as MCMC, all in the context of a world-class ski resort/town. While past meetings have taken place in the Alps and the Rocky Mountains, we encourage applications from any venue that could support MCMSki. A three-day meeting is planned, perhaps with an additional day or two of satellite meetings and/or short courses.

One page proposals should address feasibility of hosting the meeting including

1. Proposed dates.

2. Transportation for international participants (both the proximity of international airports and transportation to/from the venue).

3. The conference facilities.

4. The availability and cost of hotels, including low cost options.

5. The proposed local organizing committee and their collective experience organizing international meetings.

6. Expected or promised contributions from the host organization, host country, or industrial partners towards the cost of running the meetings.

Proposals should be submitted to David van Dyk (dvandyk, BayesComp Program Chair) at imperial.ac.uk no later than May 31, 2014.

The Board of Bayesian Computing Section will evaluate the proposals, choose a venue, and appoint the Program Committee for *Bayesian Computing at MCMSki*.

Filed under: Mountains, Statistics, Travel, University life Tagged: Alps, BayesComp, Bayesian computation, Chamonix-Mont-Blanc, ISBA, MCMC algorithms, MCMSki, Rocky Mountains, simulation, Utah

### when we were orphans

**K**azuo Ishiguro’s *The Remains of the Day* is one of my favourite novels for its bittersweet depiction of the growing realisation that the main character has wasted his life. This other novel has the same thread of backward perspectives and of missed opportunities; however, the main character (Banks) is of a very different nature. The way *When We Were Orphans* is written, one starts thinking this is all about an English detective trying to uncover the truth behind a very personal tragedy, the disappearance of both his parents in Shanghai when he was a child. But the narrative progressively gets fractured and incoherent, and we come to doubt the narrator’s story, then his sanity. By the end of the book, it is just impossible to sift reality from imagination, daydreaming from life accomplishments. For instance, Banks presents himself as a detective with a certain degree of fame in London circles. However, there is no description whatsoever of his methods or of specific cases. The closest to a description is a child murder (and worse?) where a local constable pleads for the detective to hit at the heart of evil, in a completely incoherent discourse. The storytelling qualities of Ishiguro are so perfect that the character remains a mystery till the end. It is not even certain that he ever moved beyond the detective games he used to play with his Japanese neighbour in Shanghai! The most disturbing section occurs when he revisits Shanghai at the time of the Japanese invasion and thinks he can link his parents’ disappearance with the said invasion and resolve both at once. It is only when he enters a battle zone in the slums of the city that reality seems to reassert itself, but even then the reunion of Banks and the Japanese friend from his childhood is so unrealistic that the most likely interpretation is that Banks is in permanent denial and that the Japanese officer he rescued plays the game to stay alive.
Still, the story is told in such a way that one can never be sure of any of these interpretations and this is what makes it such a great book, more complex than *The Remains of the Day* in its construction, if less compelling because of the unfocussed nature of most characters, which we can never grasp hard enough…

Filed under: Books, Kids, Travel Tagged: Japanese invasion of China, Kazuo Ishiguro, London, Shanghai, The Remains of the Day

### art brut

### Statistics and Computing special MCMSk’issue [call for papers]

**F**ollowing the exciting and innovative talks, posters and discussions at MCMski IV, the editor of *Statistics and Computing*, Mark Girolami (who also happens to be the new president-elect of the BayesComp section of ISBA, which is taking over the management of future MCMski meetings), kindly proposed to publish a special issue of the journal open to all participants in the meeting. Not only to speakers, mind, but to all participants.

**S**o if you are interested in submitting a paper to this special issue of a computational statistics journal that is very close to our MCMski themes, I encourage you to do so. (Especially if you missed the COLT 2014 deadline!) The deadline for submissions is set on *March 15* (a wee bit tight, but we would dearly like to publish the issue in 2014, the same year as the meeting). Submissions are to be made through the Statistics and Computing portal, with a mention that they are intended for the special issue.

**A**n editorial committee chaired by Antonietta Mira and composed of Christophe Andrieu, Brad Carlin, Nicolas Chopin, Jukka Corander, Colin Fox, Nial Friel, Chris Holmes, Gareth Jones, Peter Müller, Geoff Nicholls, Gareth Roberts, Håvard Rue, Robin Ryder, and myself, will examine the submissions and get back to the authors within a few weeks. In a spirit similar to the JRSS Read Paper procedure, submissions will first be examined collectively, before being sent to referees. We plan to publish the reviews as well, in order to include a global set of comments on the accepted papers. We intend to do so in The Economist style, i.e. as a set of edited anonymous comments. The usual instructions for Statistics and Computing apply, with the additional requirements that the paper should be around 10 pages and include at least one author who took part in MCMski IV.

Filed under: Books, Mountains, R, Statistics, University life Tagged: Chamonix-Mont-Blanc, deadline, guest editors, JRSSB, MCMSki IV, Read paper, Series B, ski resorts, special issue, Statistics and Computing, The Economist, University of Warwick

### Le Monde puzzle [#851]

**A** more unusual Le Monde mathematical puzzle:

*Fifty black and white tokens are set on an equilateral triangle of side 9, black on top and white on bottom. If they can only be turned three by three, determine whether it is possible to produce a triangle with all white sides on top, under each of the following constraints:*

- *the three tokens must stand on a line;*
- *the three tokens must stand on a line and be contiguous;*
- *the three tokens must stand on the summits of an equilateral triangle;*
- *the three tokens must stand on the summits of an equilateral triangle of side one.*
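At least the last rule can be brute-forced on toy boards. Here is a hedged sketch in Python (rather than this blog’s usual R), assuming the standard triangular-lattice representation of the board; the full side-9 triangle is far beyond exhaustive search, so the example only runs a side-2 instance:

```python
from collections import deque

def triangle_points(side):
    # lattice points (row i, position j in row) of a triangular board
    return [(i, j) for i in range(side + 1) for j in range(i + 1)]

def unit_triangle_moves(side):
    # a legal move flips the three tokens at the corners of a unit triangle
    pts = triangle_points(side)
    idx = {p: k for k, p in enumerate(pts)}
    moves = []
    for i in range(side):
        for j in range(i + 1):  # upward-pointing unit triangles
            moves.append((idx[(i, j)], idx[(i + 1, j)], idx[(i + 1, j + 1)]))
    for i in range(1, side):
        for j in range(i):      # downward-pointing unit triangles
            moves.append((idx[(i, j)], idx[(i, j + 1)], idx[(i + 1, j + 1)]))
    return len(pts), moves

def min_flips_all_white(side):
    # breadth-first search over token configurations, encoded as bitmasks,
    # from all-black (0) to all-white (all bits set)
    n, moves = unit_triangle_moves(side)
    masks = [(1 << a) | (1 << b) | (1 << c) for a, b, c in moves]
    goal = (1 << n) - 1
    dist = {0: 0}
    queue = deque([0])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for m in masks:
            nxt = state ^ m
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None  # all-white unreachable under this rule
```

On the side-2 board (6 tokens, 4 unit triangles), the search returns 4: flipping each unit triangle exactly once turns all tokens white, and no shorter sequence does.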

**I** could not think of a quick fix with an R code so leave it to the interested ‘Og reader…

**I**n the next issue of the Science&Médecine leaflet (Jan. 29), which appeared while I was in Warwick, there were a few entries of interest. First, the central article was about Big Data (again), but, for a change, the journalist took the pains to include French statisticians and machine learners in the picture, like Stefan Clemençon, Aurélien Garivier, Jean-Michel Loubes, and Nicolas Vayatis. (In a typical French approach, the subtitle was “A challenge for maths”, rather than for statistics!) Ignoring the (minor) confusion therein of “small n, large p” problems with the curse of dimensionality, the article does mention a few important issues like distributed computing, inhomogeneous datasets, overfitting and learning. There are also links to the new masters in data sciences at ENSAE, Telecom-Paristech, and Paris 6-Pierre et Marie Curie. (The one in Paris-Dauphine is still under construction and will not open next year.) As a side column, the journal also wonders about the “end of Science” due to the massive data influx and “Big Data” techniques that could predict and explain without requiring theories or deductive scientific thinking. Somewhat paradoxically, the column ends with a quote from Jean-Michel Loubes, who states that one could think “our” methods start from effects to end up with causes, but that in fact the models are highly dependent on the data. And on the opinion of experts. Doesn’t that suggest some Bayesian principles at work there?!

**A**nother column is dedicated to Edward Teller‘s “dream” of using nuclear bombs for civil engineering, as in the Chariot project in Alaska. And the last entry argues against Kelvin’s “to measure is to know”, with the title “To know is not to measure”, although it does not aim at a general philosophical level but rather objects to the unrestricted intrusion of bibliometrics and other indices borrowed from marketing. Written by a mathematician, this column is not directed against statistics and the Big Data revolution, but rather against the myth that everything can be measured and quantified. (There was also a pointer to a tribune against the pseudo-recruiting of top researchers by Saudi universities in order to improve their Shanghai ranking but I do not have time to discuss it here. And now. Maybe later.)

Filed under: Books, Kids, Statistics, University life Tagged: bibliometrics, big data, dimension curse, Edward Teller, Le Monde, machine learning, mathematical puzzle, nuclear radiation

### i-like Oxford [workshop, March 20-21, 2014]

**T**here will be another i-like workshop this Spring, over two days in Oxford, St Anne’s College, involving talks by Xiao-Li Meng and Eric Moulines, as well as by researchers from the participating universities. Registration is now open. (I will take part as a part-time participant, travelling from Nottingham where I give a seminar on the 20th.)

Filed under: Statistics, Travel, University life Tagged: i-like, intractable likelihood, University of Nottingham, University of Oxford, University of Warwick, workshop

### 40ièmes Foulées de Malakoff [5k, 7⁰C, 18:36, 13th & 1st V2]

*(Warning: post of limited interest to anyone there, as I am posting about a local race I ran!)*

**O**nce more, I managed to run my annual 5k in Malakoff, having recovered from my Chamonix flu earlier and better than last year. And being (barely) around on the day of the race. I actually achieved my best time in several years (3:41-3:47-3:41-3:43-3:43, for a total time of 18:36). I also finished first in my V2 category, a feat I was far from expecting. The light training last week in Warwick eventually did help for such a short distance at a faster pace! And my INSEE Paris Club team won the company challenge for yet another year. Repeating last year’s setting, I was furthermore running the race with my daughter, who finished third ex-aequo with a friend.

**H**ere is a picture of the race start, already in the Lenin stadium *(Malakoff may be the last town in France to enjoy a Lenin stadium!)*, but before WWII…

Filed under: Kids, Running Tagged: 5K, groundhog day, Insee Paris Club, Malakoff, veteran (V2)

### Séminaire Probabilités, Décision, Incertitude

**L**ast Friday, I gave a seminar at the *Séminaire Probabilités, Décision, Incertitude*, which is run by IHφST, the institute for history and philosophy of sciences and techniques of the University of Paris 1. I decided to present my Budapest EMS 2013 talk at a slower pace and with the technical parts cut out. And with a few historical titbits added. It took me two hours and I enjoyed the experience. I cannot speak for the audience, who seemed a bit wary of mathematical digressions, but I got comments on the Lindley paradox and on the contents of Aris Spanos’ Who’s afraid… Here are the slides again, in case Slideshare freezes your browser as it does mine…

**A**s a side anecdote, the seminar took place in an old building in the core of the Saint-Germain des Prés district. The view from the seminar room on the busy streets of this district was quite eye-catching! (Not as distracting as the one from a room in Ca’ Foscari where I gave a seminar a few years ago facing the Venezia Laguna and windsurfers practising…)

Filed under: Books, Running, Statistics, Travel, University life Tagged: Bayes 250, Budapest, Ca' Foscari University, EMS 2013, IHPST, Paris, philosophy of sciences, Saint-Germain-des-Prés, Venezia

### posterior predictive p-values

*Bayesian Data Analysis* advocates in Chapter 6 using posterior predictive checks as a way of evaluating the fit of a potential model to the observed data. There is a no-nonsense feeling to it:

*“If the model fits, then replicated data generated under the model should look similar to observed data. To put it another way, the observed data should look plausible under the posterior predictive distribution.”*

**A**nd it aims at providing an answer to the frustrating *(frustrating to me, at least)* issue of Bayesian goodness-of-fit tests. There are however issues with the implementation, from deciding on which aspect of the data or of the model is to be examined, to the “use of the data twice” sin. Obviously, this is an exploratory tool with little decisional backup and it should be understood as a qualitative rather than quantitative assessment. As mentioned in my tutorial on Sunday (I wrote this post in Duke during O’Bayes 2013), it reminded me of Ratmann et al.’s ABCμ in that they both give reference distributions against which to calibrate the observed data. Most likely with a multidimensional representation. And the “use of the data twice” can be argued for or against, once a data-dependent loss function is built.

*“One might worry about interpreting the significance levels of multiple tests or of tests chosen by inspection of the data (…) We do not make *[a multiple test]* adjustment, because we use predictive checks to see how particular aspects of the data would be expected to appear in replications. If we examine several test variables, we would not be surprised for some of them not to be fitted by the model, but if we are planning to apply the model, we might be interested in those aspects of the data that do not appear typical.”*

**T**he natural objection that having a multivariate measure of discrepancy runs into multiple testing is answered within the book with the reply that the idea is not to run formal tests. I still wonder how one should behave when faced with a vector of posterior predictive p-values (ppp).

**T**he above picture is based on a normal mean/normal prior experiment I ran where the ratio of prior to sampling variance increases from 100 to 10⁴. The ppp is based on the Bayes factor against a zero mean as a discrepancy. It thus grows away from zero very quickly and then levels off around 0.5, reaching values close to 1 only for very large values of x (i.e., never in practice). I find the graph interesting because if instead of the Bayes factor I use the marginal (the numerator of the Bayes factor) then the picture is the exact opposite. Which, I presume, does not make a difference for *Bayesian Data Analysis*, since both extremes are considered as equally toxic… Still, still, still, we are in the same quandary as when using any kind of p-value: what is extreme? what is significant? Do we have again to select the dreaded 0.05?! To see how things are going, I then simulated the behaviour of the ppp under the “true” model for the pair (θ,x). And ended up with the histograms below:

which shows that under the true model the ppp does concentrate around .5 (surprisingly the range of ppp’s hardly exceeds .5 and I have no explanation for this). While the corresponding ppp does not necessarily pick any wrong model, discrepancies may be spotted by getting away from 0.5…
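For readers wanting to replay the experiment, here is a minimal Monte Carlo sketch in Python (rather than this blog’s usual R). It relies on the fact that, in this normal mean/normal prior setting, the Bayes factor against a zero mean is increasing in |x|, so comparing Bayes factors reduces to comparing |x| values; the tail probability is oriented as P(T(xʳᵉᵖ) ≤ T(x)|x), the direction consistent with the curve described above (the other orientation simply gives 1−ppp):

```python
import numpy as np

def ppp(x, tau2, n_rep=100_000, seed=0):
    """Monte Carlo posterior predictive p-value for x ~ N(theta, 1) with
    prior theta ~ N(0, tau2), discrepancy the Bayes factor against theta = 0.

    B10(x) = N(x; 0, 1 + tau2) / N(x; 0, 1) is increasing in |x|, hence
    P(B10(x_rep) <= B10(x) | x) equals P(|x_rep| <= |x| | x)."""
    rng = np.random.default_rng(seed)
    post_mean = tau2 * x / (1.0 + tau2)   # posterior mean of theta
    post_var = tau2 / (1.0 + tau2)        # posterior variance of theta
    theta = rng.normal(post_mean, np.sqrt(post_var), size=n_rep)
    x_rep = rng.normal(theta, 1.0)        # posterior predictive replicates
    return np.mean(np.abs(x_rep) <= abs(x))
```

With τ²=100, say, this function should return values near 0 at x=0, near 0.5 for moderate x, and close to 1 only for very large x, reproducing the shape of the curve above.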

*“The p-value is to the u-value as the posterior interval is to the confidence interval. Just as posterior intervals are not, in general, classical confidence intervals, Bayesian p-values are not generally u-values.”*

**N**ow, *Bayesian Data Analysis* also has this warning about ppp’s not being uniform under the true model (that is, not being *u*-values), which is just as well considering the above example, but I cannot help wondering if the authors had intended a sort of subliminal message that they were not that far from uniform. And this brings back to the forefront the difficult interpretation of the numerical value of a ppp. That is, of its calibration. For evaluation of the fit of a model. Or for decision-making…

Filed under: Books, Statistics, Travel, University life Tagged: ABC, Bayesian data analysis, calibration, Duke University, exploratory data analysis, goodness of fit, model checking, O-Bayes 2013, p-values, posterior predictive