## Bayesian Bloggers

### sunset from my office

Filed under: pictures, University life Tagged: bois de Boulogne, La Défense, Paris, sunset, Université Paris Dauphine

### rediscovering the harmonic mean estimator

**W**hen looking at unanswered questions on X validated, I came across a question where the author wanted to approximate a normalising constant

while simulating from the associated density, *g*. While seemingly unaware of the (huge) literature in the area, he re-derived [a version of] the harmonic mean estimate by considering the [inverted importance sampling] identity

when α is a probability density and by using for α the uniform over the whole range of the simulations from *g*. This choice of α obviously leads to an estimator with infinite variance when the support of *g* is unbounded, but the idea can be easily salvaged by using instead another uniform distribution, for instance on a highest density region, as we studied in our papers with Darren Wraith and Jean-Michel Marin. (Unfortunately, the originator of the question no longer seems interested in the problem.)
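As a minimal sketch of the salvaged version, assume the usual form of the inverted importance sampling identity, E_g[α(X)/g̃(X)] = 1/Z, where g̃ denotes the unnormalised version of *g* and Z its normalising constant (this notation, the Normal toy target, and the function names below are illustrative assumptions, not taken from the question). With α uniform on the bounded region [-1, 1], the ratio α/g̃ is bounded and the estimator has finite variance.

```python
import math
import random


def unnormalised_density(x):
    """Toy target: unnormalised standard Normal, true constant sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)


def inverse_constant_estimate(sample, half_width):
    """Average of alpha(x)/g~(x), with alpha uniform on [-half_width, half_width].

    The sample is assumed to come from the normalised density g.
    """
    alpha = 1.0 / (2.0 * half_width)
    total = 0.0
    for x in sample:
        if abs(x) <= half_width:  # alpha vanishes outside the bounded region
            total += alpha / unnormalised_density(x)
    return total / len(sample)


random.seed(42)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # simulations from g
z_hat = 1.0 / inverse_constant_estimate(draws, half_width=1.0)
print(z_hat, math.sqrt(2.0 * math.pi))
```

Replacing `half_width=1.0` with the range of the simulations mimics the original proposal: the ratio is then dominated by the most extreme draws, which is where the infinite variance comes from.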

Filed under: Kids, Statistics, University life Tagged: cross validated, estimating a constant, harmonic mean, HPD region, importance sampling, infinite variance estimators, normalising constant, Radford Neal

### Gauss to Laplace transmutation interpreted

**F**ollowing my earlier post [induced by browsing X validated], on the strange property that the product of a Normal variate by an Exponential variate is a Laplace variate, I was contacted by Peng Ding from UC Berkeley, who showed me how to derive the result by a mere algebraic transform, related to the decomposition

(X+Y)(X-Y)=X²-Y² ~ 2XY

when X,Y are iid Normal N(0,1). Peng Ding and Joseph Blitzstein have now arXived a note detailing this derivation, along with another derivation using the moment generating function. As a coincidence, I also came across another interesting representation on X validated, namely that, when X and Y are Normal N(0,1) variates with correlation ρ,

XY ~ R(cos(πU)+ρ)

with R Exponential and U Uniform (0,1). As shown by the OP of that question, it is a direct consequence of the decomposition of (X+Y)(X-Y) and of the polar or Box-Muller representation. This does not lead to a standard distribution of course, but remains a nice representation of the product of two Normals.
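The representation can be checked by simulation. The sketch below assumes R is Exponential with unit rate (which matches E[XY] = ρ) and compares the deciles of both sides for ρ = 0.5; the function names, the seed, and the choice of deciles as a yardstick are illustrative assumptions.

```python
import math
import random


def product_of_correlated_normals(rho, rng):
    """Draw X*Y for standard Normals X, Y with correlation rho."""
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return x * y


def exponential_cosine_representation(rho, rng):
    """Draw R*(cos(pi*U) + rho) with R ~ Exp(1) and U ~ Uniform(0, 1)."""
    r = rng.expovariate(1.0)
    u = rng.random()
    return r * (math.cos(math.pi * u) + rho)


def deciles(values):
    """Empirical deciles (10%, ..., 90%) of a list of draws."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // 10] for k in range(1, 10)]


rng = random.Random(2015)
rho, n = 0.5, 200_000
direct = [product_of_correlated_normals(rho, rng) for _ in range(n)]
represented = [exponential_cosine_representation(rho, rng) for _ in range(n)]

# Largest gap between matching deciles; it should shrink as n grows.
max_gap = max(abs(a - b) for a, b in zip(deciles(direct), deciles(represented)))
print(max_gap)
```

Setting `rho = 0.0` gives a product of two independent Normals, and the same check then applies to R cos(πU).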

Filed under: Books, Kids, Statistics, University life Tagged: Box-Muller algorithm, Carl Friedrich Gauss, convolution, correlation, cross validated, Laplace distribution, Pierre Simon de Laplace

### do cartoons help?

**I** received a (mass) email from Taylor & Francis about creating a few cartoons related to recent papers… As in the example above about the foot strike of Kilian Jornet. With a typo on Font-Romeu. Apart from the authors themselves, and maybe some close relatives!, I have trouble seeing the point of this offer, as cartoons are unlikely to attract academic readers interested in the contents of the paper.

Filed under: Books, Kids, Running, University life Tagged: academic journals, cartoon, commercial editing, Font-Romeu, Footwear Science, Taylor & Francis, trail running

### new spider

**W**hile having breakfast this morning, I spotted this fairly unusual spider by my kitchen window. I wonder which species it is!

Filed under: Kids, pictures Tagged: breakfast, Fall, Sceaux, spider

### ghost town [book review]

**D**uring my week in Warwick, I bought a book called Ghost Town, by Catriona Troth, from the campus bookstore, somewhat randomly, mostly because its back-cover was mentioning Coventry in the early 1980’s, racial riots, and anti-skinhead demonstrations, as well as the University of Warwick. And Ska, this musical style from the 1980’s, inspired by an earlier Jamaican rhythm, which emerged in Coventry with a group called The Specials. (And the more mainstream Madness from Camden Town.) While this was some of the music I was listening to at that time, I was completely unaware it had started in Coventry! And Ghost Town is a popular song from The Specials. Which thus inspired the title of the book.

Enough with preliminaries!, the book is quite a good read, although more for the very realistic rendering of the atmosphere of the early 1980’s than for the story itself, even though both are quite intermingled. Most of the book action takes place in a homeless shelter where students just out of the University (or simply jobless) run the shelter and its flow of unemployed workers moving or drifting from the closed factories of the North towards London… This is Margaret Thatcher’s era, no doubt about this!, and the massive upheaval of industrial Britain at that time is translated into the gloomy feeling of an impoverished Midlands city like Coventry. This is also the end of the 1970’s, with (more) politically active students, almost indiscriminately active against every perceived oppression, from racism, to repression, the war in Ireland (with the death of Bobby Sands in Maze prison, for which I remember marching in Caen…), but mostly calling for a more open society. Given the atmosphere at that time, and especially given this was the time I was a student, there is enough material to make the book quite enjoyable [for me] to read! Even though I find the personal stories of both main protagonists somewhat caricatured and rather predictable. And, maybe paradoxically, the overall tone of the (plot) relationship between those two is somewhat patronising and conservative. When considering that they both can afford to retreat to safe havens when need be. But this does not make the bigger picture any less compelling a read, as the description of the (easy) manipulation of the local skinheads towards more violent racism by unnamed political forces is scary, with a very sad ending.

One side comment [of no relevance] is that reading the book made me realise I had no idea what Coventry looks like: none of the parts of town mentioned there evokes anything for me as I have never ventured farther than the train station! Which actually stands outside the ring road, hence not within the city limits. I hope I can find time during one of my next trips to have a proper look at downtown Coventry!

Filed under: Books, Kids, Travel, University life Tagged: Bobby Sand, book review, Camden Town, Catriona Troth, Coventry, Ghost Town, Madness, Margaret Thatcher, Midlands, Ska, The Specials, University of Warwick

### MCMskv, Lenzerheide, 4-7 Jan., 2016 [(breaking) news #4]

**A**s the deadline draws near, a week from now!, I want to remind participants in the next MCMSki conference in Lenzerheide that they can apply for the Breaking News! session:

This edition of the MCMSki conference will include a Breaking News! session, covering the latest developments in the field, latest enough to be missed by the scientific committee when building the program. To be considered for this special session, please indicate you wish to compete for this distinction when submitting your poster. The deadline for submission is November 15, 2015. The selection will be made by the scientific committee and the time allocated to each talk will depend on the number of selected talks. Selected presenters will be notified by December 02, 2015, and they are expected to participate in the poster session to ensure maximal dissemination of their breaking news.

And since I got personal enquiries yesterday, the number of talks during that session will be limited to have real talks and not flash oral presentations of incoming posters. Unless the scientific committee cannot make up its mind on which news to break!

Filed under: Kids, Mountains, pictures, Travel, University life Tagged: Bayesian computation, breaking news, Chur, Graubünden, ISBA, Lenzerheide, lodging, MCMSki, MCMskv, Monte Carlo Statistical Methods, poster session, Sankt Moritz, Switzerland, Zurich

### bootstrap(ed) likelihood for ABC

**T**his recently arXived paper by Weixuan Zhu, Juan Miguel Marín, and Fabrizio Leisen proposes an alternative to our empirical likelihood ABC paper of 2013, or BCel. Besides the mostly personal appeal for me to report on a Juan Miguel Marín working [in Madrid] on ABC topics, alongside my friend Jean-Michel Marin!, this paper is another entry on ABC that connects with yet another statistical perspective, namely bootstrap. The proposal, called BCbl, is based on a reference paper by Davison, Hinkley and Worton (1992) which defines a *bootstrap likelihood*, a notion that relies on a double-bootstrap step to produce a non-parametric estimate of the distribution of a given estimator of the parameter θ. This estimate includes a smooth curve-fitting algorithm step, for which little description is available from the current paper. The bootstrap non-parametric substitute then plays the role of the actual likelihood, with no correction for the substitution just as in our BCel. Both approaches are convergent, with Monte Carlo simulations exhibiting similar or even identical convergence speeds although [unsurprisingly!] no deep theory is available on the comparative advantage.

An important issue from my perspective is that, while the empirical likelihood approach relies on a choice of identifying constraints that strongly impact the numerical value of the likelihood approximation, the bootstrap version starts directly from a subjectively chosen estimator of θ, which may also impact the numerical value of the likelihood approximation. In some ABC settings, finding a primary estimator of θ may be a real issue or a computational burden. Except when using a preliminary ABC step as in semi-automatic ABC. This would be an interesting crash-test for the BCbl proposal! (This would not necessarily increase the computational cost by a large amount.) In addition, I am not sure the method easily extends to larger collections of summary statistics such as those used in ABC, in particular because it necessarily relies on non-parametric estimates, only operating in small enough dimensions where smooth curve-fitting algorithms can be used. Critically, the paper only processes examples with a few parameters.

The comparisons between BCel and BCbl that are produced in the paper show some gain in favour of BCbl. Obviously, it depends on the respective calibrations of the non-parametric methods and of regular ABC, as well as on the available computing time. I find the population genetic example somewhat puzzling: the paper refers to our composite likelihood to set the moment equations. Since this is a pseudo-likelihood, I wonder how the authors select their parameter estimates in the double-bootstrap experiment. And for the Ising model, it is not straightforward to conceive of a bootstrap algorithm: (a) how does one subsample pixels and (b) what are the validity guarantees for the estimation procedure?

Filed under: pictures, Statistics Tagged: ABCel, BCbl, bootstrap, bootstrap likelihood, empirical likelihood, Ising model, Madrid, summary statistics, Universidad Carlos III de Madrid

### how individualistic should statistics be?

**K**eli Liu and Xiao-Li Meng completed a paper on the very nature of inference, to appear in The Annual Review of Statistics and Its Application. This paper or chapter addresses a fundamental (and foundational) question on drawing inference about a new observation based on a sample. That is, on making predictions. To what extent should the characteristics of the sample used for that prediction resemble those of the future observation? In his 1921 book, *A Treatise on Probability*, Keynes thought this similarity (or individualisation) should be pushed to its extreme, which led him to more or less conclude that statistics was impossible and never to return to the field again. Certainly missing the incoming possibility of comparing models and selecting variables. And not building so much on the “all models are wrong” tenet. On the contrary, classical statistics uses the entire data available and the associated model to run the prediction, including Bayesian statistics, although it is less clear how to distinguish between data and control there. Liu & Meng debate the possibility of creating controls from the data alone. Or “alone” as the model behind always plays a capital role.

*“Bayes and Frequentism are two ends of the same spectrum—a spectrum defined in terms of relevance and robustness. The nominal contrast between them (…) is a red herring.”*

The paper makes for an exhilarating if definitely challenging read. With a highly witty writing style. If only because the perspective is unusual, to say the least!, and requires constant mental contortions to frame the assertions into more traditional terms. For instance, I first thought that Bayesian procedures were in agreement with the ultimate conditioning approach, since they condition on the observables and nothing else (except for the model!). Upon reflection, I am not so convinced that there is such a difference with the frequentist approach in the (specific) sense that they both take advantage of the entire dataset. Either from the predictive or from the plug-in distribution. It all boils down to how one defines “control”.

*“Probability and randomness, so tightly yoked in our minds, are in fact distinct concepts (…) at the end of the day, probability is essentially a tool for bookkeeping, just like the abacus.”*

Some sentences from the paper made me think of ABC, even though I am not trying to bring everything back to ABC!, as drawing controls is the nature of the ABC game. ABC draws samples or controls from the prior predictive and only keeps those for which the relevant aspects (or the summary statistics) agree with those of the observed data. Which opens similar questions about the validity and precision of the resulting inference, as well as the loss of information due to the projection over the summary statistics. While ABC is not mentioned in the paper, it can be used as a benchmark to walk through it.
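The accept/reject mechanism described here can be sketched on a toy problem. Everything specific below is an assumption made for illustration: a Normal mean with unit variance, a N(0, 2²) prior, the sample mean as summary statistic, and a tolerance of 0.1. The conjugate posterior mean is available in this case and serves as a check.

```python
import random
import statistics


def abc_rejection(observed, prior_sd, n_draws, tolerance, rng):
    """Keep prior draws whose pseudo-sample mean is close to the observed mean."""
    n = len(observed)
    target = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, prior_sd)                    # draw from the prior
        pseudo = [rng.gauss(theta, 1.0) for _ in range(n)]  # prior predictive sample
        if abs(statistics.fmean(pseudo) - target) <= tolerance:
            accepted.append(theta)                          # the "control" is kept
    return accepted


rng = random.Random(7)
observed = [rng.gauss(1.0, 1.0) for _ in range(20)]  # synthetic data for the demo
accepted = abc_rejection(observed, prior_sd=2.0, n_draws=50_000,
                         tolerance=0.1, rng=rng)

abc_mean = statistics.fmean(accepted)
# Conjugate posterior mean: n * ybar / (n + 1 / prior_sd**2)
exact_mean = len(observed) * statistics.fmean(observed) / (len(observed) + 0.25)
print(len(accepted), abc_mean, exact_mean)
```

The sample mean is sufficient in this toy model, so the only approximation comes from the tolerance; with a non-sufficient summary, the projection loss mentioned above would come on top of it.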

*“In the words of Jack Kiefer, we need to distinguish those problems with ‘lucky data’ from those with ‘unlucky data’.”*

I liked very much recalling discussions we had with George Casella and Costas Goutis in Cornell about frequentist conditional inference, with the memory of Jack Kiefer still lingering around. However, I am not so excited about the processing of models here since, from what I understand in the paper (!), the probabilistic model behind the statistical analysis must be used to some extent in producing the control case and thus cannot be truly assessed with a critical eye. For instance, of what use is the mean squared error when the model behind is unable to produce the observed data? In particular, the variability of this mean squared error is directly driven by this model. Similarly the notion of ancillaries is completely model-dependent. In the classification diagrams opposing robustness to relevance, all methods included therein are parametric. While non-parametric types of inference could provide a reference or a calibration ruler, at the very least.

Also, by continuously and maybe a wee bit heavily referring to the doctor-and-patient analogy, the paper is somewhat confusing as to which parts are analogy and which parts are methodology, and as to which type of statistical problem is covered by the discussion (sometimes it feels like all problems and sometimes like medical trials).

*“The need to deliver individualized assessments of uncertainty are more pressing than ever.”*

A final question leads us to an infinite regress: if the statistician needs to turn to individualized inference, at which level of individuality should the statistician be assessed? And who is going to provide the controls then? In any case, this challenging paper is definitely worth reading by (only mature?) statisticians to ponder about the nature of the game!

Filed under: Books, pictures, Statistics Tagged: ABC, All of Statistics, ancilarity, Annual Review of Statistics and Its Application, Bayesian inference, conditioning, control, foundations, frequentist inference, minimaxity, p-values, The Bayesian Choice

### the problem of assessing statistical methods

**A** new arXival today by Abigail Arnold and Jason Loeppky discusses how simulation studies are and should be conducted when assessing statistical methods.

*“Obviously there is no one model that will universally outperform the rest. Recognizing the “No Free Lunch” theorem, the logical question to ask is whether one model will perform best over a given class of problems. Again, we feel that the answer to this question is of course no. But we do feel that there are certain methods that will have a better chance than other methods.”*

I find the assumptions or prerequisites of the paper arguable [in the sense of **2**. *open to disagreement; not obviously correct*], not even mentioning the switch from models to methods in the above, in that I will not be convinced that a method outperforms another method by simply looking at a series of simulation experiments. (Which is why I find *some* machine learning papers unconvincing, when they introduce a new methodology and run it through a couple of benchmarks.) This also reminds me of Samaniego’s *Comparison of the Bayesian and frequentist approaches*, which requires a secondary prior to run the comparison. (And hence is inconclusive.)

*“The papers above typically show the results as a series of side-by-side boxplots (…) for each method, with one plot for each test function and sample size. Conclusions are then drawn from looking at a handful of boxplots which often look very cluttered and usually do not provide clear evidence as to the best method(s). Alternatively, the results will be summarized in a table of average performance (…) These tables are usually overwhelming to look at and interpretations are incredibly inefficient.”*

Agreed, boxplots are terrible (my friend Jean-Michel is forever arguing against them!). Tables are worse. But why don’t we question RMSE as well? This is most often a very reductive way of comparing methods. I also agree with the point that the design of the simulation studies is almost always overlooked and induces a false sense of precision, while failing to cover a wide enough range of cases. However, and once more, I question the prerequisites for comparing methods through simulations for the purpose of ranking those methods. (Which is not the perspective adopted by James and Nicolas when criticising the use of the Pima Indian dataset.)

*“The ECDF allows for quick assessments of methods over a large array of problems to get an overall view while of course not precluding comparisons on individual functions (…) We hope that readers of this paper agree with our opinions and strongly encourage everyone to rely on the ECDF, at least as a starting point, to display relevant statistical information from simulations.”*

Drawing a comparison with the benchmarking of optimisation methods, the authors suggest ranking statistical methods via the empirical cdf of their performances or accuracy *across* (benchmark) problems. Arguing that “significant benefit is gained by [this] collapsing”. I am quite sceptical [as often] of the argument, first because using an (e)cdf means the comparison is unidimensional, second because I see no reason why two cdfs should be easily comparable, third because the collapsing over several problems only operates when the errors for those different problems do not overlap.
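To make the second objection concrete, here is a small sketch of the ecdf collapsing, run on made-up RMSE values for two hypothetical methods over five benchmark problems (the numbers are purely illustrative and do not come from the paper).

```python
from bisect import bisect_right


def ecdf(values):
    """Return the empirical cdf t -> (number of values <= t) / n."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(t):
        return bisect_right(ordered, t) / n

    return cdf


# Made-up RMSEs on five benchmark problems, for illustration only.
rmse_method_a = [0.10, 0.30, 0.25, 0.90, 0.15]
rmse_method_b = [0.20, 0.22, 0.28, 0.35, 0.40]

cdf_a, cdf_b = ecdf(rmse_method_a), ecdf(rmse_method_b)
for threshold in (0.2, 0.3, 0.5):
    print(threshold, cdf_a(threshold), cdf_b(threshold))
```

With these numbers the two curves cross: method A is ahead at thresholds 0.2 and 0.3 (0.4 against 0.2, then 0.8 against 0.6) and behind at 0.5 (0.8 against 1.0), so neither ecdf dominates the other and the ranking depends on where one looks.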

Filed under: Books, pictures, Statistics, University life Tagged: benchmark, boxplots, empirical cdf, Monte Carlo Statistical Methods, Pima Indians, RMSE, simulation

### miXed distributions

**A** couple of questions on X validated showed the difficulty students have with mixed measures and their density. Actually, my students always react with incredulity to the likelihood of a censored normal sample or to the derivation of a Bayes factor associated with the null (and atomic) hypothesis μ=0…

I attribute this difficulty to a poor understanding of the notion of density and hence to a deficiency in the training in measure theory, since the density f of the distribution F is always relative to a reference measure dμ, i.e.

f(x) = dF/dμ(x)

(Hence Lebesgue’s moustache on the attached poster!) To handle atoms in the distribution requires introducing a dominating measure dμ with atomic components, i.e., usually a sum of the Lebesgue measure and of the counting measure on the appropriate set. Which is not so absolutely obvious: while the first question had {0,1} as atoms, the second question introduced atoms on {-θ,θ} and required a change of variable to consider a counting measure on {-1,1}. I found this second question actually of genuine interest and a great toy example for class and exams.
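The censored Normal case mentioned earlier makes a convenient sketch. Assume observations from a Normal with mean `mean` and unit variance that are right-censored at a known `cap` (all names and the sample values are illustrative assumptions). Relative to the sum of the Lebesgue measure and a point mass at `cap`, the density is the Normal pdf below the cap and the probability mass 1 − Φ(cap − mean) at the cap itself.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard Normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_normal_loglik(mean, sample, cap):
    """Log-likelihood of a N(mean, 1) sample right-censored at cap.

    Uncensored points contribute a Lebesgue density value, censored points
    (recorded as equal to cap) contribute the mass of the atom at cap.
    """
    loglik = 0.0
    for x in sample:
        if x < cap:
            loglik += math.log(normal_pdf(x - mean))
        else:
            loglik += math.log(1.0 - normal_cdf(cap - mean))
    return loglik


sample = [0.3, 1.2, 2.0, -0.5, 2.0]  # made-up data, two values censored at 2.0
print(censored_normal_loglik(0.0, sample, cap=2.0))
```

Mixing a density value and a probability in the same product is exactly what tends to trigger the incredulity: both are values of the same density, just with respect to a dominating measure that has an atom.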

Filed under: Books, Kids, Statistics, University life Tagged: Brittany, course, cross validated, density, Dirac mass, Lebesgue measure, mixed distribution, moustache, Nantes, Rennes

### fog lifting by mid-afternoon

Filed under: pictures, University life Tagged: bois de Boulogne, clouds, Fall, fog, France, La Défense, Paris, Université Paris Dauphine

### projection predictive input variable selection

**J**uho Piironen and Aki Vehtari just arXived a paper on variable selection that relates to two projection papers we wrote in the 1990’s with Costas Goutis (who died near Seattle in a diving accident in July 1996) and Jérôme Dupuis… Except that they move to the functional space of Gaussian processes. The covariance function in a Gaussian process is indeed based on a distance between observations, which are themselves defined as a vector of inputs. Some of which matter and some of which do not matter in the kernel value. When rescaling the distance with “length-scales” for all variables, one could think that non-significant variates have very small scales and hence bypass the need for variable selection but this is not the case as those coefficients react poorly to non-linearities in the variates… The paper thus builds a projective structure from a reference model involving all input variables.
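A minimal sketch of such a rescaled distance, assuming a squared-exponential covariance with one length-scale per input (the kernel form and the parametrisation are assumptions, not taken from the paper). Under this particular parametrisation, an input stops mattering when its length-scale is large, that is, when its inverse length-scale is small.

```python
import math


def ard_squared_exponential(x1, x2, length_scales, signal_variance=1.0):
    """Squared-exponential covariance with a separate length-scale per input."""
    scaled_sq_distance = sum(
        ((a - b) / scale) ** 2 for a, b, scale in zip(x1, x2, length_scales)
    )
    return signal_variance * math.exp(-0.5 * scaled_sq_distance)


x1, x2 = [0.0, 0.0], [0.5, 3.0]

# Both inputs matter: the large gap on the second input kills the covariance.
both_inputs = ard_squared_exponential(x1, x2, length_scales=[1.0, 1.0])

# Second input effectively switched off by a very large length-scale.
first_input_only = ard_squared_exponential(x1, x2, length_scales=[1.0, 1e3])

print(both_inputs, first_input_only, math.exp(-0.125))
```

Reading relevance off these scales assumes the response varies with each input in a way the scale can capture, which is where the sensitivity to non-linearities noted above comes in.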

*“…adding some irrelevant inputs is not disastrous if the model contains a sparsifying prior structure, and therefore, one can expect to lose less by using all the inputs than by trying to differentiate between the relevant and irrelevant ones and ignoring the uncertainty related to the left-out inputs.”*

While I of course appreciate this avatar of our original idea (with some borrowing from McCulloch and Rossi, 1992), the paper reminds me of some of the discussions and doubts we had about the role of the reference or super model that “anchors” the projections, as there is no reason for that reference model to be a better one. It could be that an iterative process where the selected submodel becomes the reference for the next iteration could enjoy better performance. When I first presented this work in Cagliari, in the late 1990s, one comment was that the method had no theoretical guarantee like consistency. Which is correct if the minimum distance is not evolving (how quickly?!) with the sample size n. I also remember the difficulty Jérôme and I had in figuring out a manageable forward-backward exploration of the (huge) set of acceptable subsets of variables. Random walk exploration and RJMCMC are unlikely to solve this problem.

Filed under: Books, Statistics, University life Tagged: Bayesian model choice, Cagliari, Costas Goutis, Gaussian processes, kernel, prior projection, variable selection

### graphics for the New York City marathon

**A**s the first runners are starting the race in Staten Island, here are six graphics published in the NYT about the NYC marathon, pointed out to me by my friend Darren. The first one is a great moving histogram that I cannot reproduce here, following the four batches of runners. And the unbearably slow last runner! The second graph is an almost linear increase in the number of women running the race (which, by extrapolation, means that the NYC marathon will be an all-female race by 2068!). The third graph is a square version of a pie chart, which shows that the second largest contingent after the US runners is made of French runners (7%), way above Canadian runners (2.7%). The fifth graph shows spikes in the age distribution of the runners, at 30, 40, 50, and 60: since it is unlikely to be a reporting bias, unless id’s are not controlled when registering, which would be strange given the awards are distributed by five year block age groups, this may be due to people making a big case of changing decade by running the marathon or to runners who take advantage of a new age group to aim for the podium. The latter explanation is very unlikely as it would only apply to elite runners and as it should also induce a spike at 35, 45, etc. (Incidentally, I checked the winner’s time in my category, 55-60, and last year a Frenchman won in 2:48:19, which means I would have to run at about the speed of my latest half-marathon to achieve this time…) The last graph is also quite interesting as it follows the winning times for male and female runners against the current world record across years, showing that the route is not the most appropriate to break the record, in contrast with Berlin where several records got broken.

Filed under: Running Tagged: New York City Marathon, statistical graphics, The New York Times

### two years eight months and twenty eight days [book review]

**I** have now read through Salman Rushdie‘s version of the tales of 1001 nights (which amount to two years, eight months, and twenty-eight nights; this would make exactly two years and nine months if the last month was a month of February!, not that it particularly matters). It is a fantastic tale, with supernatural jinns playing an obviously supernatural role, a tale whose plot does not matter very much as it is the (Pandora) box for more tales and deeper philosophical reflections about religion and rationality. It is not a novel and even less a science-fiction novel as I read it in some reviews.

*“It was the ungodly who had been specified as the targets but (…) this place was not at all ungodly. In point of fact it was excessively godly.”*

What I liked very much, besides the literary style and the almost overwhelming culture (or cultures) of the author (of which I certainly missed a large chunk!), in two years, eight months, and twenty-eight nights is the mille-feuille structure of the story and the associated distancing imposed upon the reader against a natural reader’s tendency to believe or want to believe despite all inconsistencies. An induced agnosticism of sorts most appropriate to mock the irrationality of religious believers, jinns and humans alike, in a godless universe: while jinn magic abounds in the book, there is no god or at least no acting god that we can detect. But gods and religious beliefs are exploited in the war of the jinns against the hapless humans. There are just as many levels of irony therein, which further contribute to skepticism and disbelief.

*“Many, including the present author, trace the beginnings of the so-called “death of the gods”, back to this period.”*

The book is also very much embedded in today’s world, for all its connections with medieval philosophy and the historical character Ibn Rushd (whose name was borrowed by Rushdie’s father to become their family name) or Averroes. The War on Terror, the Afghan and Syrian rise of religious fundamentalists, the Wall Street excesses, even the shooting down of Malaysia Airlines flight MH17 by Ukrainian rebels, all take place in the background of the so-called war of the jinns. Which makes the conclusion of the book highly pessimistic if in tune with the overall philosophical cynicism of the author: if it really takes magical forces and super-heroes to bring rationality to the world, there is little hope for our own world…

*“He passed a woman with astonishing face makeup, a zipper running down the middle of her face, ‘unzipped’ around her mouth to reveal bloody skinless flesh all the way down her chin.”*

A last remark is that the above description of a Halloween disguise reminded me of the disguise my friend Julien Cornebise opted for a few years ago! No surprise as this is exactly the same. Which shows that Rushdie and he share some common background in popular culture.

Filed under: Books, Travel Tagged: Alberta, Banff, Banff Centre, BIRS, book review, Canada, Midnight Children, Salman Rushdie, Two Years Eight Months and Twenty-Eight Nights

### bibTeX and homonymy

**H**ow come BibTeX is unable to spot homonyms?! Namely, if I quote two of my 1996 papers in the same LaTeX document, they will appear as Robert (1996a) and Robert (1996b). However, if I quote two different authors (or groups of authors) with the same surname, Martin as in the above example, who both happened to write a paper in 2014, BibTeX returns Martin (2014) and Martin (2014) in the output, hence it fails to recognise they are different authors, which is just weird! At least for author-year styles. I looked on the TeX Stack Exchange forum, but the solution I found did not work with the IMS and Springer styles.

Filed under: Books, University life Tagged: BibTeX, compilation, LaTeX, MCQMC2014, scientific editing

### bouncy particle sampler

**A**lexandre Bouchard-Côté, Sebastian Vollmer and Arnaud Doucet just arXived a paper with the above title, which reminded me of a proposal Kerrie Mengersen and I made at Valencia 7, in Tenerife, the [short-lived!] pinball sampler. This sampler was a particle (MCMC) sampler where we used the location of the other particles to avoid their neighbourhood, by bouncing away from them according to a delayed rejection principle, with an overall Gibbs justification since the resulting target was the product of copies of the target distribution. The difficulty in implementing the (neat!) idea was in figuring out the amount of bouncing or, in more physical terms, the energy allocated to the move.

In the current paper, inspired by an earlier paper in physics, the Markov chain (or single particle) evolves by linear moves, changing directions according to a Poisson process, with intensity and direction depending on the target distribution. A local version takes advantage of a decomposition of the target into a product of terms involving only some components of the whole parameter to be simulated. And hence allowing for moves in subspaces. An extension proposed by the authors is to bounce along the Hamiltonian isoclines. The method is demonstrably ergodic and irreducible. In practice, I wonder at the level of calibration or preliminary testing required to facilitate the exploration of the parameter space, particularly in the local version that seems to multiply items to be calibrated.
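A rough sketch of the global version, under assumptions chosen to keep it short: a standard Normal target (so that the gradient of the energy at x is x itself and bounce times can be drawn exactly), Normal velocity refreshment at a constant rate, and second moments computed by integrating along the linear segments. Function names and tuning values are illustrative and are not taken from the paper.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bouncy_particle_sampler(n_events, dim=2, refresh_rate=1.0, seed=0):
    """Time-averaged second moments of a bouncy particle path for a N(0, I) target."""
    rng = random.Random(seed)
    x = [0.0] * dim
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    total_time = 0.0
    integral = [0.0] * dim  # integral of x_j(t)^2 along the path
    for _ in range(n_events):
        # Bounce intensity along x + t*v is max(0, <v, x> + t*|v|^2);
        # invert its integral at an Exp(1) draw to get the bounce time.
        a, b = dot(v, x), dot(v, v)
        e = rng.expovariate(1.0)
        t_bounce = (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b
        t_refresh = rng.expovariate(refresh_rate)
        t = min(t_bounce, t_refresh)
        for j in range(dim):
            # Exact integral of (x_j + s*v_j)^2 for s in [0, t], then linear move.
            integral[j] += x[j] ** 2 * t + x[j] * v[j] * t ** 2 + v[j] ** 2 * t ** 3 / 3.0
            x[j] += t * v[j]
        total_time += t
        if t_bounce < t_refresh:
            # Bounce: reflect the velocity against the gradient (equal to x here).
            scale = 2.0 * dot(v, x) / dot(x, x)
            v = [vj - scale * xj for vj, xj in zip(v, x)]
        else:
            # Refreshment: draw a fresh Normal velocity.
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [s / total_time for s in integral]


second_moments = bouncy_particle_sampler(30_000)
print(second_moments)  # each entry should be close to 1 for this target
```

Even in this toy case the refreshment rate has to be picked by hand, which gives a small taste of the calibration question raised above; for a general target the bounce times would have to be simulated by thinning rather than by exact inversion.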

Filed under: pictures, Statistics, Travel, University life Tagged: curse of dimensionality, Gibbs sampler, Hamiltonian Monte Carlo, interacting particle systems, Monte Carlo Statistical Methods, particle, pinball sampler, Teneriffe

### adaptive and delayed MCMC for expensive likelihoods [reply from the authors]

*[Chris Sherlock, Andrew Golightly and Daniel Henderson have written a reply about my earlier comments on their arXived paper which works better as a post than as a comment:]*

Thank you for the constructive criticism of our paper. Our approach uses a simple weighted average of nearest neighbours and we agree that GPs offer a useful alternative. Both methods have pros and cons; however, we first note a similarity: Kriging using a GP also leads to a weighted average of values.

The two most useful pros of the GP are that, (i) by estimating the parameters of the GP one may represent the scales of variability more accurately than a simple nearest neighbour approach with weighting according to Euclidean distance, and (ii) one obtains a distribution for the uncertainty in the Kriging estimate of the log-likelihood.

Both papers in the blog entry (as well as other recent papers which use GPs) take advantage, in one way or another, of the second point. However, as acknowledged in Richard Wilkinson’s paper, estimating the parameters of a GP is computationally very costly, and this estimation must be repeated as the training data set grows. Probably for this reason and because of the difficulty in identifying p(p+1)/2 kernel range parameters, Wilkinson’s paper uses a diagonal covariance structure for the kernel. We can find no description of the structure of the covariance function that is used for each statistic in the Meeds & Welling paper but this issue is difficult to avoid.

Our initial training run is used to transform the parameters so that they are approximately orthogonal with unit variance and Euclidean distance is a sensible metric. This has two consequences: (i) the KD-tree is easier to set up and use, and (ii) the nearest neighbours in a KD-tree that is approximately balanced can be found in O(log N) operations, where N is the number of training points. Both (i) and (ii) only require Euclidean distance to be a reasonable measure, not perfect, so there is no need for the training run to have “properly converged”, just for it to represent the gross relationships in the posterior and for the transformation to be 1-1. We note a parallel between our approximate standardisation using training data, and the need to estimate a symmetric matrix of distance parameters from training data to obtain a fully representative GP kernel.
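*[Illustration, not the authors' C code:]* a minimal sketch of a KD-tree lookup followed by a weighted average of the stored log-likelihood values at the k nearest neighbours. It assumes the parameters have already been transformed so that Euclidean distance is sensible. The inverse-distance weights and the names `build_kdtree`, `knn`, and `knn_estimate` are assumptions made for this sketch, since the text above only speaks of a simple weighted average.

```python
import heapq


def build_kdtree(points, depth=0):
    """points: list of (coords, value) pairs; returns nested tuples or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2          # median split keeps the tree balanced
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn(node, query, k, heap=None):
    """Collect the k nearest stored points as a heap of (-sq_dist, value)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (coords, value), axis, left, right = node
    d2 = sum((c - q) ** 2 for c, q in zip(coords, query))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, value))
    elif -d2 > heap[0][0]:          # closer than the current worst neighbour
        heapq.heapreplace(heap, (-d2, value))
    diff = query[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, heap)
    # Visit the far side only if the splitting plane is close enough to matter.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, query, k, heap)
    return heap


def knn_estimate(tree, query, k=5, eps=1e-12):
    """Inverse-distance weighted average of the k nearest stored values."""
    neighbours = knn(tree, query, k)
    weights = [1.0 / ((-nd2) ** 0.5 + eps) for nd2, _ in neighbours]
    return sum(w * val for w, (_, val) in zip(weights, neighbours)) / sum(weights)
```

With a tree built from stored pairs (transformed parameter, log-likelihood), `knn_estimate(tree, theta)` returns the cheap approximation at a new point `theta`.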

The GP approach might lead to a more accurate estimate of the posterior than a nearest neighbour approach (for a fixed number of training points), but this accuracy is necessary for the algorithms in the papers mentioned above since they sample from an approximation to the posterior. As noted in the blog post, the delayed-acceptance step (which also could be added to GP-based algorithms) ensures that our algorithm samples from the true posterior, so accuracy is helpful for efficiency rather than essential for validity.
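*[Illustration, not the authors' code:]* the role of the delayed-acceptance step can be sketched on a one-dimensional toy problem with a symmetric random-walk proposal. The function name and arguments are hypothetical, and the cheap approximation is an arbitrary function here rather than a nearest-neighbour estimate. The point is only that the second stage corrects for the error made in the first, so the chain still targets the true posterior.

```python
import random


def delayed_acceptance_mh(log_post, log_cheap, x0, n_iter, step=2.0, seed=0):
    """Two-stage Metropolis-Hastings with a symmetric random-walk proposal."""
    rng = random.Random(seed)
    x, lp_x, lc_x = x0, log_post(x0), log_cheap(x0)
    chain, n_expensive = [], 1
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)
        lc_y = log_cheap(y)
        # Stage 1: screen the proposal with the cheap approximation only.
        # -Exp(1) is distributed as log(U) for U uniform on (0, 1].
        if -rng.expovariate(1.0) < lc_y - lc_x:
            lp_y = log_post(y)          # expensive evaluation, only if needed
            n_expensive += 1
            # Stage 2: correct for the approximation error of stage 1.
            if -rng.expovariate(1.0) < (lp_y - lp_x) - (lc_y - lc_x):
                x, lp_x, lc_x = y, lp_y, lc_y
        chain.append(x)
    return chain, n_expensive
```

For instance, with `log_post = lambda x: -0.5 * x * x` and a deliberately poor `log_cheap = lambda x: -0.5 * ((x - 0.3) / 1.5) ** 2`, the chain should still reproduce the N(0,1) moments, while the expensive function is evaluated only for proposals that survive the first stage.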

We have made the KD-tree C code available and put some effort into making the interface straightforward to use. Our starting point is an existing simple MCMC algorithm; as it is already evaluating the posterior (or an unbiased approximation), why not store this and take advantage of it within the existing algorithm? We feel that our proposal offers a relatively cheap and straightforward route for this.

Filed under: Books, Statistics Tagged: acronym, adaptive MCMC methods, delayed acceptance, k-nearest neighbours, Markov chain, MCMC, Monte Carlo Statistical Methods, pseudo-marginal MCMC, simulation

### ABC in Helsinki, Stockholm, and in between [a.k.a., ABCruise]

**A**s mentioned in a previous post, I had not actively pursued the organisation of an “ABC in…” workshop this year. I am thus very grateful to Jukka Corander, Samuel Kaski, Ritabrata Dutta, and Michael Gutmann, from Helsinki, for having organised the next “ABC in…” workshop in Helsinki in the most exotic way possible, namely aboard a cruise ship going between Helsinki and Stockholm, on the Baltic Sea. Hence the appropriate ABCruise nickname. It will take place from May 16 to May 18, allowing for flying [from European cities] to Helsinki on the 16th and back from Helsinki on the 18th! While this may sound an inappropriate location for a meeting, we are constructing a complete scientific program with two days of talks [with a noon break in Stockholm], posters, and a total registration fee of 200€, including cabin and meals! (Which is clearly cheaper than having the same meeting on firm ground.) So, to all ‘Og’s readers interested in ABC topics, secure those dates in your agenda and stay posted for incoming updates on the program and the opening of registration.

Filed under: Books, Kids, Statistics, Travel, University life Tagged: ABC, ABC in Helsinki, Baltic Sea, cruise, Finland, Helsinki, Stockholm, Sweden, workshop

### Think Bayes: Bayesian Statistics Made Simple

**B**y some piece of luck, I came upon the book *Think Bayes: Bayesian Statistics Made Simple*, written by Allen B. Downey and published by Green Tea Press, which usually publishes programming books with fun covers [and which I could relate to No Starch Press, focussing on coffee!, which published *Statistics Done Wrong* that I reviewed a while ago]. The book is available on-line for free in pdf and html formats, and I went through it during a particularly exciting administrative meeting…

*“Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.”*
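As an illustration of the quote, here is a minimal sketch of a discrete approximation to the posterior on a coin bias under a flat prior. It is not taken from the book and does not use the book's own code; the function names and the grid size are choices made for this sketch.

```python
def grid_posterior(heads, flips, n_grid=1001):
    """Posterior on a grid of coin biases p, starting from a flat prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    prior = {p: 1.0 for p in grid}            # flat prior, unnormalised
    post = {}
    for p, weight in prior.items():
        # Prior weight times binomial likelihood (constant factor dropped).
        post[p] = weight * p ** heads * (1.0 - p) ** (flips - heads)
    total = sum(post.values())                # the integral becomes a sum
    for p in post:
        post[p] /= total
    return post


def posterior_mean(post):
    """Expectation under the discrete posterior: again a plain summation."""
    return sum(p * weight for p, weight in post.items())
```

With 7 heads out of 10 flips, `posterior_mean(grid_posterior(7, 10))` should be close to the exact Beta(8, 4) posterior mean, 8/12.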

The book is most appropriately published in this collection as most of it concentrates on Python programming, with hardly any maths formula, in some sense similar to Jim Albert’s R book. Obviously, coming from maths, and having never programmed in Python, I find the approach puzzling. But just as obviously, I am aware, both from the comments on my books and from my experience on **X** validated, that a large group (majority?) of newcomers to the Bayesian realm find the mathematical approach to the topic a major hindrance. Hence I am quite open to this editorial choice as it is bound to induce more people to think Bayes, or to think they can think Bayes.

*“…in fewer than 200 pages we have made it from the basics of probability to the research frontier. I’m very happy about that.”*

The choice made of operating almost exclusively through motivating examples is rather traditional in US textbooks. See e.g. Albert’s book. While it goes against my French inclination to start from theory and concepts and end up with illustrations, I can see how it operates in a programming book. But as always I fear it makes generalisations uncertain and understanding more shaky… The examples are perforce simple and far from realistic statistical issues. Hence the book illustrates more the use of Bayesian thinking for decision making than for data analysis. To wit, those examples are about the Monty Hall problem and other TV games, some urn, dice, and coin models, blood testing, sport predictions, subway waiting times, height variability between men and women, SAT scores, cancer causality, a Geiger counter hierarchical model inspired by Jaynes, …, the exception being the Belly Button Biodiversity dataset in the final chapter, dealing with the (exciting) unseen species problem in an equally exciting way. This may explain why the book does not cover MCMC algorithms. And why ABC is covered through a rather artificial normal example. Which also hides some of the maths computations under the carpet.

*“The underlying idea of ABC is that two datasets are alike if they yield the same summary statistics. But in some cases, like the example in this chapter, it is not obvious which summary statistics to choose.”*
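The idea in the quote can be sketched with a basic ABC rejection sampler on a normal mean. This is not the book's example: the uniform prior on (-5, 5), the unit variance, the tolerance, and the function name are all assumptions made for this sketch, and the sample mean is used as summary statistic, a case where the choice is obvious since it is sufficient.

```python
import random


def abc_rejection(observed, n_draws=20000, eps=0.1, seed=0):
    """ABC rejection sampler for the mean of a N(theta, 1) sample."""
    rng = random.Random(seed)
    n = len(observed)
    s_obs = sum(observed) / n                  # summary of the observed data
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)         # draw from the (assumed) prior
        fake = [rng.gauss(theta, 1.0) for _ in range(n)]
        s_fake = sum(fake) / n                 # same summary on simulated data
        # Keep theta when the two datasets are alike in their summaries.
        if abs(s_fake - s_obs) < eps:
            accepted.append(theta)
    return accepted
```

The accepted values are draws from an approximation to the posterior, the quality of which depends on the tolerance `eps` and on how informative the summary statistic is.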

In conclusion, this is a very original introduction to Bayesian analysis, which I welcome for the reasons above. Of course, it is *only* an introduction, which should be followed by a deeper entry into the topic, and with [more] maths, in order to handle more realistic models and datasets.

Filed under: Books, Kids, R, Statistics, University life Tagged: ABC, Bayesian Analysis, book review, cross validated, Green Tea Press, MCMC, Python, The Bayesian Choice, Think Bayes