Bayesian News Feeds

Prof. Ntayiya solves all your problems!

Xian's Og - Sat, 2014-06-28 18:14

One of the numerous fliers for “occult expertise” that end up in my mailbox… An interesting light on the major woes of my neighbours. That can induce some to consult with such charlatans. And a wonder: how can Professeur Ntayiya hope to get paid if the effects need to be proven?!

Filed under: pictures Tagged: black block, charlatanism, fliers, Fontenay-aux-Roses, mailbox
Categories: Bayesian Bloggers

Le Monde puzzle [#872]

Xian's Og - Fri, 2014-06-27 18:14

A “mildly interesting” Le Monde mathematical puzzle that eventually had me running R code on a cluster:

Within the set {1,…,56}, take 12 values at random, x1,…,x12. Is it always possible to pick two pairs from those 12 balls such that their sums are equal?

Indeed, while exhaustive search cannot reach the size of the set,

fowler=function(i=1,pred=NULL){
  pred=c(pred,i)
  for (j in (1:N)[-pred]){
    a=outer(c(pred,j),c(pred,j),"+")
    if ((sum(duplicated(a[lower.tri(a)]))>0)){
      val=FALSE
    }else{
      if (length(pred)==n-1){
        print(c(pred,j))
        val=TRUE
      }else{
        val=fowler(j,pred)}}
    if (val) break()
  }
  return(val)
}
fowler(i=N,pred=1)

with N=35 being my upper limit (and n=9 the largest value inducing double sums), the (second) easiest verification goes by sampling as indicated and checking for duplicates.

mindup=66
for (t in 1:10^7){
  #arguing that extremes should be included
  x=c(1,56,sample(2:55,10))
  A=outer(x,x,"+")
  mindup=min(mindup,sum(duplicated(A[lower.tri(A)])))
  if (mindup==0) break()}

The values of mindup obtained by running this code a few times are around 5, which means a certain likelihood of a positive answer to the above question…
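As a sanity check on the duplicate-counting idiom used in the snippets above, here is a minimal sketch. The helper name count_dup is mine (it does not appear in the original code) and the two test sets are purely illustrative: the first eight terms of the Mian-Chowla sequence, whose pairwise sums are all distinct, and the set 1:4, where 1+4 = 2+3.

```r
# count_dup: number of repeated values among the pairwise sums x_i + x_j, i < j
# (hypothetical helper name; same outer()/lower.tri() idiom as in the post)
count_dup <- function(x) {
  a <- outer(x, x, "+")
  sum(duplicated(a[lower.tri(a)]))
}

# first terms of the Mian-Chowla sequence: no two pairs share the same sum
count_dup(c(1, 2, 4, 8, 13, 21, 31, 45))

# 1+4 = 2+3, hence exactly one repeated sum
count_dup(1:4)
```

A return value of zero is what the random search above is hoping to hit for a 12-subset of {1,…,56}.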

This problem raises a much more interesting question, namely how to force simulations of those 12-uplets towards the most extreme values of the target function, from simulated annealing to cross-entropy to you-name-it… Here is my simulated annealing attempt:

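target=function(x){
  a=outer(x,x,"+")
  return(sum(duplicated(a[lower.tri(a)])))}
#N (size of the set) and n (number of draws) must be set beforehand,
#e.g. N=56 and n=12 for the puzzle above
beta=100
Nmo=N-1
nmt=n-2
nmo=n-1
x=sort(sample(2:Nmo,nmt))
cur=c(1,x,N)
tarcur=target(cur)
for (t in 1:10^6){
  dex=sample(2:nmo,2)
  prop=sort(c(cur[-dex],sample((2:Nmo)[-(cur-1)],2)))
  tarprop=target(prop)
  #Metropolis step for minimising target: always accept a decrease,
  #accept an increase with probability exp((tarcur-tarprop)/beta)
  if (beta*log(runif(1))<tarcur-tarprop){
    cur=prop;tarcur=tarprop}
  beta=beta*.9999
  if (tarcur==0) break()}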

Apart from this integer programming exercise, a few items of relevance in this Le Monde Science & Medicine leaflet.  A portrait of Leslie Lamport for his Turing Prize (yes, the very same Leslie Lamport who created LaTeX!, and wrote this book which stood on most mathematicians’ bookshelves for decades, with the marginally annoying lion comics at the head of each chapter!). A tribune on an interesting book, The Beginning and the End, by Clément Vidal, discussing how to prepare for the end of the Universe by creating a collective mind. And the rise of biobanks…

Filed under: Books, Kids, Statistics, University life Tagged: LaTeX, Le Monde, Leslie Lamport, lions, mathematical puzzle, Turing Prize
Categories: Bayesian Bloggers

thermodynamic Monte Carlo

Xian's Og - Thu, 2014-06-26 18:14

Michael Betancourt, my colleague from Warwick, arXived a month ago a paper about a differential geometry approach to relaxation. (In the Monte Carlo rather than the siesta sense of the term relaxation!) He is considering the best way to link a simple base measure ϖ to a measure of interest π by the sequence

where Z(β) is the normalising constant (or partition function in the  thermodynamic translation). Most methods are highly dependent on how the sequence of β’s is chosen. A first nice result (for me) is that the Kullback-Leibler distance and the partition function are strongly related in that

which means that the variation in the normalising constant is driving the variation in the Kullback-Leibler distance. The next section goes into differential geometry and the remains from my Master course in differential geometry alas are much too scattered for me to even remember some notions like that of a bundle… So, like Andrew, I have trouble making sense of the resulting algorithm, which updates the temperature β along with the position and speed. (It sounds like an extra and corresponding energy term is added to the original Hamiltonian function.) Even the Beta-Binomial

example is somewhat too involved for me.  So I tried to write down the algorithm step by step in this special case. Which led to

  1. update β into β-εδp’²
  2. update p into p-εδp’
  3. update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
  4. compute the average log-likelihood, λ* under the tempered version of the target (at temperature β)
  5. update p’ into p’+2εβ{(1-a)/p+(b-1)/(1-p)}-ε[λ-λ*]p’
  6. update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
  7. update β into β-εδp’²
  8. update p into p-εδp’

where p’ denotes the momentum auxiliary variable associated with the kinetic energy. And λ is the current log-likelihood. (The parameter ε was equal to 0.005 and I could not find the value of δ.) The only costly step in the above list is the approximation of the log-likelihood average λ*. The above details make the algorithm quite clear but I am still missing the intuition behind…
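To spell out the link between the Kullback-Leibler divergence and the partition function mentioned at the start of this post, here is a short derivation. It is only a sketch under an assumption of mine, namely that the sequence is the usual geometric bridge between ϖ and π, written through the density dπ/dϖ; the paper's own notation may differ.

```latex
\begin{align*}
\pi_\beta(\mathrm{d}x) &= \frac{1}{Z(\beta)}
  \left(\frac{\mathrm{d}\pi}{\mathrm{d}\varpi}(x)\right)^{\beta}\varpi(\mathrm{d}x),
  \qquad
  Z(\beta)=\int\left(\frac{\mathrm{d}\pi}{\mathrm{d}\varpi}\right)^{\beta}\mathrm{d}\varpi,\\
\frac{\mathrm{d}}{\mathrm{d}\beta}\log Z(\beta) &=
  \mathbb{E}_{\pi_\beta}\!\left[\log\frac{\mathrm{d}\pi}{\mathrm{d}\varpi}\right],\\
\mathrm{KL}(\pi_\beta\,\|\,\varpi) &=
  \mathbb{E}_{\pi_\beta}\!\left[\beta\log\frac{\mathrm{d}\pi}{\mathrm{d}\varpi}-\log Z(\beta)\right]
  = \beta\,\frac{\mathrm{d}}{\mathrm{d}\beta}\log Z(\beta)-\log Z(\beta).
\end{align*}
```

Under this assumption, the divergence from the base measure is entirely determined by log Z(β) and its derivative in β, which is the sense in which the normalising constant drives the Kullback-Leibler distance.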

Filed under: Books, Statistics, University life Tagged: acceleration of MCMC algorithms, differential geometry, Hamiltonian Monte Carlo, Riemann manifold, University of Warwick
Categories: Bayesian Bloggers

did I mean endemic? [pardon my French!]

Xian's Og - Wed, 2014-06-25 18:14

Deborah Mayo wrote a Saturday night special column on our Big Bayes stories issue in Statistical Science. She (predictably?) focussed on the critical discussions, esp. David Hand’s most forceful arguments where he essentially considers that, due to our (special issue editors’) selection of successful stories, we biased the debate by providing a “one-sided” story. And that we or the editor of Statistical Science should also have included frequentist stories. To which Deborah points out that demonstrating that “only” a frequentist solution is available may be beyond the possible. And still, I could think of partial information and partial inference problems like the “paradox” raised by Jamie Robins and Larry Wasserman in the past years. (Not the normalising constant paradox but the one about censoring.) Anyway, the goal of this special issue was to provide a range of realistic illustrations where Bayesian analysis was a most reasonable approach, not to raise the Bayesian flag against other perspectives: in an ideal world it would have been more interesting to get discussants to produce alternative analyses bypassing the Bayesian modelling but obviously discussants only have a limited amount of time to dedicate to their discussion(s) and the problems were complex enough to deter any attempt in this direction.

As an aside and in explanation of the cryptic title of this post, Deborah wonders at my use of endemic in the preface and at the possible mis-translation from the French. I did mean endemic (and endémique) in a half-joking reference to a disease one cannot completely get rid of. At least in French, the term extends beyond diseases, but presumably pervasive would have been less confusing… Or ubiquitous (as in Ubiquitous Chip for those with Glaswegian ties!). She also expresses “surprise at the choice of name for the special issue. Incidentally, the “big” refers to the bigness of the problem, not big data. Not sure about “stories”.” Maybe another occurrence of lost in translation… I had indeed no intent of connection with the “big” of “Big Data”, but wanted to convey the notion of a big as in major problem. And of a story explaining why the problem was considered and how the authors reached a satisfactory analysis. The story of the Air France Rio-Paris crash resolution is representative of that intent. (Hence the explanation for the above picture.)

Filed under: Books, Statistics, University life Tagged: Air France, Bayesian Analysis, censoring, endemic, Glasgow, guest editors, information theory, Larry Wasserman, Robins-Wasserman paradox, Statistical Science, translation, Ubiquitous Chip
Categories: Bayesian Bloggers

ABC model choice by random forests

Xian's Og - Tue, 2014-06-24 18:14

After more than a year of collaboration, meetings, simulations, delays, switches,  visits, more delays, more simulations, discussions, and a final marathon wrapping day last Friday, Jean-Michel Marin, Pierre Pudlo,  and I at last completed our latest collaboration on ABC, with the central arguments that (a) using random forests is a good tool for choosing the most appropriate model and (b) evaluating the posterior misclassification error rather than the posterior probability of a model is an appropriate paradigm shift. The paper has been co-signed with our population genetics colleagues, Jean-Marie Cornuet and Arnaud Estoup, as they provided helpful advice on the tools and on the genetic illustrations and as they plan to include those new tools in their future analyses and DIYABC software.  ABC model choice via random forests is now arXived and very soon to be submitted…

One scientific reason for this fairly long conception is that it took us several iterations to understand the intrinsic nature of the random forest tool and how it could be most naturally embedded in ABC schemes. We first imagined it as a filter from a set of summary statistics to a subset of significant statistics (hence the automated ABC advertised in some of my past or future talks!), with the additional appeal of an associated distance induced by the forest. However, we later realised that (a) further ABC steps were counterproductive once the model was selected by the random forest and (b) including more summary statistics was always beneficial to the performance of the forest and (c) the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, (iii) the random forest version of the above, were at best very loose. The above picture is taken from the paper: it shows how the true and the ABC probabilities (do not) relate in the example of an MA(q) model… We thus had another round of discussions and experiments before deciding the unthinkable, namely to give up the attempts to approximate the posterior probability in this setting and to come up with another assessment of the uncertainty associated with the decision. This led us to propose to compute a posterior predictive error as the error assessment for ABC model choice. This is mostly a classification error but (a) it is based on the ABC posterior distribution rather than on the prior and (b) it does not require extra computations when compared with other empirical measures such as cross-validation, while avoiding the sin of using the data twice!

Filed under: pictures, R, Statistics, Travel, University life Tagged: ABC, ABC model choice, arXiv, Asian lady beetle, CART, classification, DIYABC, machine learning, model posterior probabilities, Montpellier, posterior predictive, random forests, SNPs, using the data twice
Categories: Bayesian Bloggers

revenge of the pigeons

Xian's Og - Mon, 2014-06-23 18:14

While I had not had kamikaze pigeons hitting my windows for quite a while…, it may be that one of them decided to move to biological warfare: when I came back from Edinburgh, my office at the University was in a terrible state as a bird had entered through a tiny window opening and wreaked havoc on the room, dropping folders and rocks from my shelves and… leaving a most specific proof of its visit. This bird was particularly attracted by and aggressive against the above book, Implementing Reproducible Research, standing on top of my books to review for CHANCE. Obvious disclaimer: this reflects neither my opinion nor the University opinion about the book contents, but only the bird’s, which is solely responsible for its action!

Filed under: Books, Kids, pictures, R, Statistics, Travel, University life Tagged: book review, CHANCE, Edinburgh, pigeon, reproducible research, Université Paris Dauphine, window pane
Categories: Bayesian Bloggers

modern cosmology as a refutation of theism

Xian's Og - Sun, 2014-06-22 18:14

While I thought the series run by The Stone on the philosophy [or lack thereof] of religions was over, it seems there are more entries.  This week, I read with great pleasure the piece written by Tim Maudlin on the role played by recent results in (scientific) cosmology in refuting theist arguments.

“No one looking at the vast extent of the universe and the completely random location of homo sapiens within it (in both space and time) could seriously maintain that the whole thing was intentionally created for us.” T. Maudlin

What I particularly liked in his arguments is the role played by randomness, with an accumulation of evidence of the random nature and location of Earth and human beings, which and who appear more and more at the margins of the Universe rather than the main reason for its existence. And his clear rejection of the argument of fine-tuned cosmological constants as an argument in favour of the existence of a watchmaker. (Argument that was also deconstructed in Seber’s book.) And obviously his final paragraph that “Atheism is the default position in any scientific inquiry”. This may be the strongest entry in the whole series.

Filed under: Books Tagged: atheism, cosmology, The New York Times, The Stone, theism, Tim Maudlin

Categories: Bayesian Bloggers

20,000 pink ladies [10k: 37'26", 43rd & 3rd V2]

Xian's Og - Sat, 2014-06-21 18:14

This year was a special year for the races of Les Courants de la Liberté, in Caen, as part of the celebrations of the 70th anniversary of the Allied landing on the nearby D-Day beaches. The number of women running the Rochambelle race/walk against breast cancer was raised this year to 20,000 participants, an impressive pink wave riding the streets of Caen, incl. my wife, mother and mother-in-law! And even one of the 1944 Rochambelle nurses attending the start and finish of the race!

While I had no particular expectation for the 10k race, it went on so well that I ended up with my best time ever on this distance (my previous record was in Ottawa in July…1989!). The weather was perfect, cool and cloudy with a tailwind most of the way. (The low intensity training in Edinburgh and the Highlands may have helped.) I had a bit of an issue at the beginning passing the first rows of runners who were clearly in the wrong league but stuck with a runner most of the race, which helped with the middle hardest k’s (with a maximal 3:58 on the 8th k!), and  finished by motivating another V2 to keep up with, very glad to see my finish time. I actually ended up 3rd V2 just ahead of two other runners from this category, but there is no podium or reward for this in this race, given the large number of races to accommodate (ultra-trail of D-Day beaches, marathon, half-marathon, 10k, rollers, kids,…)

Filed under: Kids, Mountains, pictures, Running, Travel Tagged: 10k, Caen, D Day, La Rochambelle, Les Courants de la Liberté, Normandy, Ottawa, running, V2

Categories: Bayesian Bloggers


Xian's Og - Sat, 2014-06-21 13:14

Filed under: pictures Tagged: clouds, plane exhaust, Sceaux, summer, sunset, trees
Categories: Bayesian Bloggers

“those” coincidences

Xian's Og - Fri, 2014-06-20 18:14

Last Thursday night, after a friendly dinner closing the ICMS workshop, I was rushing back to Pollock Halls to catch some sleep before a very early flight. When crossing North Bridge, on top of Waverley station, I then spotted in the crowd a well-known face of a fellow statistician from Cambridge University, on an academic visit to the University of Edinburgh that was completely unrelated to the workshop. Then, today, on my way back from submitting a visa request at the Indian embassy in Paris, I took the RER train for one stop between Gare du Nord and Châtelet. When I stood up from my seat and looked behind me, a senior (and most famous) mathematician was sitting right there, in deep conversation with a colleague about algorithms… Just two of “those” coincidences. (Edinburgh may be propitious to coincidences: at the last ICMS workshop I attended, I ended up in the same Indian restaurant as Marc Suchard, who also was on an academic visit to the University of Edinburgh that was completely unrelated to the workshop!)

Filed under: pictures, Travel, University life Tagged: Bayes 250, Châtelet, coincidences, Edinburgh, Gare du Nord, ICMS, India, Indian food, Marc Suchard, Paris, RER, RER B, Scotland, The Balmoral, visa
Categories: Bayesian Bloggers

Do we live to be happy? [bacc. 2014]

Xian's Og - Thu, 2014-06-19 18:14

This year is my daughter’s final year in high school and she is now taking the dreaded baccalauréat exams. Just like a few hundred thousand French students. With “just like” in the strict sense since all students with the same major take the very same exam all over France… The first written composition is in the “mother of all disciplines”, philosophy, and the theme of one dissertation this year was “do we live to be happy?”. Which suited my daughter well as she was hoping for a question around that theme. She managed to quote Plato and Buddha, The Pursuit of Happiness and The Wolf of Wall Street… So she sounded happy enough with her essay. This seemed indeed like a rather safe notion (as opposed to ethics, religion, politics or work), with enough material to fill a classical thesis-antithesis-synthesis plan (and my personal materialistic conclusion about the lack of predetermination in our lives).

Filed under: Books, Kids Tagged: Baccalauréat, essay, exam, France, happiness, philosophy
Categories: Bayesian Bloggers

Bayes at the Bac’ [again]

Xian's Og - Thu, 2014-06-19 06:00

When my son took the mathematics exam of the baccalauréat a few years ago, the probability problem was a straightforward application of Bayes’ theorem. (Problem which was later cancelled due to a minor leak…) Surprise, surprise, Bayes is back this year for my daughter’s exam. Once again, the topic is a pharmaceutical lab with a test, with different positive rates on two populations (healthy vs. sick), and the very basic question is to derive the probability that a person is sick given the test is positive. Then a (predictable) application of the CLT-based confidence interval on a binomial proportion. And the derivation of a normal confidence interval, once again compounded by a CLT-based confidence interval on a binomial proportion… Fairly straightforward with no combinatorial difficulty.
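For readers who want to see the two computations involved, here is a minimal R sketch. All numerical values (prevalence, positive rates, counts) are hypothetical and chosen for illustration only, they are not the figures of the exam, and the function name clt_ci is mine.

```r
# All numerical values below are hypothetical, for illustration only
prev <- 0.01   # P(sick)
sens <- 0.99   # P(test positive | sick)
fpos <- 0.05   # P(test positive | healthy)

# Bayes' theorem: P(sick | test positive)
p_sick_pos <- prev * sens / (prev * sens + (1 - prev) * fpos)
p_sick_pos

# CLT-based confidence interval on a binomial proportion,
# phat +/- z * sqrt(phat * (1 - phat) / n)
clt_ci <- function(k, n, level = 0.95) {
  phat <- k / n
  z <- qnorm(1 - (1 - level) / 2)
  phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)
}
clt_ci(k = 60, n = 1000)
```

With these made-up rates, the probability of being sick given a positive test is only 1/6, the usual consequence of a low prevalence.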

The other problems were on (a) a sequence defined by the integral

(b) solving the equation

in the complex plane and (c) Cartesian 2-D and 3-D geometry, again avoiding abstruse geometric questions… A rather conventional exam from my biased perspective.

Filed under: Kids, Statistics Tagged: Baccalauréat, Cartesian geometry, complex numbers, exam, high school, integrals, polynomials, sequence, Thomas Bayes

Categories: Bayesian Bloggers

the Poisson transform

Xian's Og - Wed, 2014-06-18 18:14

In obvious connection with an earlier post on the “estimation” of normalising constants, Simon Barthelmé and Nicolas Chopin just arXived a paper on The Poisson transform for unnormalised statistical models. Obvious connection because I heard of the Gutmann and Hyvärinen (2012) paper when Simon came to CREST to give a BiP talk on this paper a few weeks ago. (A connected talk he gave in Banff is available as a BIRS video.)

Without getting too much into details, the neat idea therein is to turn the observed likelihood

into a joint likelihood

 which is the likelihood of a Poisson point process with intensity function

This is an alternative model in that the original likelihood does not appear as a marginal of the above. Only the modes coincide, with the conditional mode in ν providing the normalising constant. In practice, the above Poisson process likelihood is unavailable and Gutmann and Hyvärinen (2012) offer an approximation by means of their logistic regression.
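To make the coincidence of the modes concrete, here is a sketch of the computation, in notation that is my own assumption: the unnormalised model is written exp{f_θ(y)}, with normalising constant Z(θ), and ν is the extra parameter. The paper's exact parameterisation may differ.

```latex
\begin{align*}
\ell(\theta) &= \sum_{i=1}^n f_\theta(y_i) - n\log Z(\theta),
  \qquad Z(\theta)=\int \exp\{f_\theta(y)\}\,\mathrm{d}y,\\
M(\theta,\nu) &= \sum_{i=1}^n \{f_\theta(y_i)+\nu\}
  - n\int \exp\{f_\theta(y)+\nu\}\,\mathrm{d}y,\\
\frac{\partial M}{\partial \nu} &= n - n\,e^{\nu}Z(\theta) = 0
  \;\Longrightarrow\; \hat\nu(\theta) = -\log Z(\theta),\\
M(\theta,\hat\nu(\theta)) &= \ell(\theta) - n.
\end{align*}
```

Up to the additive constant n log n, M(θ,ν) is the log-likelihood of a Poisson point process with intensity n exp{f_θ(y)+ν}. Profiling out ν returns the original log-likelihood shifted by a constant, so both functions are maximised at the same θ, while the maximising ν equals minus the log normalising constant.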

Unavailable likelihoods inevitably make me think of ABC. Would ABC solutions be of interest there? In particular, could the Poisson point process be simulated with no further approximation? Since the “true” likelihood is not preserved by this representation, similar questions to those found in ABC arise, like a measure of departure from the “true” posterior. Looking forward to the Bayesian version! (Marginalia: Siméon Poisson died in Sceaux, which seemed to have attracted many mathematicians at the time, since Cauchy also spent part of his life there…)

Filed under: Books, Kids, pictures, Statistics, University life Tagged: Banff, Cauchy, logistic regression, Paris, Poisson point process, Sceaux, Siméon Poisson, untractable normalizing constant
Categories: Bayesian Bloggers

last Big MC [seminar] before summer [June 19, 3pm]

Xian's Og - Tue, 2014-06-17 08:18

Last session of our Big’MC seminar at Institut Henri Poincaré this year, on Thursday, June 19, with

Chris Holmes (Oxford) at 3pm on

Robust statistical decisions via re-weighted Monte Carlo samples

and Pierre Pudlo (iC3M, Université de Montpellier 2) at 4:15pm on [our joint work]

ABC and machine learning

Filed under: pictures, Statistics, University life Tagged: ABC, Big'MC, Chris Holmes, IHP, Institut Henri Poincaré, machine learning, Monte Carlo s, Montpellier, Paris, Pierre Pudlo, seminar, University of Oxford
Categories: Bayesian Bloggers

Le Monde sans puzzle

Xian's Og - Mon, 2014-06-16 18:14

This week, Le Monde mathematical puzzle is purely geometric, hence inappropriate for an R resolution. In the Science & Médecine leaflet, there is however an interesting central page about random generators, from the multiple usages of those in daily life to the consequences of poor generators on cryptography and data safety. The article is compiling an interview of Jean-Paul Delahaye on the topic with recent illustrations from cybersecurity. One final section gets rather incomprehensible: when discussing the dangers of seed generation, it states that “a poor management of the entropy means that a hacker can saturate the seed and take over the original randomness, weakening the whole system”. I am sure there is something real behind the imagery, but this does not make sense… Another insert mentions a possible random generator built out of the light detectors on a smartphone. And quantum physics. The company IDQ can indeed produce ultra-rapid random generators that way. And it also ran randomness tests summarised here. Using in particular George Marsaglia’s diehard battery.

Another column reports that a robot passed the Turing test last week, on Turing’s death anniversary. Meaning that 33% of the jury was convinced the robot’s answers were given by a human. This reminded me of the Most Human Human book sent to me by my friends from BYU. (A marginalia found in Le Monde is that the test was organised by Kevin Warwick…from the University of Coventry, a funny reversal of the University of Warwick sitting in Coventry! However, checking on his website showed that he has and had no affiliation with this university, being at the University of Reading instead.)


Filed under: Books, Kids, Statistics, University life Tagged: Alan Turing, DieHard, George Marsaglia, Jean-Paul Delahaye, Le Monde, mathematical puzzle, pseudo-random generator, randomness
Categories: Bayesian Bloggers

Statistical modeling and computation [apologies]

Xian's Og - Wed, 2014-06-11 03:55

In my book review of the recent book by Dirk Kroese and Joshua Chan,  Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very much sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references.  [If nothing else, this typo signals it is high time for a change of my prescription glasses.]

Filed under: Books, R, Statistics, University life Tagged: apologies, Australia, Bayesian statistics, Dirk Kroese, introductory textbooks, Joshua Chan, Monte Carlo methods, Monte Carlo Statistical Methods, R, state space model, typo
Categories: Bayesian Bloggers

checking for finite variance of importance samplers

Xian's Og - Tue, 2014-06-10 18:14

Over a welcome curry yesterday night in Edinburgh I read this 2008 paper by Koopman, Shephard and Creal, testing the assumptions behind importance sampling, whose purpose is to check on-line for (in)finite variance in an importance sampler, based on the empirical distribution of the importance weights. To this end, the authors use the upper tail of the weights and a limit theorem that provides the limiting distribution as a type of Pareto distribution

over (0,∞). And then implement a series of asymptotic tests like the likelihood ratio, Wald and score tests to assess whether or not the power ξ of the Pareto distribution is below ½. While there is nothing wrong with this approach, which produces a statistically validated diagnosis, I still wonder at the added value from a practical perspective, as raw graphs of the estimation sequence itself should exhibit similar jumps and a similar lack of stabilisation as the ones seen in the various figures of the paper. Alternatively, a few repeated calls to the importance sampler should disclose the poor convergence properties of the sampler, as in the above graph. Where the blue line indicates the true value of the integral.
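To illustrate what the tail of the importance weights reveals, here is a small R sketch on a toy example of my own choosing (an Exp(1) target with an Exp(3) proposal, not an example from the paper). It uses the plain Hill estimator of the tail index instead of the likelihood ratio, Wald and score tests of the authors, so it is a crude stand-in for their procedure. The function name hill and the choice k=1000 are assumptions.

```r
set.seed(42)
n <- 10^5

# Toy importance sampler (my choice, not the paper's example):
# target Exp(1), proposal Exp(3), i.e. a proposal with lighter tails
x <- rexp(n, rate = 3)
w <- dexp(x, rate = 1) / dexp(x, rate = 3)   # importance weights, equal to exp(2x)/3

# Here P(w > t) = (3t)^(-3/2) for t >= 1/3: an exact Pareto tail with
# xi = 2/3 > 1/2, hence weights with infinite variance

# Hill estimator of xi based on the k largest weights
hill <- function(w, k) {
  ws <- sort(w, decreasing = TRUE)
  mean(log(ws[1:k]) - log(ws[k + 1]))
}
xi_hat <- hill(w, k = 1000)
xi_hat

# running importance sampling estimate of E[X] = 1 under the target
run_est <- cumsum(w * x) / seq_len(n)
```

An estimate xi_hat well above ½ flags the infinite variance. Plotting run_est against the true value 1, for instance with plot(run_est, type="l") followed by abline(h=1), should exhibit the kind of jumps and lack of stabilisation discussed above.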

Filed under: R, Statistics, Travel, University life Tagged: Abraham Wald, curry, Edinburgh, extreme value theory, importance sampling, infinite variance estimators, Pareto distribution, score function, Scotland
Categories: Bayesian Bloggers