Bayesian News Feeds

Adaptive Priors based on Splines with Random Knots

Eduard Belitser and Paulo Serra
Categories: Bayesian Analysis

Stratified Graphical Models - Context-Specific Independence in Graphical Models

Henrik Nyman, Johan Pensar, Timo Koski, Jukka Corander
Categories: Bayesian Analysis

recycling accept-reject rejections

Xian's Og - Mon, 2014-06-30 18:14

Vinayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes anew technique to handle intractable normalising constants. And which exact title is Data augmentation for models based on rejection sampling. (Paper that I read in the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (aka likelihood) satisfies

where all terms but p are known in closed form, then completion by the rejected values of an hypothetical accept-reject algorithm−hypothetical in the sense that the data does not have to be produced by an accept-reject scheme but simply the above domination condition to hold−allows for a data augmentation scheme. Without requiring the missing normalising constant. Since the completed likelihood is

A closed-form, if not necessarily congenial, function.

Now this is quite a different use of the “rejected values” from the accept reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

It is a neat idea in that it completely bypasses the approximation of the normalising constant. And avoids the somewhat delicate tuning of the auxiliary solution of Moller et al. (2006)  The difficulty with this algorithm is however in finding an upper bound M on the unnormalised density f that is

  1. in closed form;
  2. with a manageable and tight enough “constant” M;
  3. compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent from the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validation of the formal algorithm. Such a constraint would pull M higher, hence reducing the efficiency of the method. Actually the matrix Langevin distribution considered in the first example involves a bound that depends on the parameter κ.

The paper includes a result (Theorem 2) on the uniform ergodicity that relies on heavy assumptions on the proposal distribution. And a rather surprising one, namely that the probability of rejection is bounded from below, i.e. calling for a less efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of acceptance is bounded from below since, then, the event when no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. There therefore occurs a renewal each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, Statistical Science).

Note also that, despite the opposition raised by the authors, the method per se does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.

Maybe some further experimental evidence tomorrow…

Filed under: Statistics, University life Tagged: accept-reject algorithm, arXiv, auxiliary variable, Data augmentation, George Casella, intractable likelihood, Monte Carlo Statistical Methods, Rao-Blackwellisation, recycling, untractable normalizing constant
Categories: Bayesian Bloggers

R/Rmetrics in Paris [alas!]

Xian's Og - Sun, 2014-06-29 18:14

Today I gave a talk on Bayesian model choice in a fabulous 13th Century former monastery in the Latin Quarter of Paris… It is the Collège des Bernardins, close to Jussieu and Collège de France, unbelievably hidden to the point I was not aware of its existence despite having studied and worked in Jussieu since 1982… I mixed my earlier San Antonio survey on importance sampling approximations to Bayes factors with an entry to our most recent work on ABC with random forests. This was the first talk of the 8th R/Rmetrics workshop taking place in Paris this year. (Rmetrics is aiming at aggregating R packages with econometrics and finance applications.) And I had a full hour and a half to deliver my lecture to the workshop audience. Nice place, nice people, new faces and topics (and even andouille de Vire for lunch!): why should I complain with an alas in the title?!What happened is that the R/Rmetrics meetings have been till this year organised in Meielisalp, Switzerland. Which stands on top of Thuner See and… just next to the most famous peaks of the Bernese Alps! And that I had been invited last year but could not make it… Meaning I lost a genuine opportunity to climb one of my five dream routes, the Mittelegi ridge of the Eiger. As the future R/Rmetrics meetings will not take place there.

A lunch discussion at the workshop led me to experiment the compiler library in R, library that I was unaware of. The impact on the running time is obvious: recycling the fowler function from the last Le Monde puzzle,

> bowler=cmpfun(fowler) > N=20;n=10;system.time(fowler(pred=N)) user system elapsed 52.647 0.076 56.332 > N=20;n=10;system.time(bowler(pred=N)) user system elapsed 51.631 0.004 51.768 > N=20;n=15;system.time(bowler(pred=N)) user system elapsed 51.924 0.024 52.429 > N=20;n=15;system.time(fowler(pred=N)) user system elapsed 52.919 0.200 61.960

shows a ten- to twenty-fold gain in system time, if not in elapsed time (re-alas!).

Filed under: Mountains, pictures, R, Statistics, Travel, University life Tagged: ABC, andouille de Vire, Bayesian econometrics, Bayesian model choice, Bernese Alps, cmpfun(), Collège des Bernardins, compiler, Eiger, importance sampling, Interlaken, Meielisalp, Mittelegi ridge, Paris, R, Rmetrics, San Antonio, Switzerland, system.time, Thun Lake
Categories: Bayesian Bloggers

…et sinon Mr Cardoso réussit là où les ont échoué!

Xian's Og - Sun, 2014-06-29 08:18

A similar flier, a few days later. With very precise (if incoherent) guarantees! And a fantastic use of capitals. Too bad Monsieur Cardoso could not predict the (occurrence of a) missing noun in the last sentence…

Filed under: Kids, pictures Tagged: advertising, charlatanism, mailbox, Pantin, Paris
Categories: Bayesian Bloggers

Prof. Ntayiya résout tous vos problèmes!

Xian's Og - Sat, 2014-06-28 18:14

One of the numerous fliers for “occult expertise” that end up in my mailbox… An interesting light on the major woes of my neighbours. That can induce some to consult with such charlatans. And a wonder: how can Professeur Ntayiya hope to get paid if the effects need to be proven?!

Filed under: pictures Tagged: black block, charlatanism, fliers, Fontenay-aux-Roses, mailbox
Categories: Bayesian Bloggers

Le Monde puzzle [#872]

Xian's Og - Fri, 2014-06-27 18:14

An “mildly interesting” Le Monde mathematical puzzle that eventually had me running R code on a cluster:

Within the set {1,…,56}, take 12 values at random, x1,…,x12. Is it always possible to pick two pairs from those 12 balls such that their sums are equal?

Indeed, while exhaustive search cannot reach the size of the set,

fowler=function(i=1,pred=NULL){ pred=c(pred,i) for (j in (1:N)[-pred]){ a=outer(c(pred,j),c(pred,j),"+") if ((sum(duplicated(a[lower.tri(a)]))>0)){ val=FALSE }else{ if (length(pred)==n-1){ print(c(pred,j)) val=TRUE }else{ val=fowler(j,pred)}} if (val) break() } return(val) } fowler(i=N,pred=1)

with N=35 being my upper limit (and n=9 the largest value inducing double sums), the (second) easiest verification goes by sampling as indicated and checking for duplicates.

mindup=66 for (t in 1:10^7){ #arguing that extremes should be included x=c(1,56,sample(2:55,10)) A=outer(x,x,"+") mindup=min(mindup,sum(duplicated(A[lower.tri(A)]))) if (mindup==0) break()}

The values of mindup obtained by running this code a few times are around 5, which means a certain likelihood of a positive answer to the above question…

This problem raises a much more interesting question, namely how to force simulations of those 12-uplets towards the most extreme values of the target function, from simulated annealing to cross-entropy to you-name-it… Here is my simulated annealing attempt:

target=function(x){ a=outer(x,x,"+") return(sum(duplicated(a[lower.tri(a)])))} beta=100 Nmo=N-1 nmt=n-2 nmo=n-1 x=sort(sample(2:Nmo,nmt)) cur=c(1,x,N) tarcur=target(cur) for (t in 1:10^6){ dex=sample(2:nmo,2) prop=sort(c(cur[-dex],sample((2:Nmo)[-(cur-1)],2))) tarprop=target(prop) if (beta*log(runif(1))<tarprop -tarcur){ cur=prop;tarcur=tarprop} beta=beta*.9999 if (tarcur==0) break()}

Apart from this integer programming exercise, a few items of relevance in this Le Monde Science & Medicine leaflet.  A portrait of Leslie Lamport for his Turing Prize (yes, the very same Leslie Lamport who created LaTeX!, and wrote this book which stood on most mathematicians’ bookshelves for decades, with the marginally annoying lion comics at the head of each chapter!). A tribune on an interesting book, The Beginning and the End, by Clément Vidal, discussing how to prepare for the end of the Universe by creating a collective mind. And the rise of biobanks…

Filed under: Books, Kids, Statistics, University life Tagged: LaTeX, Le Monde, Leslie Lamport, lions, mathematical puzzle, Tring Prize
Categories: Bayesian Bloggers

thermodynamic Monte Carlo

Xian's Og - Thu, 2014-06-26 18:14

Michael Betancourt, my colleague from Warwick, arXived a month ago a paper about a differential geometry approach to relaxation. (In the Monte Carlo rather than the siesta sense of the term relaxation!) He is considering the best way to link a simple base measure ϖ to a measure of interest π by the sequence

where Z(β) is the normalising constant (or partition function in the  thermodynamic translation). Most methods are highly dependent on how the sequence of β’s is chosen. A first nice result (for me) is that the Kullback-Leibler distance and the partition function are strongly related in that

which means that the variation in the normalising constant is driving the variation in the Kullback-Leibler distance. The next section goes into differential geometry and the remains from my Master course in differential geometry alas are much too scattered for me to even remember some notions like that of a bundle… So, like Andrew, I have trouble making sense of the resulting algorithm, which updates the temperature β along with the position and speed. (It sounds like an extra and corresponding energy term is added to the original Hamiltonian function.) Even the Beta-Binomial

example is somewhat too involved for me.  So I tried to write down the algorithm step by step in this special case. Which led to

  1. update β into β-εδp’²
  2. update p into p-εδp’
  3. update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
  4. compute the average log-likelihood, λ* under the tempered version of the target (at temperature β)
  5. update p’ into p’+2εβ{(1-a)/p+(b-1)/(1-p)}-ε[λ-λ*]p’
  6. update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
  7. update β into β-εδp’²
  8. update p into p-εδp’

where p’ denotes the momentum auxiliary variable associated with the kinetic energy. And λ is the current log-likelihood. (The parameter ε was equal to 0.005 and I could not find the value of δ.) The only costly step in the above list is the approximation of the log-likelihood average λ*. The above details make the algorithm quite clear but I am still missing the intuition behind…

Filed under: Books, Statistics, University life Tagged: acceleration of MCMC algorithms, differential geometry, Hamiltonian Monte Carlo, Riemann manifold, University of Warwick
Categories: Bayesian Bloggers

did I mean endemic? [pardon my French!]

Xian's Og - Wed, 2014-06-25 18:14

Deborah Mayo wrote a Saturday night special column on our Big Bayes stories issue in Statistical Science. She (predictably?) focussed on the critical discussions, esp. David Hand’s most forceful arguments where he essentially considers that, due to our (special issue editors’) selection of successful stories, we biased the debate by providing a “one-sided” story. And that we or the editor of Statistical Science should also have included frequentist stories. To which Deborah points out that demonstrating that “only” a frequentist solution is available may be beyond the possible. And still, I could think of partial information and partial inference problems like the “paradox” raised by Jamie Robbins and Larry Wasserman in the past years. (Not the normalising constant paradox but the one about censoring.) Anyway, the goal of this special issue was to provide a range of realistic illustrations where Bayesian analysis was a most reasonable approach, not to raise the Bayesian flag against other perspectives: in an ideal world it would have been more interesting to get discussants produce alternative analyses bypassing the Bayesian modelling but obviously discussants only have a limited amount of time to dedicate to their discussion(s) and the problems were complex enough to deter any attempt in this direction.

As an aside and in explanation of the cryptic title of this post, Deborah wonders at my use of endemic in the preface and at the possible mis-translation from the French. I did mean endemic (and endémique) in a half-joking reference to a disease one cannot completely get rid of. At least in French, the term extends beyond diseases, but presumably pervasive would have been less confusing… Or ubiquitous (as in Ubiquitous Chip for those with Glaswegian ties!). She also expresses “surprise at the choice of name for the special issue. Incidentally, the “big” refers to the bigness of the problem, not big data. Not sure about “stories”.” Maybe another occurrence of lost in translation… I had indeed no intent of connection with the “big” of “Big Data”, but wanted to convey the notion of a big as in major problem. And of a story explaining why the problem was considered and how the authors reached a satisfactory analysis. The story of the Air France Rio-Paris crash resolution is representative of that intent. (Hence the explanation for the above picture.)

Filed under: Books, Statistics, University life Tagged: Air France, Bayesian Analysis, censoring, endemic, Glasgow, guest editors, information theory, Larry Wasserman, Robins-Wasserman paradox, Statistical Science, translation, Ubiquitous Chip
Categories: Bayesian Bloggers

ABC model choice by random forests

Xian's Og - Tue, 2014-06-24 18:14

After more than a year of collaboration, meetings, simulations, delays, switches,  visits, more delays, more simulations, discussions, and a final marathon wrapping day last Friday, Jean-Michel Marin, Pierre Pudlo,  and I at last completed our latest collaboration on ABC, with the central arguments that (a) using random forests is a good tool for choosing the most appropriate model and (b) evaluating the posterior misclassification error rather than the posterior probability of a model is an appropriate paradigm shift. The paper has been co-signed with our population genetics colleagues, Jean-Marie Cornuet and Arnaud Estoup, as they provided helpful advice on the tools and on the genetic illustrations and as they plan to include those new tools in their future analyses and DIYABC software.  ABC model choice via random forests is now arXived and very soon to be submitted…

One scientific reason for this fairly long conception is that it took us several iterations to understand the intrinsic nature of the random forest tool and how it could be most naturally embedded in ABC schemes. We first imagined it as a filter from a set of summary statistics to a subset of significant statistics (hence the automated ABC advertised in some of my past or future talks!), with the additional appeal of an associated distance induced by the forest. However, we later realised that (a) further ABC steps were counterproductive once the model was selected by the random forest and (b) including more summary statistics was always beneficial to the performances of the forest and (c) the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, (iii) the random forest version of the above, were at best very loose. The above picture is taken from the paper: it shows how the true and the ABC probabilities (do not) relate in the example of an MA(q) model… We thus had another round of discussions and experiments before deciding the unthinkable, namely to give up the attempts to approximate the posterior probability in this setting and to come up with another assessment of the uncertainty associated with the decision. This led us to propose to compute a posterior predictive error as the error assessment for ABC model choice. This is mostly a classification error but (a) it is based on the ABC posterior distribution rather than on the prior and (b) it does not require extra-computations when compared with other empirical measures such as cross-validation, while avoiding the sin of using the data twice!

Filed under: pictures, R, Statistics, Travel, University life Tagged: ABC, ABC model choice, arXiv, Asian lady beetle, CART, classification, DIYABC, machine learning, model posterior probabilities, Montpellier, posterior predictive, random forests, SNPs, using the data twice
Categories: Bayesian Bloggers

revenge of the pigeons

Xian's Og - Mon, 2014-06-23 18:14

While I had not had kamikaze pigeons hitting my windows for quite a while…, it may be that one of them decided to move to biological warfare: when I came back from Edinburgh, my office at the University was in a terrible state as a bird had entered through a tiny window opening and wrecked havoc on the room, dropping folders and rocks from my shelves and… leaving a most specific proof of its visit. This bird was particularly attracted by and aggressive against the above book, Implementing Reproducible Research, standing on top of my books to review for CHANCE. Obvious disclaimer: this reflects neither my opinion nor the University opinion about the book contents, but only the bird’s, which is solely responsible for its action!

Filed under: Books, Kids, pictures, R, Statistics, Travel, University life Tagged: book review, CHANCE, Edinburgh, pigeon, reproducible research, Université Paris Dauphine, window pane
Categories: Bayesian Bloggers

modern cosmology as a refutation of theism

Xian's Og - Sun, 2014-06-22 18:14

While I thought the series run by The Stone on the philosophy [or lack thereof] of religions was over, it seems there are more entries.  This week, I read with great pleasure the piece written by Tim Maudlin on the role played by recent results in (scientific) cosmology in refuting theist arguments.

“No one looking at the vast extent of the universe and the completely random location of homo sapiens within it (in both space and time) could seriously maintain that the whole thing was intentionally created for us.” T. Maudlin

What I particularly liked in his arguments is the role played by randomness, with an accumulation of evidence of the random nature and location of Earth and human beings, which and who appear more and more at the margins of the Universe rather than the main reason for its existence. And his clear rejection of the argument of fine-tuned cosmological constants as an argument in favour of the existence of a watchmaker. (Argument that was also deconstructed in Seber’s book.) And obviously his final paragraph that “Atheism is the default position in any scientific inquiry”. This may be the strongest entry in the whole series.

Filed under: Books Tagged: atheism, cosmology, The New York Times, The Stone, theism, Tim Maudlin

Categories: Bayesian Bloggers