Xian's Og

Syndicate content Xi'an's Og
an attempt at bloggin, nothing more...
Updated: 1 week 1 hour ago

snapshot from Louvain

Mon, 2015-10-26 13:18
Categories: Bayesian Bloggers

adaptive and delayed MCMC for expensive likelihoods

Sun, 2015-10-25 18:15

Chris Sherlock, Andrew Golightly and Daniel Henderson recently arXived a paper on a new kind of delayed acceptance.

“With simplicity in mind, we focus on a k-nearest neighbour regression model as the cheap surrogate.”

The central notion in the paper is to extrapolate from values of the likelihoods at a few points in the parameter space towards the whole space through a k-nearest neighbour estimate. While this solution is simple and relatively cheap to compute, it is unclear it is a good surrogate because it does not account for the structure of the model while depending on the choice of a distance. Recent works on Gaussian process approximations seem more relevant. See e.g. papers by Ed Meeds and Max Welling, or by Richard Wilkinson for ABC versions. Obviously, because this is a surrogate only for the first stage delayed acceptance (while the second stage is using the exact likelihood, as in our proposal), the approximation does not have to be super-tight. It should also favour the exploration of tails since (a) any proposal θ outside the current support of the chain is allocated a surrogate value that is the average of its k neighbours, hence larger than the true value in the tails, and (b) due to the delay a larger scale can be used in the random walk proposal. As the authors acknowledge, the knn method deteriorates quickly with the dimension. And computing the approximation grows with the number of MCMC iterations, given that the algorithm is adaptive and uses the exact likelihood values computed so far. Only for the first stage approximation, though, which explains “why” the delayed acceptance algorithm converges. I wondered for a short while whether this was enough to justify convergence, given that the original Metropolis-Hastings probability is just broken into two parts. Since the second stage compensates for the use of a surrogate on the first step, it should not matter in the end. However, the rejection of a proposal still depends on this approximation, i.e., differs from the original algorithm, and hence is turning the Markov chain into a non-Markovian process.

“The analysis sheds light on how computationally cheap the deterministic approximation needs to be to make its use worthwhile and on the relative importance of it matching the `location’ and curvature of the target.”

I had missed the “other” paper by some of the authors on the scaling of delayed acceptance, where they “assume that the error in the cheap deterministic approximation is a realisation of a random function” (p.3).  In which they provide an optimal scaling result for high dimensions à la Roberts et al. (1997), namely a scale of 2.38 (times the target scale) in the random walk proposal. The paper however does not describe the cheap approximation to the target or pseudo-marginal version.

A large chunk of the paper is dedicated to the construction and improvement of the KD-tree used to find the k nearest neighbours. In O(d log(n)) time. Algorithm on which I have no specific comment. Except maybe that the construction of a KD-tree in accordance with a Mahalanobis distance discussed in Section 2.1 requires that the MCMC algorithm has properly converged, which is unrealistic. And also that the construction of a balanced tree seems to require heavy calibrations.

The paper is somewhat harder to read than need be (?) because the authors cumulate the idea of delayed acceptance based on this knn approximation with the technique of pseudo-marginal Metropolis-Hastings. While there is an added value in doing so it complexifies the exposition. And leads to ungainly acronyms like adaptive “da-PsMMH”, which simply are un-readable (!).

I would suggest some material to be published as supplementary material and the overall length of the paper to be reduced. For instance, Section 4.2 is not particularly conclusive. See, e.g., Theorem 2. Or the description of the simulated models in Section 5, which is sometimes redundant.

Filed under: Books, Statistics Tagged: acronym, adaptive MCMC methods, delayed acceptance, k-nearest neighbours, Markov chain, MCMC, Monte Carlo Statistical Methods, pseudo-marginal MCMC, simulation
Categories: Bayesian Bloggers

battle of Agincourt, 600th commemoration [in miniature]

Sun, 2015-10-25 08:15

For the 600th commemoration of the battle of Agincourt, the National Museum of Arms and Armours [within the Tower of London] has commissioned a diorama of the battle that saw the French nobility decimated by English longbows, muddy fields, their heavy armours, and lust for glory…

The Royal Armouries have a blog describing the construction of this impressive diorama, which measures 4m x 2m and involves 4,400 figurines. And over 1,000 arrows stuck into the ground. Never forget the arrows!

At a personal level, besides a fascination for diorama [that started when I saw a D-Day diorama in one Norman museum as a kid] I had been told after The Accident that cutting some fingers was customary for captured bowmen, but this entry in Wikipedia writes it off as a myth.

Filed under: Books, Kids, pictures, Travel Tagged: Agincourt, armour, battle, bowmen, cavalry, diorama, England, figurines, France, Henry V, Hundred Years' War, National Museum of Arms and Armours, October 25, tin soldiers, Tower of London
Categories: Bayesian Bloggers

Le Monde puzzle [#934]

Sat, 2015-10-24 17:15

Another Le Monde mathematical puzzle with no R code:

Given a collection of 2€ coins and 5€ bills that sum up to 120€, find the number of 5€ bills such that the collection cannot be divided into 5 even parts.

Indeed, as soon as one starts formalising the problem, it falls apart: if there are a 5€ bills and b 2€ coins, we have 5a+2b=120, hence 2b=120-5a=5(24-a), meaning that b must be a multiple of 5, b=5b’ and a must be even, a=2a’, with b’=12-a’.  Hence, 10 possible values for both pairs (a’,b’) and (a,b), since a>0 and b>0. If these 120 Euros can be evenly divided between 5 persons, each gets 24€. Now, 24€ can be decomposed in 5€ bills and 2€ coins in three ways:


Each of the five persons using any of the 3 above decompositions means there exist integers α, β, and γ such that


with α+β+γ=5; therefore a=4α+2γ and b=2α+12β+7γ, which implies 2α+γ=a’ and 2α+12β+87γ=5×12-5a’=2α+5×12-12α-12γ+7γ, or 5a’=10α+5γ. That is, 2α+γ=a’ again… If a’=11, there is no solution with α+γ≤5, and this is the only such case. For any other value of a’, there is a way to divide the 120€ in 5 even parts. As often, I wonder at the point of the puzzle if this is the answer or at its phrasing if I have misunderstood the question.

Just to check the above by R means, I still wrote a short R code

for (a in 1:11){ # find integer solutions to 2x+y=a sum=0;z=-1 while ((z<a)&(z<6)&(sum<2)){ z=z+1;x=trunc((a-z)/2);y=5-x-z sum=(2*a==4*x+2*z)+(5*(11-a)==x+11*y+6*z)} print(c(2*a,5*(11-a),x,y,z)) }

which returned

[1] 2 50 0 4 1 [1] 4 45 1 4 0 [1] 6 40 1 3 1 [1] 8 35 2 3 0 [1] 10 30 2 2 1 [1] 12 25 3 2 0 [1] 14 20 3 1 1 [1] 16 15 4 1 0 [1] 18 10 4 0 1 [1] 20 5 5 0 0 [1] 22 0 5 -1 1

meaning that a’=11 does not produce a viable solution.

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: arithmetics, bills, Le Monde, mathematical puzzle, R
Categories: Bayesian Bloggers

Trip to Louvain (and back)

Fri, 2015-10-23 17:15

Apart from the minor initial inconvenience that I missed my train to Brussels thanks to the SNCF train company dysfunctional automata [but managed to switch to one half-an-hour later], my Belgian trip to Louvain-la-Neuve was quite enjoyable! I met with several local faculty [UCL] members I had not seen for several years, I gave my talk for the World Statistics Day in front of a large audience, maybe not the most appropriate talk for that day since it was somewhat skeptical about the nature of statistical tests, I got sharp questions, comments, and suggestions on the mixture approach to testing [incl. a challenging one about the Bernoulli B(p) case], I had a superb and animated and friendly dinner in a local restaurant—where everyone kindly spoke French although I was the only native French speaker—, I met the next morning with two PhD students from KU Leuven (the “other” part of the former Leuven university, albeit in the Flemmish side of the border) about functional ABC and generalised Jeffreys priors, I had a few more interesting discussions, and I managed to grab a few bags of Belgian waffles in Brussels before heading home! (In case you wonder from the above pixture, the crowds in the pedestrian streets of Louvain-la-Neuve were not connected to my visit!, but to a student festival centred at beer a 24 hour bike relay that attracted around 50,000 students, for less than a hundred bikes!)

Filed under: Kids, pictures, Running, Statistics, Travel, University life, Wines Tagged: Bayesian tests of hypotheses, Belgian beer, Belgique, bike, bracelet, festival, Louvain-la-Neuve, UCL, waffles, Wallonie, World Statistics Day
Categories: Bayesian Bloggers

model selection and multiple testing

Thu, 2015-10-22 17:15

Ritabrata Dutta, Malgorzata Bogdan and Jayanta Ghosh recently arXived a survey paper on model selection and multiple testing. Which provides a good opportunity to reflect upon traditional Bayesian approaches to model choice. And potential alternatives. On my way back from Madrid, where I got a bit distracted when flying over the South-West French coast, from Biarritz to Bordeaux. Spotting the lake of Hourtain, where I spent my military training month, 29 years ago!

“On the basis of comparison of AIC and BIC, we suggest tentatively that model selection rules should be used for the purpose for which they were introduced. If they are used for other problems, a fresh justification is desirable. In one case, justification may take the form of a consistency theorem, in the other some sort of oracle inequality. Both may be hard to prove. Then one should have substantial numerical assessment over many different examples.”

The authors quickly replace the Bayes factor with BIC, because it is typically consistent. In the comparison between AIC and BIC they mention the connundrum of defining a prior on a nested model from the prior on the nesting model, a problem that has not been properly solved in my opinion. The above quote with its call to a large simulation study reminded me of the paper by Arnold & Loeppky about running such studies through ecdfs. That I did not see as solving the issue. The authors also discuss DIC and Lasso, without making much of a connection between those, or with the above. And then reach the parametric empirical Bayes approach to model selection exemplified by Ed George’s and Don Foster’s 2000 paper. Which achieves asymptotic optimality for posterior prediction loss (p.9). And which unifies a wide range of model selection approaches.

A second part of the survey considers the large p setting, where BIC is not a good approximation to the Bayes factor (when testing whether or not all mean entries are zero). And recalls that there are priors ensuring consistency for the Bayes factor in this very [restrictive] case. Then, in Section 4, the authors move to what they call “cross-validatory Bayes factors”, also known as partial Bayes factors and pseudo-Bayes factors, where the data is split to (a) make the improper prior proper and (b) run the comparison or test on the remaining data. They also show the surprising result that, provided the fraction of the data used to proper-ise the prior does not converge to one, the X validated Bayes factor remains consistent [for the special case above]. The last part of the paper concentrates on multiple testing but is more tentative and conjecturing about convergence results, centring on the differences between full Bayes and empirical Bayes. Then the plane landed in Paris and I stopped my reading, not feeling differently about the topic than when the plane started from Madrid.

Filed under: Books, pictures, Statistics, Travel, University life Tagged: AIC, Bayes factor, Bayesian hypothesis testing, Bayesian model choice, Biarritz, BIC, Bordeaux, consistency, DIC, empirical Bayes, Hourtain, Madrid, pseudo-priors, The Bayesian Choice
Categories: Bayesian Bloggers

AISTATS 2016 [post-submissions]

Wed, 2015-10-21 17:15

Now that the deadline for AISTATS 2016 submissions is past, I can gladly report that we got the amazing number of 559 submissions, which is much more than what was submitted to the previous AISTATS conferences. To the point it made us fear for a little while [but not any longer!] that the conference room was not large enough. And hope that we had to install video connections in the hotel bar!

Which also means handling about the same amount of papers as a year of JRSS B submissions within a single month!, the way those submissions are handled for the AISTATS 2016 conference proceedings. The process is indeed [as in other machine learning conferences] to allocate papers to associate editors [or meta-reviewers or area chairs] with a bunch of papers and then have those AEs allocate papers to reviewers, all this within a few days, as the reviews have to be returned to authors within a month, for November 16 to be precise. This sounds like a daunting task but it proceeded rather smoothly due to a high degree of automation (this is machine-learning, after all!) in processing those papers, thanks to (a) the immediate response to the large majority of AEs and reviewers involved, who bid on the papers that were of most interest to them, and (b) a computer program called the Toronto Paper Matching System, developed by Laurent Charlin and Richard Zemel. Which tremendously helps with managing about everything! Even when accounting for the more formatted entries in such proceedings (with an 8 page limit) and the call to the conference participants for reviewing other papers, I remain amazed at the resulting difference in the time scales for handling papers in the fields of statistics and machine-learning. (There was a short lived attempt to replicate this type of processing for the Annals of Statistics, if I remember well.)

Filed under: Books, pictures, Statistics, Travel, University life Tagged: AISTATS 2014, AISTATS 2016, Andalucía, Artificial Intelligence and Statistics, Cadiz, conferences, editor, machine learning, proceedings, refereeing, scientific journals, Spain
Categories: Bayesian Bloggers