## Bayesian News Feeds

### snapshot from Louvain

### adaptive and delayed MCMC for expensive likelihoods

**C**hris Sherlock, Andrew Golightly and Daniel Henderson recently arXived a paper on a new kind of delayed acceptance.

*“With simplicity in mind, we focus on a k-nearest neighbour regression model as the cheap surrogate.”*

The central notion in the paper is to extrapolate from values of the likelihood at a few points in the parameter space to the whole space through a k-nearest neighbour estimate. While this solution is simple and relatively cheap to compute, it is unclear whether it makes a good surrogate, because it does not account for the structure of the model and depends on the choice of a distance. Recent works on Gaussian process approximations seem more relevant. See e.g. papers by Ed Meeds and Max Welling, or by Richard Wilkinson for ABC versions. Obviously, because this is a surrogate only for the first stage of delayed acceptance (while the second stage uses the exact likelihood, as in our proposal), the approximation does not have to be super-tight. It should also favour the exploration of tails since (a) any proposal θ outside the current support of the chain is allocated a surrogate value that is the average of its k neighbours, hence larger than the true value in the tails, and (b) thanks to the delay a larger scale can be used in the random walk proposal. As the authors acknowledge, the knn method deteriorates quickly with the dimension. And the cost of computing the approximation grows with the number of MCMC iterations, given that the algorithm is adaptive and uses the exact likelihood values computed so far. Only for the first stage approximation, though, which explains “why” the delayed acceptance algorithm converges. I wondered for a short while whether this was enough to justify convergence, given that the original Metropolis-Hastings acceptance probability is just broken into two parts. Since the second stage compensates for the use of a surrogate in the first, it should not matter in the end. However, the rejection of a proposal still depends on this approximation, i.e., differs from the original algorithm, and hence turns the Markov chain into a non-Markovian process.
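
For concreteness, the two-stage accept/reject step can be sketched as follows. This is a minimal toy version with a generic cheap surrogate standing in for the knn estimate; the Gaussian target, the deflated surrogate, and all names are assumptions of mine, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta):
    # stand-in for the expensive, exact log-likelihood
    return -0.5 * theta @ theta

def log_surrogate(theta):
    # stand-in for the cheap (e.g. knn) estimate of log_target
    return -0.45 * theta @ theta

def da_mh(theta0, n_iter=5000, scale=2.0):
    """Two-stage delayed-acceptance random-walk Metropolis."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    chain = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + scale * rng.standard_normal(theta.size)
        # stage 1: screen the proposal with the cheap surrogate only
        if np.log(rng.uniform()) < log_surrogate(prop) - log_surrogate(theta):
            # stage 2: correct with the exact target, so that detailed
            # balance (and the right stationary law) is preserved
            log_a2 = ((log_target(prop) - log_target(theta))
                      - (log_surrogate(prop) - log_surrogate(theta)))
            if np.log(rng.uniform()) < log_a2:
                theta = prop
        chain[t] = theta
    return chain

chain = da_mh([3.0])
```

The point of stage 1 is that most rejections are decided without ever evaluating the expensive likelihood; the exact likelihood is only computed for proposals that pass the cheap screen.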

*“The analysis sheds light on how computationally cheap the deterministic approximation needs to be to make its use worthwhile and on the relative importance of it matching the “location” and curvature of the target.”*

I had missed the “other” paper by some of the authors on the scaling of delayed acceptance, where they “assume that the error in the cheap deterministic approximation is a realisation of a random function” (p.3), and in which they provide an optimal scaling result for high dimensions à la Roberts et al. (1997), namely a scale of 2.38 (times the target scale) in the random walk proposal. The paper however does not describe the cheap approximation to the target or its pseudo-marginal version.

A large chunk of the paper is dedicated to the construction and improvement of the KD-tree used to find the k nearest neighbours in O(d log(n)) time, an algorithm on which I have no specific comment. Except maybe that the construction of a KD-tree in accordance with a Mahalanobis distance, discussed in Section 2.1, requires that the MCMC algorithm has properly converged, which is unrealistic. And also that the construction of a balanced tree seems to require heavy calibration.
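
The surrogate lookup itself can be illustrated with a standard KD-tree; this sketch uses scipy's cKDTree with a plain Euclidean distance rather than the Mahalanobis distance of Section 2.1, and the stored values are simulated for the example:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# previously visited parameter values and their exact log-likelihoods
thetas = rng.standard_normal((200, 2))
loglik = -0.5 * np.sum(thetas**2, axis=1)

tree = cKDTree(thetas)  # built once, then rebuilt/updated as the chain adapts

def knn_surrogate(theta, k=5):
    """Average of the k nearest stored exact log-likelihood values."""
    _, idx = tree.query(theta, k=k)
    return loglik[idx].mean()

approx = knn_surrogate(np.zeros(2))
```

Note how the surrogate at a point far in the tails would be the average of its k closer-to-the-mode neighbours, hence the over-estimation of tail values mentioned above.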

The paper is somewhat harder to read than need be (?) because the authors combine the idea of delayed acceptance based on this knn approximation with the technique of pseudo-marginal Metropolis-Hastings. While there is added value in doing so, it complicates the exposition. And leads to ungainly acronyms like adaptive “da-PsMMH”, which are simply unreadable (!).

I would suggest moving some material to supplementary material and reducing the overall length of the paper. For instance, Section 4.2 is not particularly conclusive; see, e.g., Theorem 2. Or the description of the simulated models in Section 5, which is sometimes redundant.

Filed under: Books, Statistics Tagged: acronym, adaptive MCMC methods, delayed acceptance, k-nearest neighbours, Markov chain, MCMC, Monte Carlo Statistical Methods, pseudo-marginal MCMC, simulation

### battle of Agincourt, 600th commemoration [in miniature]

**F**or the 600th commemoration of the battle of Agincourt, the National Museum of Arms and Armours [within the Tower of London] has commissioned a diorama of the battle that saw the French nobility decimated by English longbows, muddy fields, their heavy armours, and lust for glory…

The Royal Armouries have a blog describing the construction of this impressive diorama, which measures 4m x 2m and involves 4,400 figurines. And over 1,000 arrows stuck into the ground. Never forget the arrows!

At a personal level, besides a fascination for dioramas [that started when I saw a D-Day diorama in a Norman museum as a kid], I had been told after The Accident that cutting off some fingers of captured bowmen was customary, but this entry in Wikipedia writes it off as a myth.

Filed under: Books, Kids, pictures, Travel Tagged: Agincourt, armour, battle, bowmen, cavalry, diorama, England, figurines, France, Henry V, Hundred Years' War, National Museum of Arms and Armours, October 25, tin soldiers, Tower of London

### Le Monde puzzle [#934]

**A**nother Le Monde mathematical puzzle with no R code:

*Given a collection of 2€ coins and 5€ bills that sum up to 120€, find the number of 5€ bills such that the collection cannot be divided into 5 even parts.*

**I**ndeed, as soon as one starts formalising the problem, it falls apart: if there are a 5€ bills and b 2€ coins, we have 5a+2b=120, hence 2b=120-5a=5(24-a), meaning that b must be a multiple of 5, b=5b’, and a must be even, a=2a’, with b’=12-a’. Hence, 11 possible values for both pairs (a’,b’) and (a,b), since a>0 and b>0. If these 120 Euros can be evenly divided between 5 persons, each gets 24€. Now, 24€ can be decomposed into 5€ bills and 2€ coins in three ways:

24 = 2×2+4×5 = 7×2+2×5 = 12×2+0×5.

Each of the five persons using any of the 3 above decompositions means there exist nonnegative integers α, β, and γ such that

α(2×2+4×5)+β(12×2)+γ(7×2+2×5) = (2α+12β+7γ)×2+(4α+2γ)×5 = b×2+a×5

with α+β+γ=5; therefore a=4α+2γ and b=2α+12β+7γ, which implies 2α+γ=a’ and 2α+12β+7γ=5×12-5a’=2α+5×12-12α-12γ+7γ, or 5a’=10α+5γ. That is, 2α+γ=a’ again… If a’=11, there is no solution with α+γ≤5, and this is the only such case. For any other value of a’, there is a way to divide the 120€ into 5 even parts. As often, I wonder at the point of the puzzle if this is the answer, or at its phrasing if I have misunderstood the question.

Just to check the above by R means, I still wrote a short R code

```r
for (a in 1:11){ # find integer solutions to 2x+z=a
  sum=0;z=-1
  while ((z<a)&(z<6)&(sum<2)){
    z=z+1;x=trunc((a-z)/2);y=5-x-z
    sum=(2*a==4*x+2*z)+(5*(11-a)==x+11*y+6*z)}
  print(c(2*a,5*(11-a),x,y,z))
}
```

which returned

```
[1]  2 50  0  4  1
[1]  4 45  1  4  0
[1]  6 40  1  3  1
[1]  8 35  2  3  0
[1] 10 30  2  2  1
[1] 12 25  3  2  0
[1] 14 20  3  1  1
[1] 16 15  4  1  0
[1] 18 10  4  0  1
[1] 20  5  5  0  0
[1] 22  0  5 -1  1
```

meaning that a’=11 does not produce a viable solution.

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: arithmetics, bills, Le Monde, mathematical puzzle, R

### Trip to Louvain (and back)

**A**part from the minor initial inconvenience that I missed my train to Brussels thanks to the SNCF train company's dysfunctional automata [but managed to switch to one half-an-hour later], my Belgian trip to Louvain-la-Neuve was quite enjoyable! I met with several local faculty [UCL] members I had not seen for several years, I gave my talk for the World Statistics Day in front of a large audience, maybe not the most appropriate talk for that day since it was somewhat skeptical about the nature of statistical tests, I got sharp questions, comments, and suggestions on the mixture approach to testing [incl. a challenging one about the Bernoulli B(p) case], I had a superb and animated and friendly dinner in a local restaurant—where everyone kindly spoke French although I was the only native French speaker—, I met the next morning with two PhD students from KU Leuven (the “other” part of the former Leuven university, albeit on the Flemish side of the border) about functional ABC and generalised Jeffreys priors, I had a few more interesting discussions, and I managed to grab a few bags of Belgian waffles in Brussels before heading home! (In case you wonder from the above picture, the crowds in the pedestrian streets of Louvain-la-Neuve were not connected to my visit!, but to a student festival centred on beer and a 24 hour bike relay that attracted around 50,000 students, for less than a hundred bikes!)

Filed under: Kids, pictures, Running, Statistics, Travel, University life, Wines Tagged: Bayesian tests of hypotheses, Belgian beer, Belgique, bike, bracelet, festival, Louvain-la-Neuve, UCL, waffles, Wallonie, World Statistics Day

### model selection and multiple testing

**R**itabrata Dutta, Malgorzata Bogdan and Jayanta Ghosh recently arXived a survey paper on model selection and multiple testing. Which provides a good opportunity to reflect upon traditional Bayesian approaches to model choice. And potential alternatives. On my way back from Madrid, where I got a bit distracted when flying over the South-West French coast, from Biarritz to Bordeaux. Spotting the lake of Hourtain, where I spent my military training month, 29 years ago!

*“On the basis of comparison of AIC and BIC, we suggest tentatively that model selection rules should be used for the purpose for which they were introduced. If they are used for other problems, a fresh justification is desirable. In one case, justification may take the form of a consistency theorem, in the other some sort of oracle inequality. Both may be hard to prove. Then one should have substantial numerical assessment over many different examples.”*

The authors quickly replace the Bayes factor with BIC, because it is typically consistent. In the comparison between AIC and BIC they mention the conundrum of defining a prior on a nested model from the prior on the nesting model, a problem that has not been properly solved in my opinion. The above quote with its call to a large simulation study reminded me of the paper by Arnold & Loeppky about running such studies through ecdfs, which I did not see as solving the issue. The authors also discuss DIC and Lasso, without making much of a connection between those, or with the above. And then reach the parametric empirical Bayes approach to model selection exemplified by Ed George’s and Don Foster’s 2000 paper, which achieves asymptotic optimality for posterior prediction loss (p.9) and unifies a wide range of model selection approaches.

A second part of the survey considers the large p setting, where BIC is not a good approximation to the Bayes factor (when testing whether or not all mean entries are zero). And recalls that there are priors ensuring consistency for the Bayes factor in this very [restrictive] case. Then, in Section 4, the authors move to what they call “cross-validatory Bayes factors”, also known as partial Bayes factors and pseudo-Bayes factors, where the data is split to (a) make the improper prior proper and (b) run the comparison or test on the remaining data. They also show the surprising result that, provided the fraction of the data used to proper-ise the prior does not converge to one, the X validated Bayes factor remains consistent [for the special case above]. The last part of the paper concentrates on multiple testing but is more tentative and conjecturing about convergence results, centring on the differences between full Bayes and empirical Bayes. Then the plane landed in Paris and I stopped my reading, not feeling differently about the topic than when the plane started from Madrid.
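
The data-splitting idea can be made concrete in a conjugate toy example of my own (not one from the paper): test H1, a free normal mean under a flat improper prior, against H0: μ=0, using the first m observations only to turn the flat prior into a proper posterior:

```python
import numpy as np
from math import log, pi

def norm_logpdf(y, mu, var):
    return -0.5 * (log(2 * pi * var) + (y - mu) ** 2 / var)

def partial_bayes_factor(y, m, sigma2=1.0):
    """log B_10 for H1: mu unknown (flat prior) vs H0: mu = 0,
    with y[:m] used only to proper-ise the prior on mu."""
    train, test = y[:m], y[m:]
    post_mean, post_n = train.mean(), m  # posterior on mu is N(mean, sigma2/n)
    log_m1 = 0.0
    for yi in test:                      # chain-rule predictive under H1
        log_m1 += norm_logpdf(yi, post_mean, sigma2 + sigma2 / post_n)
        post_mean = (post_n * post_mean + yi) / (post_n + 1)
        post_n += 1
    log_m0 = sum(norm_logpdf(yi, 0.0, sigma2) for yi in test)
    return log_m1 - log_m0

rng = np.random.default_rng(2)
y = rng.standard_normal(100) + 1.0       # data actually drawn with mu = 1
logB = partial_bayes_factor(y, m=10)     # favours H1
```

The consistency condition mentioned above corresponds to the training fraction m/n not converging to one, so that enough data remains for the comparison itself.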

Filed under: Books, pictures, Statistics, Travel, University life Tagged: AIC, Bayes factor, Bayesian hypothesis testing, Bayesian model choice, Biarritz, BIC, Bordeaux, consistency, DIC, empirical Bayes, Hourtain, Madrid, pseudo-priors, The Bayesian Choice

### AISTATS 2016 [post-submissions]

**N**ow that the deadline for AISTATS 2016 submissions is past, I can gladly report that we got the amazing number of 559 submissions, which is much more than what was submitted to the previous AISTATS conferences. To the point it made us fear for a little while [but not any longer!] that the conference room would not be large enough, and hope that we would have to install video connections in the hotel bar!

Which also means handling about the same amount of papers as a year of JRSS B submissions within a single month!, given the way those submissions are handled for the AISTATS 2016 conference proceedings. The process is indeed [as in other machine learning conferences] to allocate a bunch of papers to each associate editor [or meta-reviewer or area chair] and then have those AEs allocate papers to reviewers, all this within a few days, as the reviews have to be returned to authors within a month, by November 16 to be precise. This sounds like a daunting task but it proceeded rather smoothly owing to a high degree of automation (this is machine learning, after all!) in processing those papers, thanks to (a) the immediate response of the large majority of AEs and reviewers involved, who bid on the papers that were of most interest to them, and (b) a computer program called the Toronto Paper Matching System, developed by Laurent Charlin and Richard Zemel, which tremendously helps with managing about everything! Even when accounting for the more formatted entries in such proceedings (with an 8 page limit) and the call to the conference participants for reviewing other papers, I remain amazed at the resulting difference in the time scales for handling papers in the fields of statistics and machine learning. (There was a short lived attempt to replicate this type of processing for the Annals of Statistics, if I remember well.)

Filed under: Books, pictures, Statistics, Travel, University life Tagged: AISTATS 2014, AISTATS 2016, Andalucía, Artificial Intelligence and Statistics, Cadiz, conferences, editor, machine learning, proceedings, refereeing, scientific journals, Spain

### Entity Resolution with Empirically Motivated Priors

**Rebecca C. Steorts**.

**Source:** Bayesian Analysis, Volume 10, Number 4, 849--875.

**Abstract:**

Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed records to unobserved latent entities. Bayesian approaches provide attractive benefits, naturally providing uncertainty quantification via posterior probabilities. We propose a novel record linkage approach based on empirical Bayesian principles. Specifically, the empirical Bayesian-type step consists of taking the empirical distribution function of the data as the prior for the latent entities. This approach improves on the earlier HB approach not only by avoiding the prior specification problem but also by allowing both categorical and string-valued variables. Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities. Categorical fields that deviate from their corresponding true value are simply drawn from the empirical distribution function. We apply our proposed methodology to a simulated data set of German names and an Italian household survey on income and wealth, showing our method performs favorably compared to several standard methods in the literature. We also consider the robustness of our methods to changes in the hyper-parameters.

### Posterior Belief Assessment: Extracting Meaningful Subjective Judgements from Bayesian Analyses with Complex Statistical Models

**Daniel Williamson**,

**Michael Goldstein**.

**Source:** Bayesian Analysis, Volume 10, Number 4, 877--908.

**Abstract:**

In this paper, we are concerned with attributing meaning to the results of a Bayesian analysis for a problem which is sufficiently complex that we are unable to assert a precise correspondence between the expert probabilistic judgements of the analyst and the particular forms chosen for the prior specification and the likelihood for the analysis. In order to do this, we propose performing a finite collection of additional Bayesian analyses under alternative collections of prior and likelihood modelling judgements that we may also view as representative of our prior knowledge and the problem structure, and use these to compute posterior belief assessments for key quantities of interest. We show that these assessments are closer to our true underlying beliefs than the original Bayesian analysis and use the temporal sure preference principle to establish a probabilistic relationship between our true posterior judgements, our posterior belief assessment and our original Bayesian analysis to make this precise. We exploit second order exchangeability in order to generalise our approach to situations where there are infinitely many alternative Bayesian analyses we might consider as informative for our true judgements so that the method remains tractable even in these cases. We argue that posterior belief assessment is a tractable and powerful alternative to robust Bayesian analysis. We describe a methodology for computing posterior belief assessments in even the most complex of statistical models and illustrate with an example of calibrating an expensive ocean model in order to quantify uncertainty about global mean temperature in the real ocean.

### Updated: Bayesian Analysis, Volume 10, Number 4 (2015)

Contents:

**R. V. Ramamoorthi**, **Karthik Sriram**, **Ryan Martin**. On Posterior Concentration in Misspecified Models. 759--789.

**Roberto Casarin**, **Fabrizio Leisen**, **German Molina**, **Enrique ter Horst**. A Bayesian Beta Markov Random Field Calibration of the Term Structure of Implied Risk Neutral Densities. 791--819.

**Maria DeYoreo**, **Athanasios Kottas**. A Fully Nonparametric Modeling Approach to Binary Regression. 821--847.

**Rebecca C. Steorts**. Entity Resolution with Empirically Motivated Priors. 849--875.

**Daniel Williamson**, **Michael Goldstein**. Posterior Belief Assessment: Extracting Meaningful Subjective Judgements from Bayesian Analyses with Complex Statistical Models. 877--908.

### On Posterior Concentration in Misspecified Models

**R. V. Ramamoorthi**,

**Karthik Sriram**,

**Ryan Martin**.

**Source:** Bayesian Analysis, Volume 10, Number 4, 759--789.

**Abstract:**

We investigate the asymptotic behavior of Bayesian posterior distributions under independent and identically distributed ($i.i.d.$) misspecified models. More specifically, we study the concentration of the posterior distribution on neighborhoods of $f^{\star}$, the density that is closest in the Kullback–Leibler sense to the true model $f_{0}$. We note, through examples, the need for assumptions beyond the usual Kullback–Leibler support assumption. We then investigate consistency with respect to a general metric under three assumptions, each based on a notion of divergence measure, and then apply these to a weighted $L_{1}$-metric in convex models and non-convex models.
Although a few results on this topic are available, we believe that these are somewhat inaccessible due, in part, to the technicalities and the subtle differences compared to the more familiar well-specified model case. One of our goals is to make some of the available results, especially that of Kleijn and van der Vaart (2006), more accessible. Unlike their paper, our approach does not require construction of test sequences. We also discuss a preliminary extension of the $i.i.d.$ results to the independent but not identically distributed ($i.n.i.d.$) case.

### A Bayesian Beta Markov Random Field Calibration of the Term Structure of Implied Risk Neutral Densities

**Roberto Casarin**,

**Fabrizio Leisen**,

**German Molina**,

**Enrique ter Horst**.

**Source:** Bayesian Analysis, Volume 10, Number 4, 791--819.

**Abstract:**

We build on the derivative pricing calibration literature, and propose a more general calibration model for implied risk neutral densities. Our model allows for the joint calibration of a set of densities at different maturities and dates through a Bayesian dynamic Beta Markov Random Field. Our approach allows for possible time dependence between densities with the same maturity, and for dependence across maturities at the same point in time. This approach to the risk neutral density calibration problem encompasses model flexibility, parameter parsimony, and, more importantly, information pooling across densities. This proposed methodology can be naturally extended to other areas where multidimensional calibration is needed.

### A Fully Nonparametric Modeling Approach to Binary Regression

**Maria DeYoreo**,

**Athanasios Kottas**.

**Source:** Bayesian Analysis, Volume 10, Number 4, 821--847.

**Abstract:**

We propose a general nonparametric Bayesian framework for binary regression, which is built from modeling for the joint response–covariate distribution. The observed binary responses are assumed to arise from underlying continuous random variables through discretization, and we model the joint distribution of these latent responses and the covariates using a Dirichlet process mixture of multivariate normals. We show that the kernel of the induced mixture model for the observed data is identifiable upon a restriction on the latent variables. To allow for appropriate dependence structure while facilitating identifiability, we use a square-root-free Cholesky decomposition of the covariance matrix in the normal mixture kernel. In addition to allowing for the necessary restriction, this modeling strategy provides substantial simplifications in implementation of Markov chain Monte Carlo posterior simulation. We present two data examples taken from areas for which the methodology is especially well suited. In particular, the first example involves estimation of relationships between environmental variables, and the second develops inference for natural selection surfaces in evolutionary biology. Finally, we discuss extensions to regression settings with ordinal responses.

### Bayesian Analysis, Volume 10, Number 4 (2015)

Contents:

**R. V. Ramamoorthi**, **Karthik Sriram**, **Ryan Martin**. On Posterior Concentration in Misspecified Models. 759--789.

**Roberto Casarin**, **Fabrizio Leisen**, **German Molina**, **Enrique ter Horst**. A Bayesian Beta Markov Random Field Calibration of the Term Structure of Implied Risk Neutral Densities. 791--819.

**Maria DeYoreo**, **Athanasios Kottas**. A Fully Nonparametric Modeling Approach to Binary Regression. 821--847.

### Updated: Bayesian Analysis, Volume 10, Number 3 (2015)

Contents:

**Sergio Venturini**, **Francesca Dominici**, **Giovanni Parmigiani**. Generalized Quantile Treatment Effect: A Flexible Bayesian Approach Using Quantile Ratio Smoothing. 523--552.

**Mauro Bernardi**, **Ghislaine Gayraud**, **Lea Petrella**. Bayesian Tail Risk Interdependence Using Quantile Regression. 553--603.

**Yajuan Si**, **Natesh S. Pillai**, **Andrew Gelman**. Bayesian Nonparametric Weighted Sampling Inference. 605--625.

**Douglas K. Sparks**, **Kshitij Khare**, **Malay Ghosh**. Necessary and Sufficient Conditions for High-Dimensional Posterior Consistency under $g$-Priors. 627--664.

**Maxim Panov**, **Vladimir Spokoiny**. Finite Sample Bernstein–von Mises Theorem for Semiparametric Problems. 665--710.

**Gustavo da Silva Ferreira**, **Dani Gamerman**. Optimal Design in Geostatistics under Preferential Sampling. 711--735.

**Michael Chipeta**, **Peter J. Diggle**. Comment on Article by Ferreira and Gamerman. 737--739.

**Noel Cressie**, **Raymond L. Chambers**. Comment on Article by Ferreira and Gamerman. 741--748.

**James V. Zidek**. Comment on Article by Ferreira and Gamerman. 749--752.

**Gustavo da Silva Ferreira**, **Dani Gamerman**. Rejoinder. 753--758.