## Bayesian News Feeds

### amazonish thanks (& repeated warning)

**A**s in previous years, at about this time, I want to (re)warn unaware 'Og readers that all links to Amazon.com (and, more rarely, to Amazon.fr) found on this blog may actually earn me an advertising percentage if a purchase is made by the reader *in the 24 hours following the entry on Amazon through this link*, thanks to the "*Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to amazon.com/fr*". Unlike last year, I did not benefit as much from the new edition of Andrew's book, and the link he copied from my blog entry… Here are some of the most 'Og-unrelated purchases:

- Mr. Beer Deluxe Beer Bottling System
- Kyjen 2518 Dog Life Jacket
- Fisher-Price Learn-to-Flush Potty
- Way Huge Green Rhino
- WWII Helmets and Headgear

Once again, books I reviewed, positively or negatively, were among the top purchases… like a dozen copies of *Monte Carlo Simulation and Resampling Methods for Social Science*, a few copies of *Naked Statistics*, and again a few of *The Cartoon Introduction to Statistics* (despite a most critical review). Thanks to all of you using those links and further feeding my book addiction, with the drawback of inducing even more fantasy book reviews.

Filed under: Books, Kids, R, Statistics Tagged: Amazon, amazon associates, book reviews, dog life jacket, Monte Carlo Statistical Methods, Og

### the demise of the Bayes factor

**W**ith Kaniav Kamary, Kerrie Mengersen, and Judith Rousseau, we have just arXived (and submitted) a paper entitled “Testing hypotheses via a mixture model”. (We actually presented some earlier version of this work in Cancún, Vienna, and Gainesville, so you may have heard of it already.) The notion we advocate in this paper is to replace the posterior probability of a model or a hypothesis with the posterior distribution of the weights of a mixture of the models under comparison. That is, given two models under comparison,

M₁: x ~ f₁(x|θ₁)  versus  M₂: x ~ f₂(x|θ₂),

we propose to estimate the (artificial) mixture model

M_α: x ~ α f₁(x|θ₁) + (1−α) f₂(x|θ₂),  0 ≤ α ≤ 1,

and in particular to derive the posterior distribution of α. One may object that the mixture model is neither of the two models under comparison, but it coincides with each of them at the boundary, i.e., when α=0,1. Thus, if we use prior distributions on α that favour the neighbourhoods of 0 and 1, we should be able to see the posterior concentrate near 0 or 1, depending on which model is true. And indeed this is the case: for any given Beta prior on α, we observe a higher and higher concentration at the correct boundary as the sample size increases, and we establish a convergence result to this effect. Furthermore, the mixture approach offers numerous advantages, among which *[verbatim from the paper]*:

- relying on a Bayesian estimator of the weight α rather than on the posterior probability of the corresponding model does remove the need of overwhelmingly artificial prior probabilities on model indices;
- the interpretation of this estimator is at least as natural as handling the posterior probability, while avoiding the caricaturesque zero-one loss setting. The quantity α and its posterior distribution provide a measure of proximity to both models for the data at hand, while being also interpretable as a propensity of the data to stand with (or to stem from) one of the two models. This representation further allows for alternative perspectives on testing and model choices, through the notions of predictive tools, cross-validation, and information indices like WAIC;
- the highly problematic computation of the marginal likelihoods is bypassed, standard algorithms being available for Bayesian mixture estimation;
- the extension to a finite collection of models to be compared is straightforward, as this simply involves a larger number of components. This approach further allows to consider all models at once rather than engaging in pairwise costly comparisons and thus to eliminate the least likely models by simulation, those being not explored by the corresponding algorithm;
- the (simultaneously conceptual and computational) difficulty of “label switching” that plagues both Bayesian estimation and Bayesian computation for most mixture models completely vanishes in this particular context, since components are no longer exchangeable. In particular, we compute neither a Bayes factor nor a posterior probability related with the substitute mixture model and we hence avoid the difficulty of recovering the modes of the posterior distribution. Our perspective is solely centred on estimating the parameters of a mixture model where both components are always identifiable;
- the posterior distribution of α evaluates more thoroughly the strength of the support for a given model than the single figure outcome of a Bayes factor or of a posterior probability. The variability of the posterior distribution on α allows for a more thorough assessment of the strength of the support of one model against the other;
- an additional feature missing from traditional Bayesian answers is that a mixture model also acknowledges the possibility that, for a finite dataset, *both* models or *none* could be acceptable;
- while standard (proper and informative) prior modelling can be painlessly reproduced in this novel setting, non-informative (improper) priors now are manageable therein, provided both models under comparison are first reparametrised towards common-meaning and shared parameters, as for instance with location and scale parameters. In the special case when all parameters can be made common to both models [While this may sound like an extremely restrictive requirement in a traditional mixture model, let us stress here that the presence of common parameters becomes quite natural within a testing setting. To wit, when comparing two different models for the *same* data, moments are defined in terms of the observed data and hence *should* be the *same* for both models. Reparametrising the models in terms of those common-meaning moments does lead to a mixture model with some and maybe *all* common parameters. We thus advise the use of a common parametrisation, whenever possible.] the mixture model reads as

M_α: x ~ α f₁(x|θ) + (1−α) f₂(x|θ).

For instance, if θ is a location parameter, a flat prior can be used with no foundational difficulty, in opposition to the testing case;

- continuing from the previous argument, using the *same* parameters or some *identical* parameters on both components is an essential feature of this reformulation of Bayesian testing, as it highlights the fact that the opposition between the two components of the mixture is not an issue of enjoying different parameters, but quite the opposite. As further stressed below, this or even *those* common parameter(s) is (are) nuisance parameters that need be integrated out (as they also are in the traditional Bayesian approach through the computation of the marginal likelihoods);
- the choice of the prior model probabilities is rarely discussed in a classical Bayesian approach, even though those probabilities linearly impact the posterior probabilities and can be argued to promote the alternative of using the Bayes factor instead. In the mixture estimation setting, prior modelling only involves selecting a prior on α, for instance a Beta B(a,a) distribution, with a wide range of acceptable values for the hyperparameter a. While the value of a impacts the posterior distribution of α, it can be argued that (a) it nonetheless leads to an accumulation of the mass near 1 or 0, i.e., to favour the most favourable or the true model over the other one, and (b) a sensitivity analysis on the impact of a is straightforward to carry out;
- in most settings, this approach can furthermore be easily calibrated by a parametric bootstrap experiment providing a posterior distribution of α under each of the models under comparison. The prior predictive error can therefore be directly estimated and can drive the choice of the hyperparameter a, if need be.
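To make the weight-estimation idea concrete, here is a minimal sketch (not the code from the paper): a Gibbs sampler for the posterior of α when both components are fully known densities, using latent allocations and the conjugate Beta(a,a) prior. The specific densities, hyperparameter, and function names are illustrative choices, not the paper's.

```python
import math
import random

def normpdf(x, mu, sd):
    """Density of a N(mu, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_alpha(data, f1, f2, a=0.5, iters=2000, seed=1):
    # Gibbs sampler for the weight alpha of the encompassing mixture
    #   alpha * f1(x) + (1 - alpha) * f2(x),
    # under a Beta(a, a) prior on alpha, with f1 and f2 fully known
    # densities (no free parameters, purely for illustration).
    rng = random.Random(seed)
    alpha, draws = 0.5, []
    for _ in range(iters):
        # allocate each observation to a component given the current alpha
        n1 = 0
        for x in data:
            p1 = alpha * f1(x)
            p2 = (1 - alpha) * f2(x)
            if rng.random() < p1 / (p1 + p2):
                n1 += 1
        # conjugate update of alpha given the allocations
        alpha = rng.betavariate(a + n1, a + len(data) - n1)
        draws.append(alpha)
    return draws
```

With data simulated from the first component, the draws pile up near α=1, mirroring the concentration-at-the-boundary behaviour described above.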

Filed under: Books, Kids, Statistics, Travel, University life Tagged: Bayes factor, Bayesian hypothesis testing, component of a mixture, consistency, hyperparameter, model posterior probabilities, posterior, prior, testing as mixture estimation

### Statistics slides (5)

**H**ere is the fifth and last set of slides for my third year statistics course, trying to introduce Bayesian statistics in the most natural way and hence starting with… Rasmus’ socks and ABC!!! This is an interesting experiment, as I have no idea how my students will react. Either they will see the point beyond the anecdotal story, or they will miss it (being quite unhappy so far about the lack of mathematical rigour in my course and exercises…). We only have two weeks left, so I am afraid the concept will not have time to seep through!

Filed under: Books, Kids, Statistics, University life Tagged: Bayesian statistics, Don Rubin, HPD region, map, Paris, Université Paris Dauphine

### Whispers underground [book review]

*“Dr. Walid said that normal human variations were wide enough that you’d need samples of hundreds of subjects to test that. Thousands if you wanted a statistically significant answer. Low sample size—one of the reasons why magic and science are hard to reconcile.”*

**T**his is the third volume in the Rivers of London series, brought back from Gainesville, and possibly the least successful (in my opinion). It indeed takes place underground, and not only in the Underground and the underground sewers of London. It relies on a literary trick that always irks me in fantasy novels, namely the sudden appearance of a massive underground complex, with unsuspected societies that are large and evolved enough to reach the Industrial Age. *(Sorry if this is too much of a spoiler!)*

*“It was the various probability calculations that stuffed me—they always do. I’d have been a bad scientist.”*

Not that everything is bad in this novel: I still like the massive infodump about London, the style and humour, the return of PC Lesley trying to get over the (literal) loss of her face, and the appearance of new characters. But the story itself, revolving around a murder investigation, is rather shallow, and the (compulsory?) English policeman versus American cop competition is too contrived to be funny. Most of the major plot stays hidden in this volume, unless there are clues I missed. (For instance, one death from a previous volume, which seemed to get ignored at the time, is finally explained here.) Definitely not a book to read on its own, as it still relates to and borrows much from the previous volumes, but presumably one to read nonetheless before the next instalment, *Broken Homes*.

Filed under: Books, pictures, Travel Tagged: Gainesville, Isaac Newton, London, PC Peter Grant, Thames, Underground

### Bayesian Analysis, Volume 9, Number 4 (2014)

Contents:

**Jesse Windle**, **Carlos M. Carvalho**. A Tractable State-Space Model for Symmetric Positive-Definite Matrices. 759--792.

**Roberto Casarin**. Comment on Article by Windle and Carvalho. 793--804.

**Catherine Scipione Forbes**. Comment on Article by Windle and Carvalho. 805--808.

**Enrique ter Horst**, **German Molina**. Comment on Article by Windle and Carvalho. 809--818.

**Jesse Windle**, **Carlos M. Carvalho**. Rejoinder. 819--822.

**Asael Fabian Martínez**, **Ramsés H. Mena**. On a Nonparametric Change Point Detection Model in Markovian Regimes. 823--858.

**Eduard Belitser**, **Paulo Serra**. Adaptive Priors Based on Splines with Random Knots. 859--882.

**Henrik Nyman**, **Johan Pensar**, **Timo Koski**, **Jukka Corander**. Stratified Graphical Models - Context-Specific Independence in Graphical Models. 883--908.

**David Shalloway**. The Evidentiary Credible Region. 909--922.

**Arkady Shemyakin**. Hellinger Distance and Non-informative Priors. 923--938.

**Isabelle Smith**, **André Ferrari**. Equivalence between the Posterior Distribution of the Likelihood Ratio and a p-value in an Invariant Frame. 939--962.

**Linda S. L. Tan**, **David J. Nott**. A Stochastic Variational Framework for Fitting and Diagnosing Generalized Linear Mixed Models. 963--1004.

### A Tractable State-Space Model for Symmetric Positive-Definite Matrices

**Jesse Windle**,

**Carlos M. Carvalho**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 759--792.

**Abstract:**

The Bayesian analysis of a state-space model includes computing the posterior distribution of the system’s parameters as well as its latent states. When the latent states wander around $\mathbb{R}^{n}$ there are several well-known modeling components and computational tools that may be profitably combined to achieve this task. When the latent states are constrained to a strict subset of $\mathbb{R}^{n}$ these models and tools are either impaired or break down completely. State-space models whose latent states are covariance matrices arise in finance and exemplify the challenge of devising tractable models in the constrained setting. To that end, we present a state-space model whose observations and latent states take values on the manifold of symmetric positive-definite matrices and for which one may easily compute the posterior distribution of the latent states and the system’s parameters as well as filtered distributions and one-step ahead predictions. Employing the model within the context of finance, we show how one can use realized covariance matrices as data to predict latent time-varying covariance matrices. This approach out-performs factor stochastic volatility.

### Comment on Article by Windle and Carvalho

**Roberto Casarin**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 793--804.

**Abstract:**

This article discusses Windle and Carvalho’s (2014) state-space model for observations and latent variables in the space of positive symmetric matrices. The present discussion focuses on the model specification and on the contribution to the positive-value time series literature. I apply the proposed model to financial data with a view to shedding light on some modeling issues.

### Comment on Article by Windle and Carvalho

**Catherine Scipione Forbes**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 805--808.

### Comment on Article by Windle and Carvalho

**Enrique ter Horst**,

**German Molina**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 809--818.

**Abstract:**

The article by Windle and Carvalho introduces a fast update procedure for covariance matrices through the introduction of higher frequency sources of information for the underlying process, demonstrated with a financial application. This discussion focuses on outlining the assumptions and constraints around their use in financial applications, as well as an elicitation of some key choices made for comparison with traditional benchmarks, that may ultimately affect the results.

### Rejoinder

**Jesse Windle**,

**Carlos M. Carvalho**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 819--822.

### On a Nonparametric Change Point Detection Model in Markovian Regimes

**Asael Fabian Martínez**,

**Ramsés H. Mena**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 823--858.

**Abstract:**

Change point detection models aim to determine the most probable grouping for a given sample indexed on an ordered set. For this purpose, we propose a methodology based on exchangeable partition probability functions, specifically on Pitman’s sampling formula. Emphasis will be given to the Markovian case, in particular for discretely observed Ornstein-Uhlenbeck diffusion processes. Some properties of the resulting model are explained and posterior results are obtained via a novel Markov chain Monte Carlo algorithm.
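Pitman's sampling formula, on which the abstract's construction rests, is the partition distribution induced by the two-parameter Chinese restaurant process. As a hedged illustration (not the authors' algorithm), one can simulate a random partition from it sequentially:

```python
import random

def sample_partition(n, alpha, theta, seed=0):
    # Draw a random partition of {0, ..., n-1} from the two-parameter
    # Chinese restaurant process (0 <= alpha < 1, theta > -alpha), whose
    # induced distribution on block sizes is Pitman's sampling formula.
    rng = random.Random(seed)
    blocks = []
    for i in range(n):
        # item i joins existing block b with prob. (|b| - alpha)/(i + theta),
        # or opens a new block with prob. (theta + alpha * k)/(i + theta),
        # where k is the current number of blocks; the weights sum to i + theta
        weights = [len(b) - alpha for b in blocks]
        weights.append(theta + alpha * len(blocks))
        r = rng.random() * (i + theta)
        acc, j = 0.0, 0
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if j == len(blocks):
            blocks.append([i])     # open a new block
        else:
            blocks[j].append(i)
    return blocks
```

Larger values of alpha or theta favour partitions with more, smaller blocks, which is the handle a change-point model can use on the number of regimes.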

### Adaptive Priors Based on Splines with Random Knots

**Eduard Belitser**,

**Paulo Serra**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 859--882.

**Abstract:**

Splines are useful building blocks when constructing priors on nonparametric models indexed by functions. Recently it has been established in the literature that hierarchical adaptive priors based on splines with a random number of equally spaced knots and random coefficients in the B-spline basis corresponding to those knots lead, under some conditions, to optimal posterior contraction rates, over certain smoothness functional classes. In this paper we extend these results for when the location of the knots is also endowed with a prior. This has already been a common practice in Markov chain Monte Carlo applications, but a theoretical basis in terms of adaptive contraction rates was missing. Under some mild assumptions, we establish a result that provides sufficient conditions for adaptive contraction rates in a range of models, over certain functional classes of smoothness up to the order of the splines that are used. We also present some numerical results illustrating how such a prior adapts to inhomogeneous variability (smoothness) of the function in the context of nonparametric regression.

### Stratified Graphical Models - Context-Specific Independence in Graphical Models

**Henrik Nyman**,

**Johan Pensar**,

**Timo Koski**,

**Jukka Corander**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 883--908.

**Abstract:**

Theory of graphical models has matured over more than three decades to provide the backbone for several classes of models that are used in a myriad of applications such as genetic mapping of diseases, credit risk evaluation, reliability and computer security. Despite their generic applicability and wide adoption, the constraints imposed by undirected graphical models and Bayesian networks have also been recognized to be unnecessarily stringent under certain circumstances. This observation has led to the proposal of several generalizations that aim at more relaxed constraints by which the models can impose local or context-specific dependence structures. Here we consider an additional class of such models, termed stratified graphical models. We develop a method for Bayesian learning of these models by deriving an analytical expression for the marginal likelihood of data under a specific subclass of decomposable stratified models. A non-reversible Markov chain Monte Carlo approach is further used to identify models that are highly supported by the posterior distribution over the model space. Our method is illustrated and compared with ordinary graphical models through application to several real and synthetic datasets.

### The Evidentiary Credible Region

**David Shalloway**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 909--922.

**Abstract:**

Many disparate definitions of Bayesian credible intervals and regions are in use, which can lead to ambiguous presentation of results. It is particularly unsatisfactory when intervals are specified that do not match the one-sided character of the evidence. We suggest that a sensible resolution is to use the parameterization-independent region that maximizes the information gain between the initial prior and posterior distributions, as assessed by their Kullback-Leibler divergence, subject to the constraint on included posterior probability. This turns out to be equivalent to the relative surprise region previously defined by Evans (1997), and thus provides information theoretic support for its use. We also show that this region is the constrained optimizer over the posterior measure of any strictly monotonic function of the likelihood, which explains its many optimal properties, and that it is guaranteed to be consistent with the sidedness of the evidence. Because all of its equivalent derivations depend on the evidence as well as on the posterior distribution, we suggest that it be called the evidentiary credible region.

### Hellinger Distance and Non-informative Priors

**Arkady Shemyakin**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 923--938.

**Abstract:**

This paper introduces an extension of the Jeffreys’ rule to the construction of objective priors for non-regular parametric families. A new class of priors based on Hellinger information is introduced as Hellinger priors. The main results establish the relationship of Hellinger priors to the Jeffreys’ rule priors in the regular case, and to the reference and probability matching priors for the non-regular class introduced by Ghosal and Samanta. These priors are also studied for some non-regular examples outside of this class. Their behavior proves to be similar to that of the reference priors considered by Berger, Bernardo, and Sun, however some differences are observed. For the multi-parameter case, a combination of Hellinger priors and reference priors is suggested and some examples are considered.

### Equivalence between the Posterior Distribution of the Likelihood Ratio and a p-value in an Invariant Frame

**Isabelle Smith**,

**André Ferrari**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 939--962.

**Abstract:**

The Posterior distribution of the Likelihood Ratio (PLR) is proposed by Dempster in 1973 for significance testing in the simple vs. composite hypothesis case. In this hypothesis test case, classical frequentist and Bayesian hypothesis tests are irreconcilable, as emphasized by Lindley’s paradox, Berger & Selke in 1987 and many others. However, Dempster shows that the PLR (with inner threshold 1) is equal to the frequentist p-value in the simple Gaussian case. In 1997, Aitkin extends this result by adding a nuisance parameter and showing its asymptotic validity under more general distributions. Here we extend the reconciliation between the PLR and a frequentist p-value for a finite sample, through a framework analogous to the Stein’s theorem frame in which a credible (Bayesian) domain is equal to a confidence (frequentist) domain.

### A Stochastic Variational Framework for Fitting and Diagnosing Generalized Linear Mixed Models

**Linda S. L. Tan**,

**David J. Nott**.

**Source: **Bayesian Analysis, Volume 9, Number 4, 963--1004.

**Abstract:**

In stochastic variational inference, the variational Bayes objective function is optimized using stochastic gradient approximation, where gradients computed on small random subsets of data are used to approximate the true gradient over the whole data set. This enables complex models to be fit to large data sets as data can be processed in mini-batches. In this article, we extend stochastic variational inference for conjugate-exponential models to nonconjugate models and present a stochastic nonconjugate variational message passing algorithm for fitting generalized linear mixed models that is scalable to large data sets. In addition, we show that diagnostics for prior-likelihood conflict, which are useful for Bayesian model criticism, can be obtained from nonconjugate variational message passing automatically, as an alternative to simulation-based Markov chain Monte Carlo methods. Finally, we demonstrate that for moderate-sized data sets, convergence can be accelerated by using the stochastic version of nonconjugate variational message passing in the initial stage of optimization before switching to the standard version.
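The device in the abstract's first sentence, rescaling a minibatch gradient so that it is unbiased for the full-data gradient, can be sketched in isolation. This toy example (a Gaussian mean under maximum likelihood, not the authors' variational message passing algorithm) shows the scaling factor at work:

```python
import random

def sgd_gaussian_mean(data, batch_size=10, steps=500, lr=0.05, seed=0):
    # Stochastic gradient ascent on the Gaussian (unit variance)
    # log-likelihood in the mean mu. The minibatch gradient, rescaled by
    # n / batch_size, is an unbiased estimate of the full-data gradient,
    # so small random subsets can stand in for the whole data set.
    rng = random.Random(seed)
    n, mu = len(data), 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        grad = (n / batch_size) * sum(x - mu for x in batch)
        mu += lr * grad / n   # step scaled by 1/n for stability
    return mu
```

Each step touches only `batch_size` observations, which is what makes the approach scalable to large data sets, at the price of gradient noise that the rescaling leaves unbiased but does not remove.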
