## Bayesian News Feeds

### estimating normalising constants [mea culpa?!]

*“The basic idea is to estimate the parameters by learning to discriminate between the data x and some artificially generated noise y.”*

**A**s a sequel to this popular earlier post of mine on [not] estimating normalising constants, Simon Barthelmé and Nicolas Chopin pointed me to recent papers by Michael Gutmann and Aapo Hyvärinen on this topic, one published in the proceedings of AISTATS 2010 in Sardinia and one in the proceedings of the 2013 Workshop on Information Theoretic Methods in Science and Engineering (WITMSE2013), in Tokyo. Which led me to reconsider my perspective on this issue…

**J**ust like Larry, Gutmann and Hyvärinen consider the normalising constant associated with an unnormalised density,

$$p(x;\alpha) = \frac{\tilde{p}(x;\alpha)}{Z(\alpha)},$$

as *an extra parameter*. They then add to the actual sample from the unnormalised density an artificial sample of identical size from a fixed distribution g, and eventually proceed to run a logistic regression on the model index (p *versus* g) based on the merged datasets. A logistic regression parameterised by the difference of the log-densities,

$$h(u;\alpha,Z) = \frac{1}{1+\exp\left\{\log g(u) - \log \tilde{p}(u;\alpha) + \log Z\right\}},$$

with the actual sample corresponding to the first class and the artificial sample to the second. While the resulting estimator is different, this approach reminds me of the proposal we made with Nicolas in our 2009 nested sampling paper, especially Section 6.3, where we also introduce an artificial mixture to estimate the normalising constant (and obtain an alternative version of bridge sampling). The difference is that Gutmann and Hyvärinen estimate both Z and α by logistic regression, and without imposing the integration constraint that would turn Z into a superfluous “parameter”.

**N**ow, if we return to the original debate, does this new estimation approach close it? And if so, is it to my defeat (hence the title)?! Obviously, Gutmann and Hyvärinen use both a statistical technique and a statistical model to estimate the constant Z(α). They produce an extra artificial sample from g but exploit the current sample from p and no other. The estimator of the normalising constant converges as the sample size grows. However, I do remain puzzled by the addition of the normalising constant to the parameter vector. The data comes from a probability distribution, hence the normalising constraint holds. Relaxing the constraint leads to a minimisation framework that can be interpreted as either statistics or numerics. Which leaves open my original question of what information about the constant Z(α) is contained in the sample per se… (without questioning the method's potential for producing an estimate of that constant).
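To make the construction concrete, here is a minimal sketch (my own toy example, not the authors' code) in which the unnormalised density is the Gaussian kernel exp(-x²/2α²), the noise g is a wider Gaussian, and both α and log Z are recovered by maximising the logistic log-likelihood; the sample sizes, the noise scale, and the Nelder-Mead optimiser are all my own choices:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 10_000

# True model: unnormalised density p~(x; a) = exp(-x^2 / (2 a^2)),
# so the true constant is Z(a) = sqrt(2*pi) * a.
a_true = 2.0
x = rng.normal(0.0, a_true, n)   # actual sample from p
y = rng.normal(0.0, 3.0, n)      # artificial sample from g, same size

def log_g(u):
    # log-density of the noise distribution N(0, 3^2), fully known
    return -0.5 * (u / 3.0) ** 2 - np.log(3.0 * np.sqrt(2 * np.pi))

def neg_loglik(theta):
    a, c = theta                 # c plays the role of -log Z(a)
    def G(u):                    # log p~(u; a) + c - log g(u)
        return -0.5 * (u / a) ** 2 + c - log_g(u)
    # logistic regression log-likelihood: data labelled 1, noise 0
    return -(np.sum(-np.log1p(np.exp(-G(x)))) +
             np.sum(-np.log1p(np.exp(G(y)))))

res = minimize(neg_loglik, x0=[1.5, 0.0], method="Nelder-Mead")
a_hat, c_hat = res.x
Z_hat = np.exp(-c_hat)           # estimated normalising constant
print(a_hat, Z_hat)
```

The fitted constant should approach the true Z(α) = √(2π)α as the sample size grows, in line with the convergence noted above.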

Filed under: Statistics

### Bayesian Analysis, Volume 9, Number 2 (2014)

Contents:

**Zhihua Zhang**, **Dakan Wang**, **Guang Dai**, **Michael I. Jordan**. Matrix-Variate Dirichlet Process Priors with Applications. 259--286.

**Nammam Ali Azadi**, **Paul Fearnhead**, **Gareth Ridall**, **Joleen H. Blok**. Bayesian Sequential Experimental Design for Binary Response Data with Application to Electromyographic Experiments. 287--306.

**Juhee Lee**, **Steven N. MacEachern**, **Yiling Lu**, **Gordon B. Mills**. Local-Mass Preserving Prior Distributions for Nonparametric Bayesian Models. 307--330.

**Ruitao Liu**, **Arijit Chakrabarti**, **Tapas Samanta**, **Jayanta K. Ghosh**, **Malay Ghosh**. On Divergence Measures Leading to Jeffreys and Other Reference Priors. 331--370.

**Xin-Yuan Song**, **Jing-Heng Cai**, **Xiang-Nan Feng**, **Xue-Jun Jiang**. Bayesian Analysis of the Functional-Coefficient Autoregressive Heteroscedastic Model. 371--396.

**Yu Ryan Yue**, **Daniel Simpson**, **Finn Lindgren**, **Håvard Rue**. Bayesian Adaptive Smoothing Splines Using Stochastic Differential Equations. 397--424.

**Jaakko Riihimäki**, **Aki Vehtari**. Laplace Approximation for Logistic Gaussian Process Density Estimation and Regression. 425--448.

**Fei Liu**, **Sounak Chakraborty**, **Fan Li**, **Yan Liu**, **Aurelie C. Lozano**. Bayesian Regularization via Graph Laplacian. 449--474.

**Catia Scricciolo**. Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures. 475--520.

### Matrix-Variate Dirichlet Process Priors with Applications

**Zhihua Zhang**, **Dakan Wang**, **Guang Dai**, **Michael I. Jordan**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 259--286.

**Abstract:**

In this paper we propose a matrix-variate Dirichlet process (MATDP) for modeling the joint prior of a set of random matrices. Our approach is able to share statistical strength among regression coefficient matrices due to the clustering property of the Dirichlet process. Moreover, since the base probability measure is defined as a matrix-variate distribution, the dependence among the elements of each random matrix is described via the matrix-variate distribution. We apply MATDP to multivariate supervised learning problems. In particular, we devise a nonparametric discriminative model and a nonparametric latent factor model. The interest is in considering correlations both across response variables (or covariates) and across response vectors. We derive Markov chain Monte Carlo algorithms for posterior inference and prediction, and illustrate the application of the models to multivariate regression, multi-class classification and multi-label prediction problems.

### Bayesian Sequential Experimental Design for Binary Response Data with Application to Electromyographic Experiments

**Nammam Ali Azadi**, **Paul Fearnhead**, **Gareth Ridall**, **Joleen H. Blok**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 287--306.

**Abstract:**

We develop a sequential Monte Carlo approach for Bayesian analysis of the experimental design for binary response data. Our work is motivated by surface electromyographic (SEMG) experiments, which can be used to provide information about the functionality of subjects’ motor units. These experiments involve a series of stimuli being applied to a motor unit, with whether or not the motor unit fires for each stimulus being recorded. The aim is to learn about how the probability of firing depends on the applied stimulus (the so-called stimulus-response curve). One such excitability parameter is an estimate of the stimulus level for which the motor unit has a 50% chance of firing. Within such an experiment we are able to choose the next stimulus level based on the past observations. We show how sequential Monte Carlo can be used to analyse such data in an online manner. We then use the current estimate of the posterior distribution in order to choose the next stimulus level. The aim is to select a stimulus level that minimises the expected loss of estimating a quantity, or quantities, of interest. We will apply this loss function to the estimates of target quantiles from the stimulus-response curve. Through simulation we show that this approach is more efficient than existing sequential design methods in terms of estimating the quantile(s) of interest. If applied in practice, it could reduce the length of SEMG experiments by a factor of three.

### Local-Mass Preserving Prior Distributions for Nonparametric Bayesian Models

**Juhee Lee**, **Steven N. MacEachern**, **Yiling Lu**, **Gordon B. Mills**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 307--330.

**Abstract:**

We address the problem of prior specification for models involving the two-parameter Poisson-Dirichlet process. These models are sometimes partially subjectively specified and are always partially (or fully) specified by a rule. We develop prior distributions based on local mass preservation. The robustness of posterior inference to an arbitrary choice of overdispersion under the proposed and current priors is investigated. Two examples are provided to demonstrate the properties of the proposed priors. We focus on the three major types of inference: clustering of the parameters of interest, estimation and prediction. The new priors are found to provide more stable inference about clustering than traditional priors while showing few drawbacks. Furthermore, it is shown that more stable clustering results in more stable inference for estimation and prediction. We recommend the local-mass preserving priors as a replacement for the traditional priors.

### On Divergence Measures Leading to Jeffreys and Other Reference Priors

**Ruitao Liu**, **Arijit Chakrabarti**, **Tapas Samanta**, **Jayanta K. Ghosh**, **Malay Ghosh**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 331--370.

**Abstract:**

The paper presents new measures of divergence between prior and posterior which are maximized by the Jeffreys prior. We provide two methods for proving this, one of which provides an easy to verify sufficient condition. We use such divergences to measure information in a prior and also obtain new objective priors outside the class of Bernardo’s reference priors.

### Bayesian Analysis of the Functional-Coefficient Autoregressive Heteroscedastic Model

**Xin-Yuan Song**, **Jing-Heng Cai**, **Xiang-Nan Feng**, **Xue-Jun Jiang**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 371--396.

**Abstract:**

In this paper, we propose a new model called the functional-coefficient autoregressive heteroscedastic (FARCH) model for nonlinear time series. The FARCH model extends the existing functional-coefficient autoregressive models and double-threshold autoregressive heteroscedastic models by providing a flexible framework for the detection of nonlinear features for both the conditional mean and conditional variance. We propose a Bayesian approach, along with the Bayesian P-splines technique and Markov chain Monte Carlo algorithm, to estimate the functional coefficients and unknown parameters of the model. We also conduct model comparison via the Bayes factor. The performance of the proposed methodology is evaluated via a simulation study. A real data set derived from the daily S&P 500 Composite Index is used to illustrate the methodology.

### Bayesian Adaptive Smoothing Splines Using Stochastic Differential Equations

**Yu Ryan Yue**, **Daniel Simpson**, **Finn Lindgren**, **Håvard Rue**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 397--424.

**Abstract:**

The smoothing spline is one of the most popular curve-fitting methods, partly because of empirical evidence supporting its effectiveness and partly because of its elegant mathematical formulation. However, there are two obstacles that restrict the use of the smoothing spline in practical statistical work. Firstly, it becomes computationally prohibitive for large data sets because the number of basis functions roughly equals the sample size. Secondly, its global smoothing parameter can only provide a constant amount of smoothing, which often results in poor performances when estimating inhomogeneous functions. In this work, we introduce a class of adaptive smoothing spline models that is derived by solving certain stochastic differential equations with finite element methods. The solution extends the smoothing parameter to a continuous data-driven function, which is able to capture the change of the smoothness of the underlying process. The new model is Markovian, which makes Bayesian computation fast. A simulation study and real data example are presented to demonstrate the effectiveness of our method.

### Laplace Approximation for Logistic Gaussian Process Density Estimation and Regression

**Jaakko Riihimäki**, **Aki Vehtari**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 425--448.

**Abstract:**

Logistic Gaussian process (LGP) priors provide a flexible alternative for modelling unknown densities. The smoothness properties of the density estimates can be controlled through the prior covariance structure of the LGP, but the challenge is the analytically intractable inference. In this paper, we present approximate Bayesian inference for LGP density estimation in a grid using Laplace’s method to integrate over the non-Gaussian posterior distribution of latent function values and to determine the covariance function parameters with type-II maximum a posteriori (MAP) estimation. We demonstrate that Laplace’s method with MAP is sufficiently fast for practical interactive visualisation of 1D and 2D densities. Our experiments with simulated and real 1D data sets show that the estimation accuracy is close to a Markov chain Monte Carlo approximation and state-of-the-art hierarchical infinite Gaussian mixture models. We also construct a reduced-rank approximation to speed up the computations for dense 2D grids, and demonstrate density regression with the proposed Laplace approach.

### Bayesian Regularization via Graph Laplacian

**Fei Liu**, **Sounak Chakraborty**, **Fan Li**, **Yan Liu**, **Aurelie C. Lozano**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 449--474.

**Abstract:**

Regularization plays a critical role in modern statistical research, especially in high-dimensional variable selection problems. Existing Bayesian methods usually assume independence between variables a priori. In this article, we propose a novel Bayesian approach, which explicitly models the dependence structure through a graph Laplacian matrix. We also generalize the graph Laplacian to allow both positively and negatively correlated variables. A prior distribution for the graph Laplacian is then proposed, which allows conjugacy and thereby greatly simplifies the computation. We show that the proposed Bayesian model leads to proper posterior distribution. Connection is made between our method and some existing regularization methods, such as Elastic Net, Lasso, Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) and Ridge regression. An efficient Markov Chain Monte Carlo method based on parameter augmentation is developed for posterior computation. Finally, we demonstrate the method through several simulation studies and an application on a real data set involving key performance indicators of electronics companies.

### Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures

**Catia Scricciolo**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 475--520.

**Abstract:**

We consider Bayesian nonparametric density estimation using a Pitman-Yor or a normalized inverse-Gaussian process convolution kernel mixture as the prior distribution for a density. The procedure is studied from a frequentist perspective. Using the stick-breaking representation of the Pitman-Yor process and the finite-dimensional distributions of the normalized inverse-Gaussian process, we prove that, when the data are independent replicates from a density with analytic or Sobolev smoothness, the posterior distribution concentrates on shrinking $L^{p}$-norm balls around the sampling density at a minimax-optimal rate, up to a logarithmic factor. The resulting hierarchical Bayesian procedure, with a fixed prior, is adaptive to the unknown smoothness of the sampling density.


### optimal transport and Wasserstein barycentres

**F**ollowing my musing about using medians versus means, a few days ago, Arnaud Doucet sent me a paper he recently wrote with Marco Cuturi for the upcoming ICML meeting in Beijing. (The program is full of potentially interesting papers on computational methods.) The starting point is the *Wasserstein distance* between two probability measures, which amounts to finding the most correlated copula associated with these measures, correlation being measured through a certain Euclidean distance. A second notion is the *Wasserstein barycentre*, the measure minimising a sum or average of Wasserstein distances to several measures. The connection with the random measures of the previous post is to find an estimator as an empirical measure that minimises the average of Wasserstein distances to several empirical measures. When the support of the empirical distribution is fixed, Cuturi and Doucet derive the weights by subgradient methods. When the support is free (but of bounded size), they propose an alternating optimisation extension to derive the Wasserstein barycentre. As those algorithms are extremely costly, the authors move to a smoothed and much less intensive version. Being a complete novice in this topic, I cannot say much about the method, but the illustration on a digit restoration dataset is certainly impressive!
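For intuition, in one dimension the optimal coupling simply matches order statistics, so both the Wasserstein-2 distance and the barycentre of equal-size empirical measures reduce to operations on sorted samples. Here is a toy sketch of this special case (my own illustration, not the Cuturi-Doucet algorithm, which handles general discrete measures and relies on smoothing):

```python
import numpy as np

rng = np.random.default_rng(1)
# three empirical measures of equal size, centred at -2, 0 and 2
samples = [rng.normal(m, 1.0, 500) for m in (-2.0, 0.0, 2.0)]

def w2(x, y):
    # In 1D the optimal coupling pairs order statistics, so the W2
    # distance between equal-size empirical measures is the root mean
    # squared gap between sorted samples.
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

# The W2 barycentre of 1D empirical measures (equal weights, equal
# sizes) averages the order statistics, i.e. the quantile functions.
bary = np.mean([np.sort(s) for s in samples], axis=0)

print(bary.mean())            # near 0, the average of the three means
print(w2(samples[0], bary))   # near 2, the shift from -2 to 0
```

In higher dimensions no such sorting shortcut exists, which is precisely why the subgradient and alternating optimisation schemes of the paper are needed.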

Filed under: Statistics

### noninformative priors for mixtures

*“A novel formulation of the mixture model is introduced, which includes the prior constraint that each Gaussian component is always assigned a minimal number of data points. This enables noninformative improper priors such as the Jeffreys prior to be placed on the component parameters. We demonstrate difficulties involved in specifying a prior for the standard Gaussian mixture model, and show how the new model can be used to overcome these. MCMC methods are given for efficient sampling from the posterior of this model.”* C. Stoneking

**F**ollowing in the theme of the Jeffreys post of two weeks ago, I spotted today a newly arXived paper about using improper priors for mixtures… and surviving it! It is entitled “Bayesian inference of Gaussian mixture models with noninformative priors” and written by Colin Stoneking at ETH Zürich. As mentioned in the previous post, one specificity of our 1990-1994 paper on mixtures with Jean Diebolt was to allow for improper priors by imposing at least two observations per component. The above abstract thus puzzled me until I found on page 3 that the paper was indeed related to ours (and to Larry’s 2000 validation)! Actually, I should not complain about citations of my earlier works on mixtures, as they cover seven different papers, but the bibliography is somewhat missing the paper we wrote with George Casella and Marty Wells in *Statistical Methodology* in 2004 (actually the very first paper of this new journal!), where we show that conjugate priors allow for the integration of the weights, resulting in a closed-form expression for the distribution of the partition vector. (This was also extended in the chapter “Exact Bayesian Analysis of Mixtures” I wrote with Kerrie Mengersen in our book Mixtures: Estimation and Applications.)

*“There is no well-founded, general method to choose the parameters of a given prior to make it weakly informative for Gaussian mixtures.”* C. Stoneking

**T**he first part of the paper shows why looking for weakly informative priors is doomed to fail in this mixture setting: there is no stabilisation as the hyperparameters approach the border (between proper-ness and improper-ness); on the contrary, the frequency of appearances of empty components grows steadily to 100%… The second part gets to the reassessment of our 1990 exclusion trick, first considering that it does not produce a true posterior, then criticising Larry’s 2000 analysis as building a data-dependent “prior”, and at last proposing a reformulation where the exclusion of the empty components and of those with a single allocated observation becomes part of the “prior” (albeit a prior on the allocation vector). In fine, the posterior thus constructed remains the same as ours, with the message that if we define our model through the likelihood *of the sample* excluding empty or single-observation terms, we can produce a proper Bayesian analysis (except for a minor missing renormalisation). This leads me to question the conclusion that inference about the (unknown) number of components in the mixture is impossible from this perspective. For instance, we could define fractional Bayes factors à la O’Hagan (1995) this way, i.e., starting from the restricted likelihood, taking a fraction of the likelihood to make the posterior proper, then using the remaining fraction to compute a Bayes factor. (Fractional Bayes factors do not work for the regular likelihood of a Gaussian mixture, irrespective of the sample size.)
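To illustrate the exclusion trick in its simplest form, here is a hypothetical sketch of a Gibbs sampler for a two-component Gaussian mixture with known unit variances and flat improper priors on the means, where allocation vectors leaving any component with fewer than two observations are rejected. This is a toy rendering of the 1990 idea, not Stoneking's exact formulation, and all numerical settings are my own:

```python
import numpy as np

rng = np.random.default_rng(2)
# simulated data: two well-separated Gaussian components
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 60)])
n, K = len(x), 2

mu = np.array([-1.0, 1.0])       # initial component means
w = np.full(K, 1.0 / K)          # initial weights
for it in range(500):
    # sample allocations given current means and weights
    logp = np.log(w) - 0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=pi) for pi in p])
    counts = np.bincount(z, minlength=K)
    if counts.min() < 2:
        continue                 # exclusion trick: reject allocations
                                 # leaving a component nearly empty
    # sample means given allocations: flat (improper) prior on each
    # mean yields a Gaussian conditional posterior
    for k in range(K):
        mu[k] = rng.normal(x[z == k].mean(), 1.0 / np.sqrt(counts[k]))
    # sample weights from the Dirichlet conditional
    w = rng.dirichlet(1.0 + counts)
print(np.sort(mu))               # should land near (-2, 2)
```

The rejection step is what keeps the posterior proper despite the improper prior on the means: allocations that would make a component's conditional posterior undefined never occur.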

Filed under: Books, Statistics, University life Tagged: conjugate priors, Edinburgh, Gaussian mixture, Gibbs sampling, improper priors, Jean Diebolt, Jeffreys priors, MCMC, noninformative priors

### Resistance

Filed under: Statistics Tagged: Anna Marly, anti-fascism, French resistance, la complainte du partisan, Leonard Cohen

### Flaggermusmannen [book review]

*“Cold, concise statistics. Keyword number one is statistical significance. In other words, we are looking for a system that cannot be explained by statistical chance (…) this group constitutes less than five percent of the female population. Yet I was left with seven murders and over forty rapes.”* J. Nesbø

**A**nother first novel! *The Bat (Flaggermusmannen)* by Jo Nesbø has been sitting in my bedside book pile for quite a while, until I decided to read it a few days ago. It is the first appearance of Inspector Harry Hole in a published book and was written in 1997, although translated into English much much later. (The book was nominated as Best Norwegian Crime Novel of the Year and as Best Nordic Crime Novel of the Year.)

*“Life consists of a series of quite improbable chance occurrences (…) What bothers me is that I’ve got that lottery number too many times in a row.”* J. Nesbø

**I** read the (later) novel *The Redeemer* a few years ago, taking place mostly in Norway, and kept a globally positive impression of the book, even though the plot was a bit stretched… *The Bat* has somewhat the same defects as *The Ice Princess* in that it sounds too much like an exercise in thriller writing, albeit in a much less clumsy style! The central character of Harry Hole is well done, engaging despite his shortcomings, and the way he gets along with most of the people he meets is rather realistic. However, the setting of this first novel in Australia (rather than Norway) is something of a failure, in that the country and Sydney come out more as caricatures than as realistic settings… For instance, every aboriginal Harry meets must resort to traditional tales involving emus and lizards and other local animals. One such tale would be fine, but so many are just a bore. The title itself is connected to yet another aboriginal myth, and to the murders occurring way too often in the novel. Similarly, every foreign backpacker met in the pages of *The Bat* is either dumb or on her way to becoming a waitress to recover from a failed love affair. And a major character is a transvestite playing in a theatre, maybe because Nesbø had watched *Priscilla Queen of the Desert* a few years earlier… The Australian police officers sound both very heavy in colloquialisms and quite light in detective skills, lacking an obvious connection to a series of murders of young women throughout Australia. The second part of the novel gets too artificial to remain gripping, and I completed the book with a feeling of chore accomplished, not of surprise or shock at the resolution of the murders. I thus concur with many other readers that it is certainly far from the best in the series!

Filed under: Books, Kids, Travel Tagged: aborigines, Australia, Harry Hole, Jo Nesbo, Priscilla Queen of the Desert, Sydney, The Bat

### vote for Europe!

### bluebells

### Bayesian Analysis, Volume 9, Number 1 (2014)

Contents:

**Francisco J. Rubio**, **Mark F. J. Steel**. Inference in Two-Piece Location-Scale Models with Jeffreys Priors. 1--22.

**José M. Bernardo**. Comment on Article by Rubio and Steel. 23--24.

**James G. Scott**. Comment on Article by Rubio and Steel. 25--28.

**Robert E. Weiss**, **Marc A. Suchard**. Comment on Article by Rubio and Steel. 29--38.

**Xinyi Xu**. Comment on Article by Rubio and Steel. 39--44.

**Francisco J. Rubio**, **Mark F. J. Steel**. Rejoinder. 45--52.

**Lorna M. Barclay**, **Jane L. Hutton**, **Jim Q. Smith**. Chain Event Graphs for Informed Missingness. 53--76.

**David A. Wooff**. Bayes Linear Sufficiency in Non-exchangeable Multivariate Multiple Regressions. 77--96.

**Theodore Papamarkou**, **Antonietta Mira**, **Mark Girolami**. Zero Variance Differential Geometric Markov Chain Monte Carlo Algorithms. 97--128.

**Erlis Ruli**, **Nicola Sartori**, **Laura Ventura**. Marginal Posterior Simulation via Higher-order Tail Area Approximations. 129--146.

**Luis E. Nieto-Barajas**, **Alberto Contreras-Cristán**. A Bayesian Nonparametric Approach for Time Series Clustering. 147--170.

**Friederike Greb**, **Tatyana Krivobokova**, **Axel Munk**, **Stephan von Cramon-Taubadel**. Regularized Bayesian Estimation of Generalized Threshold Regression Models. 171--196.

**Cristiano Villa**, **Stephen G. Walker**. Objective Prior for the Number of Degrees of Freedom of a t Distribution. 197--220.

**Veronika Rockova**, **Emmanuel Lesaffre**. Incorporating Grouping Information in Bayesian Variable Selection with Applications in Genomics. 221--258.

### understanding complex and large industrial data (UCLID 2014)

**J**ust received this announcement of the UCLID 2014 conference in Lancaster, July 1-2, 2014:

Understanding Complex and Large Industrial Data 2014, or UCLID, is a workshop which aims to provide an opportunity for academic researchers and industrial practitioners to work together and share ideas on the fast developing field of ‘big data’ analysis. This is a growing area of importance within academia and industry where the potential for new research and economic impact has been recognised.

UCLID 2014 is hosted by the STOR-i Doctoral Training Centre, which is based at Lancaster University. STOR-i’s unique position between academia and industry provides an ideal venue for this event, as this workshop builds upon STOR-i’s philosophy of cross-collaboration and implementation of new research within the wider community.

Filed under: pictures, R, Statistics, University life Tagged: academia and industry, big data, England, Lancaster University, UCLID, workshop