## Bayesian News Feeds

### back in Warwick


### recycling accept-reject rejections (#2)

**F**ollowing yesterday’s post on the paper by Rao, Lin, and Dunson on a new approach to intractable normalising constants, and taking advantage of being in Warwick, I tested the method on a toy model, namely the posterior associated with n Student’s t observations with unknown location parameter μ and a flat prior, where each observation density

f(x|μ) ∝ (1 + (x−μ)²/ν)^−(ν+1)/2

is “naturally” bounded by a Cauchy density with scale √ν. The constant M is then easily derived (M = π√ν when the t density is scaled to equal one at its mode) and running the new algorithm follows from a normal random-walk proposal targeting the augmented likelihood (R code below).
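A quick numerical check of that domination (a Python sketch of my own, not part of the original experiment): the t density, scaled to equal one at its mode as in the R code below, stays below M = π√ν times the Cauchy density with scale √ν, with equality at the mode:

```python
import numpy as np

nu = 4.0
M = np.pi * np.sqrt(nu)  # candidate accept-reject bound

x = np.linspace(-50.0, 50.0, 100001)
# Student's t density with nu degrees of freedom, unnormalised so that f(0) = 1
f = (1.0 + x**2 / nu) ** (-(nu + 1.0) / 2.0)
# Cauchy density with location 0 and scale sqrt(nu)
g = 1.0 / (np.pi * np.sqrt(nu) * (1.0 + x**2 / nu))

ratio = f / g  # equals pi*sqrt(nu)*(1 + x^2/nu)^((1-nu)/2), maximal at x = 0
print(ratio.max())  # ~6.2832, i.e. M for nu = 4
```

The maximum of the ratio is attained at the mode, so the bound is tight and the constant cannot be improved for this proposal.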

**A**s shown by the above graph, the completion-by-rejection scheme produces an outcome (tomato) similar to the one based on the observations alone (steelblue), with a similar acceptance rate. However, the computing time is much degraded when compared with the no-completion version. Here is the entire R code that produced both MCMC samples:

```r
library(compiler) # for cmpfun()

# Student t observations and flat prior
nu=4
n=25
M=pi*sqrt(nu)
sqrtnu=sqrt(nu)
obs=rt(n,df=4)
sdobs=sd(obs)

# unnormalised t density (equal to one at its mode)
mydt=function(x,mu){
  return(dt(x-mu,df=nu)/dt(0,df=nu))}
mydtc=cmpfun(mydt)

# Cauchy density with scale sqrt(nu)
mydcauchy=function(x,mu){
  y=(x-mu)/sqrtnu
  return(dcauchy(y)/sqrtnu)}
mydcaucchy=cmpfun(mydcauchy)

# augmented data: rejected proposals of the accept-reject scheme
augmen=function(mu){
  y=NULL
  for (i in 1:n){
    prop=mu+rcauchy(1)*sqrtnu
    # note: reject is TRUE when the proposal is accepted
    reject=(runif(1)<mydtc(prop,mu)/(M*mydcaucchy(prop,mu)))
    while (!reject){
      y=c(y,prop)
      prop=mu+rcauchy(1)*sqrtnu
      reject=(runif(1)<mydtc(prop,mu)/(M*mydcaucchy(prop,mu)))}
    }
  return(y)}

# Gibbs with completion by the rejected values
gibbsda=function(T=10^4){
  theta=rep(0,T)
  for (t in 2:T){
    rej=augmen(theta[t-1])
    theta[t]=prop=theta[t-1]+rnorm(1,sd=.1*sdobs)
    propdens=sum(dt(obs-prop,df=nu,log=TRUE))+
      sum(log(mydcaucchy(rej,prop)-mydtc(rej,mu=prop)/M))
    refdens=sum(dt(obs-theta[t-1],df=nu,log=TRUE))+
      sum(log(mydcaucchy(rej,theta[t-1])-mydtc(rej,mu=theta[t-1])/M))
    if (log(runif(1))>propdens-refdens) theta[t]=theta[t-1]
    }
  return(theta)}
g8=cmpfun(gibbsda)

# no-completion version
gibbs2=function(T=10^4){
  eta=rep(0,T)
  for (t in 2:T){
    eta[t]=prop=eta[t-1]+rnorm(1,sd=sdobs)
    propdens=sum(dt(obs-prop,df=nu,log=TRUE))
    refdens=sum(dt(obs-eta[t-1],df=nu,log=TRUE))
    if (log(runif(1))>propdens-refdens) eta[t]=eta[t-1]
    }
  return(eta)}
g9=cmpfun(gibbs2)
```

### Informative g-Priors for Logistic Regression

### Cluster Analysis, Model Selection, and Prior Distributions on Models

### Bayesian Multiscale Smoothing of Gaussian Noised Images

### The Performance of Covariance Selection Methods That Consider Decomposable Models Only

### Toward Rational Social Decisions: A Review and Some Results

### Spatial Bayesian variable selection models on functional magnetic resonance imaging time-series data

### Robust Bayesian Graphical Modeling Using Dirichlet t-Distributions

### recycling accept-reject rejections

**V**inayak Rao, Lizhen Lin and David Dunson just arXived a paper that proposes a new technique to handle intractable normalising constants, whose exact title is Data augmentation for models based on rejection sampling. (A paper I read on the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (*aka* likelihood) satisfies

where all terms but p are known in closed form, then completion by the rejected values of a hypothetical accept-reject algorithm (hypothetical in the sense that the data need not have been produced by an accept-reject scheme, only the above domination condition has to hold) allows for a data augmentation scheme, without requiring the missing normalising constant, since the completed likelihood is

a closed-form, if not necessarily congenial, function.
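To make the completion concrete, here is a minimal Python sketch of the idea on the Student’s t/Cauchy toy pair (my own illustration, not the authors’ implementation): an accept-reject run returns the accepted draw together with the rejected proposals, and each rejected value y then contributes a closed-form factor g(y) − f(y)/M to the completed likelihood, with no normalising constant in sight:

```python
import numpy as np

rng = np.random.default_rng(0)
nu, mu = 4.0, 0.0
M = np.pi * np.sqrt(nu)  # domination constant for this toy pair

def f(x):
    """Unnormalised Student's t density centred at mu, with f(mu) = 1."""
    return (1.0 + (x - mu) ** 2 / nu) ** (-(nu + 1.0) / 2.0)

def g(x):
    """Cauchy proposal density with location mu and scale sqrt(nu)."""
    return 1.0 / (np.pi * np.sqrt(nu) * (1.0 + (x - mu) ** 2 / nu))

def sample_with_rejections():
    """One accept-reject run: the accepted draw plus all rejected proposals."""
    rejected = []
    while True:
        y = mu + np.sqrt(nu) * rng.standard_cauchy()
        if rng.uniform() < f(y) / (M * g(y)):
            return y, rejected
        rejected.append(y)

x, rej = sample_with_rejections()
# completed log-likelihood: each rejected y contributes log(g(y) - f(y)/M),
# the accepted x contributes log f(x); no normalising constant appears
loglik = np.log(f(x)) + sum(np.log(g(y) - f(y) / M) for y in rej)
```

The factor g(y) − f(y)/M is simply the joint density of proposing y and rejecting it, which is what makes the augmented target available in closed form.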

**N**ow this is quite a different use of the “rejected values” from the accept-reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which could still have been mentioned there, as could Section 4.2 of Monte Carlo Statistical Methods, rather than re-deriving the joint density of the augmented “accepted+rejected” sample).

**I**t is a neat idea in that it completely bypasses the approximation of the normalising constant, and it avoids the somewhat delicate tuning of the auxiliary solution of Møller et al. (2006). The difficulty with this algorithm however lies in finding an upper bound M on the unnormalised density f that is

- in closed form;
- with a manageable and tight enough “constant” M;
- compatible with running a posterior simulation conditional on the added rejections.

The paper seems to further assume that the bound M is independent of the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not at all necessary for the validation of the formal algorithm. Such a constraint would only pull M higher, hence reduce the efficiency of the method. Actually, the matrix Langevin distribution considered in the first example involves a bound that does depend on the parameter κ.

**T**he paper includes a result (Theorem 2) on uniform ergodicity that relies on heavy assumptions on the proposal distribution, including the rather surprising one that the probability of *rejection* is bounded from below, i.e., calling for a *less* efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of *acceptance* is bounded from below since, then, the event where no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. A renewal therefore occurs each time the rejected-variable set ϒ is empty, and ergodicity ensues (Robert, 1995, *Statistical Science*).

**N**ote also that, despite the opposition raised by the authors, the method *per se* does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.
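A loose numerical illustration of this pseudo-marginal connection (again a Python sketch on my own toy setting, not the paper’s estimator): in accept-reject, the number K of proposals needed per accepted draw is geometric with mean M/Z, Z being the intractable integral of the unnormalised density, so the rejection record does carry unbiased information about the missing constant:

```python
import numpy as np

rng = np.random.default_rng(42)
nu = 4.0
M = np.pi * np.sqrt(nu)

def f(x):  # unnormalised Student's t density, f(0) = 1
    return (1.0 + x**2 / nu) ** (-(nu + 1.0) / 2.0)

def g(x):  # Cauchy density with scale sqrt(nu)
    return 1.0 / (np.pi * np.sqrt(nu) * (1.0 + x**2 / nu))

def proposals_until_acceptance():
    """Number of Cauchy proposals drawn until one is accepted."""
    k = 0
    while True:
        k += 1
        y = np.sqrt(nu) * rng.standard_cauchy()
        if rng.uniform() < f(y) / (M * g(y)):
            return k

# K is Geometric(Z/M): its average is an unbiased estimator of M/Z
K = [proposals_until_acceptance() for _ in range(20000)]
Z = 8.0 / 3.0  # closed-form integral of f for nu = 4
print(np.mean(K), M / Z)  # empirical mean close to M/Z = 3*pi/4 ~ 2.356
```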

*Maybe some further experimental evidence tomorrow…*


### Bayesian Analysis, Volume 9, Number 2 (2014)

Contents:

**Zhihua Zhang**, **Dakan Wang**, **Guang Dai**, **Michael I. Jordan**. Matrix-Variate Dirichlet Process Priors with Applications. 259--286.

**Nammam Ali Azadi**, **Paul Fearnhead**, **Gareth Ridall**, **Joleen H. Blok**. Bayesian Sequential Experimental Design for Binary Response Data with Application to Electromyographic Experiments. 287--306.

**Juhee Lee**, **Steven N. MacEachern**, **Yiling Lu**, **Gordon B. Mills**. Local-Mass Preserving Prior Distributions for Nonparametric Bayesian Models. 307--330.

**Ruitao Liu**, **Arijit Chakrabarti**, **Tapas Samanta**, **Jayanta K. Ghosh**, **Malay Ghosh**. On Divergence Measures Leading to Jeffreys and Other Reference Priors. 331--370.

**Xin-Yuan Song**, **Jing-Heng Cai**, **Xiang-Nan Feng**, **Xue-Jun Jiang**. Bayesian Analysis of the Functional-Coefficient Autoregressive Heteroscedastic Model. 371--396.

**Yu Ryan Yue**, **Daniel Simpson**, **Finn Lindgren**, **Håvard Rue**. Bayesian Adaptive Smoothing Splines Using Stochastic Differential Equations. 397--424.

**Jaakko Riihimäki**, **Aki Vehtari**. Laplace Approximation for Logistic Gaussian Process Density Estimation and Regression. 425--448.

**Fei Liu**, **Sounak Chakraborty**, **Fan Li**, **Yan Liu**, **Aurelie C. Lozano**. Bayesian Regularization via Graph Laplacian. 449--474.

**Catia Scricciolo**. Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures. 475--520.

### Matrix-Variate Dirichlet Process Priors with Applications

**Zhihua Zhang**,

**Dakan Wang**,

**Guang Dai**,

**Michael I. Jordan**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 259--286.

**Abstract:**

In this paper we propose a matrix-variate Dirichlet process (MATDP) for modeling the joint prior of a set of random matrices. Our approach is able to share statistical strength among regression coefficient matrices due to the clustering property of the Dirichlet process. Moreover, since the base probability measure is defined as a matrix-variate distribution, the dependence among the elements of each random matrix is described via the matrix-variate distribution. We apply MATDP to multivariate supervised learning problems. In particular, we devise a nonparametric discriminative model and a nonparametric latent factor model. The interest is in considering correlations both across response variables (or covariates) and across response vectors. We derive Markov chain Monte Carlo algorithms for posterior inference and prediction, and illustrate the application of the models to multivariate regression, multi-class classification and multi-label prediction problems.

### Bayesian Sequential Experimental Design for Binary Response Data with Application to Electromyographic Experiments

**Nammam Ali Azadi**,

**Paul Fearnhead**,

**Gareth Ridall**,

**Joleen H. Blok**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 287--306.

**Abstract:**

We develop a sequential Monte Carlo approach for Bayesian analysis of the experimental design for binary response data. Our work is motivated by surface electromyographic (SEMG) experiments, which can be used to provide information about the functionality of subjects’ motor units. These experiments involve a series of stimuli being applied to a motor unit, with whether or not the motor unit fires for each stimulus being recorded. The aim is to learn about how the probability of firing depends on the applied stimulus (the so-called stimulus-response curve). One such excitability parameter is an estimate of the stimulus level for which the motor unit has a 50% chance of firing. Within such an experiment we are able to choose the next stimulus level based on the past observations. We show how sequential Monte Carlo can be used to analyse such data in an online manner. We then use the current estimate of the posterior distribution in order to choose the next stimulus level. The aim is to select a stimulus level that minimises the expected loss of estimating a quantity, or quantities, of interest. We will apply this loss function to the estimates of target quantiles from the stimulus-response curve. Through simulation we show that this approach is more efficient than existing sequential design methods in terms of estimating the quantile(s) of interest. If applied in practice, it could reduce the length of SEMG experiments by a factor of three.

### Local-Mass Preserving Prior Distributions for Nonparametric Bayesian Models

**Juhee Lee**,

**Steven N. MacEachern**,

**Yiling Lu**,

**Gordon B. Mills**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 307--330.

**Abstract:**

We address the problem of prior specification for models involving the two-parameter Poisson-Dirichlet process. These models are sometimes partially subjectively specified and are always partially (or fully) specified by a rule. We develop prior distributions based on local mass preservation. The robustness of posterior inference to an arbitrary choice of overdispersion under the proposed and current priors is investigated. Two examples are provided to demonstrate the properties of the proposed priors. We focus on the three major types of inference: clustering of the parameters of interest, estimation and prediction. The new priors are found to provide more stable inference about clustering than traditional priors while showing few drawbacks. Furthermore, it is shown that more stable clustering results in more stable inference for estimation and prediction. We recommend the local-mass preserving priors as a replacement for the traditional priors.

### On Divergence Measures Leading to Jeffreys and Other Reference Priors

**Ruitao Liu**,

**Arijit Chakrabarti**,

**Tapas Samanta**,

**Jayanta K. Ghosh**,

**Malay Ghosh**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 331--370.

**Abstract:**

The paper presents new measures of divergence between prior and posterior which are maximized by the Jeffreys prior. We provide two methods for proving this, one of which provides an easy to verify sufficient condition. We use such divergences to measure information in a prior and also obtain new objective priors outside the class of Bernardo’s reference priors.

### Bayesian Analysis of the Functional-Coefficient Autoregressive Heteroscedastic Model

**Xin-Yuan Song**,

**Jing-Heng Cai**,

**Xiang-Nan Feng**,

**Xue-Jun Jiang**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 371--396.

**Abstract:**

In this paper, we propose a new model called the functional-coefficient autoregressive heteroscedastic (FARCH) model for nonlinear time series. The FARCH model extends the existing functional-coefficient autoregressive models and double-threshold autoregressive heteroscedastic models by providing a flexible framework for the detection of nonlinear features for both the conditional mean and conditional variance. We propose a Bayesian approach, along with the Bayesian P-splines technique and Markov chain Monte Carlo algorithm, to estimate the functional coefficients and unknown parameters of the model. We also conduct model comparison via the Bayes factor. The performance of the proposed methodology is evaluated via a simulation study. A real data set derived from the daily S&P 500 Composite Index is used to illustrate the methodology.

### Bayesian Adaptive Smoothing Splines Using Stochastic Differential Equations

**Yu Ryan Yue**,

**Daniel Simpson**,

**Finn Lindgren**,

**Håvard Rue**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 397--424.

**Abstract:**

The smoothing spline is one of the most popular curve-fitting methods, partly because of empirical evidence supporting its effectiveness and partly because of its elegant mathematical formulation. However, there are two obstacles that restrict the use of the smoothing spline in practical statistical work. Firstly, it becomes computationally prohibitive for large data sets because the number of basis functions roughly equals the sample size. Secondly, its global smoothing parameter can only provide a constant amount of smoothing, which often results in poor performances when estimating inhomogeneous functions. In this work, we introduce a class of adaptive smoothing spline models that is derived by solving certain stochastic differential equations with finite element methods. The solution extends the smoothing parameter to a continuous data-driven function, which is able to capture the change of the smoothness of the underlying process. The new model is Markovian, which makes Bayesian computation fast. A simulation study and real data example are presented to demonstrate the effectiveness of our method.

### Laplace Approximation for Logistic Gaussian Process Density Estimation and Regression

**Jaakko Riihimäki**,

**Aki Vehtari**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 425--448.

**Abstract:**

Logistic Gaussian process (LGP) priors provide a flexible alternative for modelling unknown densities. The smoothness properties of the density estimates can be controlled through the prior covariance structure of the LGP, but the challenge is the analytically intractable inference. In this paper, we present approximate Bayesian inference for LGP density estimation in a grid using Laplace’s method to integrate over the non-Gaussian posterior distribution of latent function values and to determine the covariance function parameters with type-II maximum a posteriori (MAP) estimation. We demonstrate that Laplace’s method with MAP is sufficiently fast for practical interactive visualisation of 1D and 2D densities. Our experiments with simulated and real 1D data sets show that the estimation accuracy is close to a Markov chain Monte Carlo approximation and state-of-the-art hierarchical infinite Gaussian mixture models. We also construct a reduced-rank approximation to speed up the computations for dense 2D grids, and demonstrate density regression with the proposed Laplace approach.

### Bayesian Regularization via Graph Laplacian

**Fei Liu**,

**Sounak Chakraborty**,

**Fan Li**,

**Yan Liu**,

**Aurelie C. Lozano**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 449--474.

**Abstract:**

Regularization plays a critical role in modern statistical research, especially in high-dimensional variable selection problems. Existing Bayesian methods usually assume independence between variables a priori. In this article, we propose a novel Bayesian approach, which explicitly models the dependence structure through a graph Laplacian matrix. We also generalize the graph Laplacian to allow both positively and negatively correlated variables. A prior distribution for the graph Laplacian is then proposed, which allows conjugacy and thereby greatly simplifies the computation. We show that the proposed Bayesian model leads to proper posterior distribution. Connection is made between our method and some existing regularization methods, such as Elastic Net, Lasso, Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) and Ridge regression. An efficient Markov Chain Monte Carlo method based on parameter augmentation is developed for posterior computation. Finally, we demonstrate the method through several simulation studies and an application on a real data set involving key performance indicators of electronics companies.

### Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures

**Catia Scricciolo**.

**Source: **Bayesian Analysis, Volume 9, Number 2, 475--520.

**Abstract:**

We consider Bayesian nonparametric density estimation using a Pitman-Yor or a normalized inverse-Gaussian process convolution kernel mixture as the prior distribution for a density. The procedure is studied from a frequentist perspective. Using the stick-breaking representation of the Pitman-Yor process and the finite-dimensional distributions of the normalized inverse-Gaussian process, we prove that, when the data are independent replicates from a density with analytic or Sobolev smoothness, the posterior distribution concentrates on shrinking $L^{p}$ -norm balls around the sampling density at a minimax-optimal rate, up to a logarithmic factor. The resulting hierarchical Bayesian procedure, with a fixed prior, is adaptive to the unknown smoothness of the sampling density.