Bayesian News Feeds
In connection with the previous announcement of ABC in Montréal, a call for papers that came out today:
NIPS 2014 Workshop: ABC in Montreal
December 12, 2014
Montréal, Québec, Canada
Approximate Bayesian computation (ABC) or likelihood-free (LF) methods have developed mostly beyond the radar of the machine learning community, but are important tools for a large segment of the scientific community. This is particularly true for systems and population biology, computational psychology, computational chemistry, etc. Recent work has both applied machine learning models and algorithms to general ABC inference (NN, forests, GPs) and ABC inference to machine learning (e.g. using computer graphics to solve computer vision using ABC). In general, however, there is significant room for collaboration between the two communities.
The workshop will consist of invited and contributed talks, poster spotlights, and a poster session. Rather than a panel discussion we will encourage open discussion between the speakers and the audience!
Examples of topics of interest in the workshop include (but are not limited to):
* Applications of ABC to machine learning, e.g., computer vision, inverse problems
* ABC in Systems Biology, Computational Science, etc
* ABC Reinforcement Learning
* Machine learning simulator models, e.g., NN models of simulation responses, GPs etc.
* Selection of sufficient statistics
* Online and post-hoc error
* ABC with very expensive simulations and acceleration methods (surrogate modeling, choice of design/simulation points)
* ABC with probabilistic programming
* Posterior evaluation of scientific problems/interaction with scientists
* Post-computational error assessment
* Impact on resulting ABC inference
* ABC for model selection
We invite submissions in NIPS 2014 format with a maximum of 4 pages, excluding references. Anonymity is not required. Relevant works that have been recently published or presented elsewhere are allowed, provided that previous publications are explicitly acknowledged. Please submit papers in PDF format to firstname.lastname@example.org .
This workshop has been endorsed by ISBA. As part of their sponsorship, ISBA will be awarding a limited number of travel awards to PhD students and young researchers. The organizing committee may nominate particularly strong submissions for this award.
In addition to the general ISBA endorsement, ABC in Montréal has been endorsed by the BayesComp section of ISBA.
Submission Deadline: October 9, 2014
Author Notification: October 26, 2014
Workshop: December 12 or 13, 2014
Michael Blum, Laboratoire TIMC-IMAG, Grenoble
Juliane Liepe, Imperial College London
Vikash Mansinghka, MIT
Frank Wood, Oxford
Neil Lawrence, University of Sheffield
Ted Meeds, University of Amsterdam
Christian Robert, Université Paris-Dauphine
Max Welling, University of Amsterdam
Richard Wilkinson, University of Nottingham
The organizers can be contacted at email@example.com.
Filed under: Statistics, Travel, University life Tagged: ABC, BayesComp, Canada, ISBA@NIPS, likelihood-free methods, machine learning, Montréal, NIPS 2014, Québec, simulation
I read the newly arXived paper “On Single Variable Transformation Approach to Markov Chain Monte Carlo” by Dey and Bhattacharya on the pleasant train ride between Bristol and Coventry last weekend. The paper actually follows several earlier papers by the authors that I have not read in detail. The notion of single variable transform is to add plus or minus the same random noise to all components of the current value of the Markov chain, instead of the standard d-dimensional random walk proposal of the reference Metropolis-Hastings algorithm, namely all proposals are of the form
$$x' = (x_1 \pm \epsilon, \ldots, x_d \pm \epsilon)$$
meaning the chain proceeds [after acceptance] along one and only one of the d diagonals. The authors’ arguments are that (a) the proposal is cheaper and (b) the acceptance rate is higher. What I find questionable in this argument is that this does not directly matter in the evaluation of the performances of the algorithm. For instance, higher acceptance in a Metropolis-Hastings algorithm does not imply faster convergence and smaller asymptotic variance. (This goes without mentioning the fact that the comparative Figure 1 is so variable with the dimension as to be of limited worth. Figures 1 and 2 are also found in an earlier arXived paper of the authors.) Moreover, restricting the moves along the diagonals of the Euclidean space implies that there is a positive probability of making two successive proposals along the same diagonal, which is a waste of time. When considering the two-dimensional case, joining two arbitrary points using an everywhere positive density g upon ε means generating two successive values from g, which is equivalent cost-wise to generating a single noise from a two-dimensional proposal. Without the intermediate step of checking the one-dimensional move along one diagonal. So much for a gain. In fine, the proposal found in this paper sums up as being a one-at-a-time version of a standard random walk Metropolis-Hastings algorithm.
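To make the comparison concrete, here is a minimal sketch of the two proposals discussed in this post, not the authors' code. The standard normal target, the half-normal choice for g, the equal sign probabilities and all function names are assumptions made for illustration only.

```python
import math
import random


def log_target(x):
    # Illustrative target (an assumption): standard normal in len(x) dimensions.
    return -0.5 * sum(xi * xi for xi in x)


def propose_additive(x, scale, rng):
    # Single-variable transformation: one positive noise eps is drawn and
    # added to or subtracted from every coordinate (random sign per coordinate).
    eps = abs(rng.gauss(0.0, scale))
    return [xi + rng.choice((-1.0, 1.0)) * eps for xi in x]


def propose_random_walk(x, scale, rng):
    # Reference proposal: an independent Gaussian noise for each coordinate.
    return [xi + rng.gauss(0.0, scale) for xi in x]


def run_chain(propose, x0, n_iter, scale, seed=0):
    # Both proposals are symmetric here, so the acceptance ratio reduces to
    # a ratio of target densities.
    rng = random.Random(seed)
    x, accepted, chain = list(x0), 0, []
    for _ in range(n_iter):
        y = propose(x, scale, rng)
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
        if log_u < log_target(y) - log_target(x):
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n_iter


if __name__ == "__main__":
    for name, propose in [("additive", propose_additive),
                          ("random walk", propose_random_walk)]:
        _, rate = run_chain(propose, [0.0] * 10, 5000, 0.5, seed=1)
        print(name, "acceptance rate:", round(rate, 3))
```

The printed acceptance rates are only half of the story: to compare the two samplers one would also look at a mixing measure computed from the returned chains, such as autocorrelations or an effective sample size.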
Filed under: Books, Statistics, Travel Tagged: arXiv, asymptotic variance, Metropolis-Hastings, mixing speed, random walk
Last morning at the neuroscience workshop Jean-François Cardoso presented independent component analysis through a highly pedagogical and enjoyable tutorial that stressed the geometric meaning of the approach, summarised by the notion that the (ICA) decomposition
$$X = AS$$
of the data X seeks both independence between the columns of S and non-Gaussianity. That is, getting as far away from Gaussianity as possible. The geometric bits came from looking at the Kullback-Leibler decomposition of the log likelihood
$$\mathbb{E}_P[\log Q_\theta(X)] = -\mathrm{KL}(P, Q_\theta) - H(P)$$
where the expectation is computed under the true distribution P of the data X, with entropy H(P). And Qθ is the hypothesised distribution. A fine property of this decomposition is a statistical version of Pythagoras’ theorem, namely that when the family of Qθ‘s is an exponential family, the Kullback-Leibler distance decomposes into
$$\mathrm{KL}(P, Q_\theta) = \mathrm{KL}(P, Q_{\theta^0}) + \mathrm{KL}(Q_{\theta^0}, Q_\theta)$$
where θ⁰ is the expected maximum likelihood estimator of θ. (We also noticed this possibility of a decomposition in our Kullback-projection variable-selection paper with Jérôme Dupuis.) The talk by Aapo Hyvärinen this morning was related to Jean-François’ in that it used ICA all the way to a three-level representation oriented towards natural vision modelling, in connection with his book and the paper on unnormalised models recently discussed on the ‘Og.
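For readers who want to see why the Kullback-Leibler decomposition and its Pythagorean version hold, here is a short derivation sketch. The notation is my own assumption rather than Jean-François’ slides: p and q_θ are densities of P and Q_θ, H(P) = -∫ p log p is the entropy of P, the exponential family is written q_θ(x) = h(x) exp{θᵀT(x) - ψ(θ)}, and θ⁰ is the moment-matching value for which the expectation of T(X) under Q_θ⁰ equals its expectation under P.

```latex
\begin{align*}
\mathbb{E}_P[\log q_\theta(X)]
  &= \int p(x)\log p(x)\,dx - \int p(x)\log\frac{p(x)}{q_\theta(x)}\,dx \\
  &= -H(P) - \mathrm{KL}(P, Q_\theta).
\end{align*}
% Exponential family: q_\theta(x) = h(x)\exp\{\theta^\top T(x) - \psi(\theta)\},
% with \theta^0 defined by \mathbb{E}_{\theta^0}[T(X)] = \mathbb{E}_P[T(X)].
\begin{align*}
\mathrm{KL}(P, Q_\theta) - \mathrm{KL}(P, Q_{\theta^0})
  &= \mathbb{E}_P[\log q_{\theta^0}(X) - \log q_\theta(X)] \\
  &= (\theta^0 - \theta)^\top \mathbb{E}_P[T(X)] - \psi(\theta^0) + \psi(\theta) \\
  &= (\theta^0 - \theta)^\top \mathbb{E}_{\theta^0}[T(X)] - \psi(\theta^0) + \psi(\theta) \\
  &= \mathrm{KL}(Q_{\theta^0}, Q_\theta).
\end{align*}
```

The only step that uses the exponential family structure is the replacement of the expectation under P by the expectation under Q_θ⁰, which is exactly the moment-matching definition of θ⁰.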
In the afternoon, Eric-Jan Wagenmakers [who persistently and rationally fights the (ab)use of p-values and who frequently figures on Andrew's blog] gave a warning tutorial talk about the dangers of trusting p-values and going fishing for significance in existing studies, much in the spirit of Andrew’s blog (except for the defence of Bayes factors). Arguing in favour of preregistration. The talk was full of illustrations from psychology. And included the line that ESP testing is the jester of academia, meaning that testing for whatever form of ESP should be encouraged as a way to check testing procedures. If a procedure finds a significant departure from the null in this setting, there is something wrong with it! I was then reminded that Eric-Jan was one of the authors having analysed Bem’s controversial (!) paper on the “anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms”… (And of the shocking talk by Jessica Utts on the same topic I attended in Australia two years ago.)
Filed under: pictures, Running, Statistics, Travel, University life Tagged: Australia, Bayes factor, computational vision, ESP, evidence, exponential family, ICA, independent component analysis, Kullback-Leibler divergence, normalising constant, p-values, Pythagorean theorem, statistical geometry, statistical significance
I just learned today that about 300 bouquetins had been killed in the French Alps in the past few days as a hasty and ungrounded measure against bovine brucellosis. I find it amazing that the local authorities can act with so little scientific justification and against European regulations that make bouquetins a protected species. In comparison, the proposed culling of badgers in England went through experimental steps with some modicum of science. (Although it is supposed to resume next week despite Gareth’s recent ABC paper demonstrating culling is ineffective against bovine TB.)
Filed under: Mountains, pictures Tagged: Alps, Badger Trust, badgers, bouquetins, culling
As ‘Og’s readers may have noticed, I have very much appreciated Joe Abercrombie’s novels and style so far, having read and reviewed all of his books. Hence, I was expecting something altogether different out of Half a King, his latest novel… Compared with the books written so far, this one feels too light, too easy-going, too much of a one-shot read, too linear and too predictable, with none of the shadows and shortcomings and other moral ambiguities crossing everyone and all in the novel. And making Abercrombie such a special author. The main character Yarvi is not very enticing and the way he gets out of dramatic situations is not particularly convincing. Nor particularly on the moral high ground (not surprising, this, considering Abercrombie’s style!) But it sounds as if this remains justified as lesser evil against greater evil… The final stages of the story are just too impossible to believe. So this book is a real disappointment. After reading the book in a few hours in Bristol, a few miles from the author who lives in Bath, I went hunting for reactions on the Internet and found out that this was a young adult novel, which may explain the lack of depth and of moral ambiguity. I wish this had been spelled out more clearly before I had bought the book! (As an aside I wonder why Abercrombie has this fascination with maimed hands throughout his novels. From The Ninefinger in the early novel to this half king with only two fingers on his right hand.)
Filed under: Books, Travel Tagged: Half a King, Joe Abercrombie, ninefinger, young adult books
Source: Bayesian Analysis, Volume 9, Number 3, 521--550.
Bayesian graphical modeling provides an appealing way to obtain uncertainty estimates when inferring network structures, and much recent progress has been made for Gaussian models. For more robust inferences, it is natural to consider extensions to $t$-distribution models. We argue that the classical multivariate $t$-distribution, defined using a single latent Gamma random variable to rescale a Gaussian random vector, is of little use in more highly multivariate settings, and propose other, more flexible $t$-distributions. Using an independent Gamma-divisor for each component of the random vector defines what we term the alternative $t$-distribution. The associated model allows one to extract information from highly multivariate data even when most experiments contain outliers for some of their measurements. However, the use of this alternative model comes at increased computational cost and imposes constraints on the achievable correlation structures, raising the need for a compromise between the classical and alternative models. To this end we propose the use of Dirichlet processes for adaptive clustering of the latent Gamma-scalars, each of which may then divide a group of latent Gaussian variables. The resulting Dirichlet $t$-distribution interpolates naturally between the two extreme cases of the classical and alternative $t$-distributions and combines more appealing modeling of the multivariate dependence structure with favorable computational properties. This paper was invited by the Editor-in-Chief of Bayesian Analysis to be presented as the 2014 Best Bayesian Analysis Paper at the Twelfth World Meeting of the International Society for Bayesian Analysis (ISBA2014), held in Cancun, Mexico, on July 14–18, 2014, with invited discussions by Babak Shahbaba and François Caron.
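The difference between the two extreme constructions in this abstract can be sketched in a few lines. This is only an illustration under simplifying assumptions that are mine, not the paper's: zero mean, identity scale matrix, Gamma divisors with shape and rate both equal to ν/2, and hypothetical function names.

```python
import math
import random


def gamma_divisor(nu, rng):
    # Gamma with shape nu/2 and rate nu/2; gammavariate expects a scale,
    # hence the second argument 2/nu.
    return rng.gammavariate(nu / 2.0, 2.0 / nu)


def classical_t(dim, nu, rng):
    # One divisor shared by the whole vector: a small divisor inflates
    # every component at once.
    tau = gamma_divisor(nu, rng)
    return [rng.gauss(0.0, 1.0) / math.sqrt(tau) for _ in range(dim)]


def alternative_t(dim, nu, rng):
    # One independent divisor per component: components can be outlying
    # individually.
    return [rng.gauss(0.0, 1.0) / math.sqrt(gamma_divisor(nu, rng))
            for _ in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(classical_t(5, 3.0, rng))
    print(alternative_t(5, 3.0, rng))
```

The Dirichlet $t$-distribution of the abstract sits between these two functions, sharing each divisor within a cluster of components; that clustering step is not attempted here.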
Source: Bayesian Analysis, Volume 9, Number 3, 551--556.
Source: Bayesian Analysis, Volume 9, Number 3, 557--560.
Scale mixtures of normals have been discussed extensively in the literature as heavy-tailed alternatives to the normal distribution for robust modeling. They have been used either as error models to handle outliers or as prior distributions to provide more reasonable shrinkage of model parameters. The proposed method by Finegold and Drton goes beyond the existing literature both in terms of application (graphical models) and methodology (Dirichlet $t$) for outlier handling. While this approach can be applied to many other problems, in this discussion I will focus on its application in Bayesian modeling of high throughput biological data.
Source: Bayesian Analysis, Volume 9, Number 3, 561--590.
Source: Bayesian Analysis, Volume 9, Number 3, 591--596.
Source: Bayesian Analysis, Volume 9, Number 3, 597--612.
Eliciting information from experts for use in constructing prior distributions for logistic regression coefficients can be challenging. The task is especially difficult when the model contains many predictor variables, because the expert is asked to provide summary information about the probability of “success” for many subgroups of the population. Often, however, experts are confident only in their assessment of the population as a whole. This paper is about incorporating such overall information easily into a logistic regression data analysis using $g$-priors. We present a version of the $g$-prior such that the prior distribution on the overall population logistic regression probabilities of success can be set to match a beta distribution. A simple data augmentation formulation allows implementation in standard statistical software packages.
Source: Bayesian Analysis, Volume 9, Number 3, 613--658.
Clustering is an important and challenging statistical problem for which there is an extensive literature. Modeling approaches include mixture models and product partition models. Here we develop a product partition model and a Bayesian model selection procedure based on Bayes factors from intrinsic priors. We also find that the choice of the prior on model space is of utmost importance, almost overshadowing the other parts of the clustering problem, and we examine the behavior of the model posterior probabilities based on different model space priors. We find, somewhat surprisingly, that procedures based on the often-used uniform prior (in which all models are given the same prior probability) lead to inconsistent model selection procedures. We examine other priors, and find that the Ewens-Pitman prior and a new prior, the hierarchical uniform prior, lead to consistent model selection procedures and have other desirable properties. Lastly, we compare the procedures on a range of examples.
Source: Bayesian Analysis, Volume 9, Number 3, 659--684.
We consider the behavior of Bayesian procedures that perform model selection for decomposable Gaussian graphical models when the true model is in fact non-decomposable. We examine the asymptotic behavior of the posterior when models are misspecified in this way, and find that the posterior will converge to graphical structures that are minimal triangulations of the true structure. The marginal log likelihood ratio comparing different minimal triangulations is stochastically bounded, and appears to remain data dependent regardless of the sample size. The covariance matrices corresponding to the different minimal triangulations are essentially equivalent, so model averaging is of minimal benefit. Using simulated data sets and a particular high performing Bayesian method for fitting decomposable models, feature inclusion stochastic search, we illustrate that these predictions are borne out in practice. Finally, a comparison is made to penalized likelihood methods for graphical models, which make no decomposability restriction. Despite its inability to fit the true model, feature inclusion stochastic search produces models that are competitive or superior to the penalized likelihood methods, especially at higher dimensions.
Source: Bayesian Analysis, Volume 9, Number 3, 685--698.
Bayesian decision theory is profoundly personalistic. It prescribes the decision $d$ that minimizes the expectation of the decision-maker’s loss function $L(d,\theta)$ with respect to that person’s opinion $\pi(\theta)$. Attempts to extend this paradigm to more than one decision-maker have generally been unsuccessful, as shown in Part A of this paper. Part B of this paper explores a different decision set-up, in which Bayesians make choices knowing that later Bayesians will make decisions that matter to the earlier Bayesians. We explore conditions under which they together can be modeled as a single Bayesian. There are three reasons for doing so: 1. To understand the common structure of various examples, in some of which the reduction to a single Bayesian is possible, and in some of which it is not. In particular, it helps to deepen our understanding of the desirability of randomization to Bayesians. 2. As a possible computational simplification. When such reduction is possible, standard expected loss minimization software can be used to find optimal actions. 3. As a start toward a better understanding of social decision-making.
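The single-decision-maker rule stated at the start of this abstract, choosing $d$ to minimize the expectation of $L(d,\theta)$ under $\pi(\theta)$, can be illustrated on a toy discrete problem. The two-state prior, the two loss functions and the function names below are hypothetical and only serve the illustration.

```python
def expected_loss(decision, prior, loss):
    # Expectation of loss(decision, theta) under a discrete prior on theta.
    return sum(p * loss(decision, theta) for theta, p in prior.items())


def bayes_decision(decisions, prior, loss):
    # The decision with the smallest expected loss under the prior.
    return min(decisions, key=lambda d: expected_loss(d, prior, loss))


# Hypothetical two-state problem: theta is 0 with probability 0.7.
prior = {0: 0.7, 1: 0.3}


def zero_one_loss(d, theta):
    return 0.0 if d == theta else 1.0


def asymmetric_loss(d, theta):
    # Wrongly deciding 0 when theta is 1 costs five times the opposite error.
    if d == theta:
        return 0.0
    return 5.0 if theta == 1 else 1.0


best_symmetric = bayes_decision([0, 1], prior, zero_one_loss)
best_asymmetric = bayes_decision([0, 1], prior, asymmetric_loss)
```

With the same opinion $\pi(\theta)$, the two loss functions lead to different optimal decisions, which is the personalistic point: the answer depends on both the prior and the loss of the person deciding.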
Spatial Bayesian Variable Selection Models on Functional Magnetic Resonance Imaging Time-Series Data
Source: Bayesian Analysis, Volume 9, Number 3, 699--732.
A common objective of fMRI (functional magnetic resonance imaging) studies is to determine subject-specific areas of increased blood oxygenation level dependent (BOLD) signal contrast in response to a stimulus or task, and hence to infer regional neuronal activity. We posit and investigate a Bayesian approach that incorporates spatial and temporal dependence and allows for the task-related change in the BOLD signal to change dynamically over the scanning session. In this way, our model accounts for potential learning effects in addition to other mechanisms of temporal drift in task-related signals. We study the properties of the model through its performance on simulated and real data sets.
Source: Bayesian Analysis, Volume 9, Number 3, 733--758.
We propose a multiscale model for Gaussian noised images under a Bayesian framework for both 2-dimensional (2D) and 3-dimensional (3D) images. We use a Chinese restaurant process prior to randomly generate ties among intensity values at neighboring pixels in the image. The resulting Bayesian estimator enjoys some desirable asymptotic properties for identifying precise structures in the image. The proposed Bayesian denoising procedure is completely data-driven. A conditional conjugacy property allows analytical computation of the posterior distribution without involving Markov chain Monte Carlo (MCMC) methods, making the method computationally efficient. Simulations on Shepp-Logan phantom and Lena test images confirm that our smoothing method is comparable with the best available methods for light noise and outperforms them for heavier noise both visually and numerically. The proposed method is further extended for 3D images. A simulation study shows that the proposed method is numerically better than most existing denoising approaches for 3D images. A 3D Shepp-Logan phantom image is used to demonstrate the visual and numerical performance of the proposed method, along with the computational time. MATLAB toolboxes are made available online (both 2D and 3D) to implement the proposed method and reproduce the numerical results.
Michael Finegold, Mathias Drton. Robust Bayesian Graphical Modeling Using Dirichlet $t$-Distributions. 521--550.
François Caron, Luke Bornn. Comment on Article by Finegold and Drton. 551--556.
Babak Shahbaba. Comment on Article by Finegold and Drton. 557--560.
Various authors. Contributed Discussion on Article by Finegold and Drton. 561--590.
Michael Finegold, Mathias Drton. Rejoinder. 591--596.
Timothy E. Hanson, Adam J. Branscum, Wesley O. Johnson. Informative $g$-Priors for Logistic Regression. 597--612.
George Casella, Elías Moreno, F. Javier Girón. Cluster Analysis, Model Selection, and Prior Distributions on Models. 613--658.
A. Marie Fitch, M. Beatrix Jones, Hélène Massam. The Performance of Covariance Selection Methods That Consider Decomposable Models Only. 659--684.
Joseph B. Kadane, Steven N. MacEachern. Toward Rational Social Decisions: A Review and Some Results. 685--698.
Kuo-Jung Lee, Galin L. Jones, Brian S. Caffo, Susan S. Bassett. Spatial Bayesian Variable Selection Models on Functional Magnetic Resonance Imaging Time-Series Data. 699--732.
Meng Li, Subhashis Ghosal. Bayesian Multiscale Smoothing of Gaussian Noised Images. 733--758.
The September issue of [JRSS] Series B I received a few days ago is of particular interest to me. (And not as an ex-co-editor since I was never involved in any of those papers!) To wit: a paper by Hani Doss and Aixin Tan on evaluating normalising constants based on MCMC output, a preliminary version I had seen at a previous JSM meeting, a paper by Nick Polson, James Scott and Jesse Windle on the Bayesian bridge, connected with Nick’s talk in Boston earlier this month, yet another paper by Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar and Michael Jordan on the bag of little bootstraps, which presentation I heard Michael deliver a few times when he was in Paris. (Obviously, this does not imply any negative judgement on the other papers of this issue!)
For instance, Doss and Tan consider the multiple mixture estimator [my wording, the authors do not give the method a name, referring to Vardi (1985) but missing the connection with Owen and Zhou (2000)] of k ratios of normalising constants, namely
where the z’s are the normalising constants and with possibly different numbers of iterations of each Markov chain. An interesting starting point (that Hans Künsch had mentioned to me a while ago but that I had since then forgotten) is that the problem was reformulated by Charlie Geyer (1994) as a quasi-likelihood estimation where the ratios of all z’s relative to one reference density are the unknowns. This is doubly interesting, actually, because it restates the constant estimation problem in a statistical light and thus somewhat relates to the infamous “paradox” raised by Larry Wasserman a while ago. The novelty in the paper is (a) to derive an optimal estimator of the ratios of normalising constants in the Markov case, essentially accounting for possibly different lengths of the Markov chains, and (b) to estimate the variance matrix of the ratio estimate by regeneration arguments. A favourite tool of mine, at least theoretically as practically useful minorising conditions are hard to come by, if at all available.
Filed under: Books, Statistics, Travel, University life Tagged: bag of little bootstraps, Bayesian bridge, Bayesian lasso, JRSSB, marginal likelihood, Markov chain Monte Carlo, normalising constant, Series B, simulation, untractable normalizing constant, Wasserman's paradox
Yet another workshop around! Still at Warwick, organised by Simon Barthelmé, Nicolas Chopin and Adam Johansen on the theme of statistical aspects of neuroscience. Being nearby I attended a few lectures today, but most talks are more topical than my current interest in the matter, plus workshop fatigue is starting to appear! Hence I will keep a low attendance for the rest of the week, to take advantage of my visit here to make some progress in my research and in the preparation of the teaching semester. (Maybe paradoxically I attended a non-neuroscience talk by listening to Richard Wilkinson’s coverage of ABC methods, with an interesting stress on meta-models and the link with computer experiments.) Given that we are currently re-revising our paper with Matt Moore and Kerrie Mengersen (and now Chris Drovandi), I find it interesting to see a sort of convergence in our community towards a re-re-interpretation of ABC as producing an approximation of the distribution of the summary statistic itself, rather than of the original data, using auxiliary or indirect or pseudo-models like Gaussian processes. (Making the link with Mark Girolami’s talk this morning.)
Filed under: Books, pictures, Statistics, Travel Tagged: ABC, computer experiment model, Gaussian processes, indirect inference, neurosciences, University of Warwick, workshop
Filed under: pictures, Travel, University life Tagged: England, heron, mathematics, Statistics, summer, University of Warwick