Bayesian News Feeds

Sherlock [#3]

Xian's Og - Fri, 2015-03-13 19:15

After watching the first two seasons of the BBC TV series Sherlock while at the hospital, I found myself looking forward to further adventures of Holmes and Watson and eventually “bought” the third season, which I watched over the past weekends. I liked it very much, as this new season moved away from the sheer depiction of Sherlock’s amazing powers towards a quite ironic and self-parodic tone, well in tune with a third season where the audience is now utterly familiar with the main characters. They all put on weight (mostly figuratively!), from Sherlock’s acknowledgement of his psychological shortcomings, to Mrs. Hudson’s revealing her drug-trafficking past and expressing her dislike of Mycroft, to John Watson’s engagement and acceptance of Sherlock’s idiosyncrasies, making him the central character of the series, a sort of fatherly figure. Some new characters are also terrific, including Mary Morstan and the new archvillain, C.A. Magnussen. Paradoxically, this makes the detective part of the stories secondary, which is all for the best as, in my opinion, the plots are rather weak and the resolutions hardly rely on high intellectual powers, albeit always surprising. More sleuthing in the new season would be most welcome! As an aside, the wedding place sounded somewhat familiar to me, until I realised it was Goldney Hall, where the recent workshops I attended in Bristol took place.


Filed under: Books Tagged: amazon associates, BBC, Bristol, Conan Doyle, Goldney Hall, Sherlock Holmes, TV series
Categories: Bayesian Bloggers

Hamiltonian ABC

Xian's Og - Thu, 2015-03-12 19:15

On Monday, Ed Meeds, Robert Leenders, and Max Welling (from Amsterdam) arXived a paper entitled Hamiltonian ABC. Before looking at the paper in any detail, I got puzzled by this association of antagonistic terms, since ABC is intended for complex and mostly intractable likelihoods, while Hamiltonian Monte Carlo requires a lot from the target, in order to compute gradients and Hessians… [Warning: some graphs on pages 13-14 may be harmful to your printer!]

Somewhat obviously (ex-post!), the paper suggests using Hamiltonian dynamics on ABC approximations of the likelihood. The authors compare a Gaussian kernel version with the synthetic Gaussian likelihood version of Wood (2010), where both mean and variance are estimated from the simulated data. If ε is taken as an external quantity and driven to zero, the second approach is much more stable. But… ε is never driven to zero in ABC, nor fixed at ε=0.37: it is instead considered as a kernel bandwidth and hence estimated from the simulated data. Hence ε is commensurable with σ(θ), and this makes me wonder at the relevance of the conclusion that synthetic is better than kernel for Hamiltonian ABC. More globally, I wonder about the relevance of better simulating from a still approximate target when the true goal is to better approximate the genuine posterior.
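In their standard forms, the two approximations read roughly as

$$\hat{p}_{\epsilon}(y\mid\theta)=\frac{1}{S}\sum_{s=1}^{S}\mathcal{N}\big(y;\,x_{s}(\theta),\,\epsilon^{2}\big)\qquad\text{(Gaussian kernel)}$$

$$\hat{p}_{\mathrm{syn}}(y\mid\theta)=\mathcal{N}\big(y;\,\hat{\mu}_{\theta},\,\hat{\sigma}^{2}_{\theta}\big)\qquad\text{(synthetic likelihood, Wood, 2010)}$$

where $x_{1}(\theta),\ldots,x_{S}(\theta)$ are the simulated summary statistics and $\hat{\mu}_{\theta},\hat{\sigma}^{2}_{\theta}$ their empirical mean and variance; the paper may use a multivariate version or add an ε² term to the synthetic variance.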

Some of the paper covers separate issues, like handling gradients by finite differences à la Spall [if you can afford it!], incorporating the random generator as part of the Markov chain, and using S common random numbers to compute the gradients for all values of θ. (Although I am not certain that all random generators can be represented as a deterministic transform of a parameter θ and of a fixed number of random uniforms, the authors may consider a random number of random uniforms when they represent their random generators as deterministic transforms of a parameter θ and of the random seed. I am also uncertain about the distinction between common, sticky, and persistent random numbers!)
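As a small illustration of the common-random-number idea (my own toy sketch in R, not the authors’ code, for a hypothetical model y ~ N(θ,1) with the sample mean as summary), the same normals are recycled when evaluating the approximate log-likelihood at θ±δ, so the finite-difference gradient is not swamped by simulation noise:

set.seed(1)
y_obs <- 2.3                     # hypothetical observed summary statistic
S <- 500                         # number of simulations per likelihood evaluation
u <- rnorm(S)                    # common random numbers, reused for every theta

synth_loglik <- function(theta) {
  x <- theta + u                 # simulated summaries as a deterministic map of (theta, u)
  dnorm(y_obs, mean(x), sd(x), log=TRUE)
}

fd_gradient <- function(theta, delta=1e-2)
  (synth_loglik(theta + delta) - synth_loglik(theta - delta)) / (2 * delta)

fd_gradient(1.5)                 # smooth in theta because u is held fixed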


Filed under: Books, pictures, Statistics, University life Tagged: ABC, Amsterdam, Hamiltonian Monte Carlo, Markov chain, Monte Carlo Statistical Methods, pseudo-random generator, random seed, synthetic likelihood
Categories: Bayesian Bloggers

eliminating an important obstacle to creative thinking: statistics…

Xian's Og - Wed, 2015-03-11 19:15

“We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.”

About a month ago, David Trafimow and Michael Marks, the current editors of the journal Basic and Applied Social Psychology, published an editorial banning all null hypothesis significance testing procedures (acronym-ed into the ugly NHSTP, which sounds like a particularly nasty venereal disease!) from papers published in the journal. My first reaction was “Great! This will bring more substance to the papers by preventing significance fishing and undisclosed multiple testing! Power to the statisticians!” However, after reading the said editorial, I realised it was inspired by a nihilistic anti-statistical stance, backed by an apparent lack of understanding of the nature of statistical inference, rather than by a call for saner and safer statistical practice. The editors most clearly state that inferential statistical procedures are no longer needed to publish in the journal, only “strong descriptive statistics”. Maybe to keep in tune with the “Basic” in the name of the journal!

“In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval.”

The above quote could be a motivation for a Bayesian approach to the testing problem, a revolutionary stance for journal editors!, but it only illustrates that the editors wish for a procedure that would eliminate the uncertainty inherent to statistical inference, i.e., to decision making under… erm, uncertainty: “The state of the art remains uncertain.” To fail to separate significance from certainty is fairly appalling from an epistemological perspective and should be a case for impeachment, were any such thing to exist for a journal board. This means the editors cannot distinguish data from parameter and model from reality! Even more fundamentally, to bar statistical procedures from being used in a scientific study is nothing short of reactionary. While encouraging the inclusion of data is a step forward, restricting the validation or invalidation of hypotheses to gazing at descriptive statistics is many steps backward and completely jeopardizes the academic reputation of the journal, whose editorial may end up being its last quoted paper. Is deconstruction now reaching psychology journals?! To quote from a critic of this approach, “Thus, the general weaknesses of the deconstructive enterprise become self-justifying. With such an approach I am indeed not sympathetic.” (Searle, 1983).
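To make the distance mentioned in the quote concrete, here is a toy R computation (mine, not the editors’, with hypothetical numbers) where the same test statistic yields a small p-value, i.e. a small Pr(finding | H0), and yet a substantial posterior probability Pr(H0 | finding) under a point-null Bayesian analysis:

n <- 10^4; sigma <- 1; tau <- 1        # hypothetical sample size, sampling sd, prior sd
z <- 2.5                               # observed z-statistic
ybar <- z * sigma / sqrt(n)            # corresponding sample mean
pval <- 2 * pnorm(-abs(z))             # Pr(|Z| >= z | H0), about 0.012
# Bayes factor of H0: mu=0 against H1: mu ~ N(0, tau^2), with ybar ~ N(mu, sigma^2/n)
B01 <- dnorm(ybar, 0, sigma/sqrt(n)) / dnorm(ybar, 0, sqrt(tau^2 + sigma^2/n))
postH0 <- B01 / (1 + B01)              # Pr(H0 | data) under equal prior odds, about 0.8
c(p_value=pval, posterior_H0=postH0)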

“The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist (…) With respect to Bayesian procedures, we reserve the right to make case-by-case judgments, and thus Bayesian procedures are neither required nor banned from BASP.”

The section on Bayesian approaches tries to be sympathetic to the Bayesian paradigm but again reflects the editors’ poor understanding. By “Laplacian assumption”, they mean Laplace’s Principle of Indifference, i.e., the use of uniform priors, which has not been seriously considered a sound principle since the mid-1930s. Except maybe in recent papers of Trafimow. I also love the notion of “generat[ing] numbers where none exist”, as if the prior distribution had to be grounded in some physical reality! Although it is meaningless, it has some poetic value… (Plus, bringing Popper and Fisher to the rescue sounds like shooting Bayes himself in the foot.) At least, the fact that the editors will consider Bayesian papers on a case-by-case basis indicates they may engage in a subjective Bayesian analysis of each paper rather than using an automated p-value against the 100% rejection bound!

[Note: this entry was suggested by Alexandra Schmidt, current ISBA President, for an upcoming column in the ISBA Bulletin on this decision by Basic and Applied Social Psychology.]


Filed under: Books, Kids, Statistics, University life Tagged: Basic and Applied Social Psychology, Bayesian hypothesis testing, confidence intervals, editor, ISBA, ISBA Bulletin, Karl Popper, NHSTP, null hypothesis, p-values, Pierre Simon de Laplace, Principle of Indifference, Thomas Bayes, xkcd
Categories: Bayesian Bloggers

Edmond Malinvaud (1923-2015)

Xian's Og - Tue, 2015-03-10 19:15

The statistician, econometrician, and macro- and micro-economist Edmond Malinvaud died on Saturday, March 7. He had been director of my alma mater ENSAE (1962–1966), directeur de la Prévision at the Finance Ministry (1972–1974), director of INSEE (1974–1987), and Professeur at the Collège de France (1988–1993). While primarily an economist, with his theories of disequilibrium and unemployment reflected in his famous book Théorie macro-économique (1981) that he taught us at ENSAE, he was also instrumental in shaping the French econometrics school, as witnessed by his equally famous Statistical Methods of Econometrics (1970), and in the reorganisation of INSEE as the post-war State census and economic-planning tool. He was also an honorary Fellow of the Royal Statistical Society and the 1981 president of the International Statistical Institute. Edmond Malinvaud studied under Maurice Allais, Nobel Prize in economics in 1988, and was himself considered a potential Nobel laureate for several years. My personal memories of him at ENSAE and CREST are of a very clear teacher and of a kind and considerate man, with the reserve and style of a now-bygone era…


Filed under: Books, Kids, Statistics, University life Tagged: Collège de France, CREST, disequilibrium, econometrics, Edmond Malinvaud, ENSAE, INSEE, macroeconomics, Maurice Allais
Categories: Bayesian Bloggers

ABC of simulation estimation with auxiliary statistics

Xian's Og - Mon, 2015-03-09 19:15

“In the ABC literature, an estimator that uses a general kernel is known as a noisy ABC estimator.”

Another arXival relating M-estimation econometrics techniques with ABC. Written by Jean-Jacques Forneron and Serena Ng, from the Department of Economics at Columbia University, the paper tries to draw links between indirect inference and ABC, following in the tracks of Drovandi and Pettitt [not quoted there], and proposes a reverse ABC sampler that proceeds by

  1. given a realisation of the randomness, ε, create a one-to-one transform of the parameter θ that corresponds to a realisation of the summary statistic;
  2. determine the value of the parameter θ that minimises the distance between this summary statistic and the observed summary statistic;
  3. weight the above value of the parameter θ by π(θ) J(θ), where J is the Jacobian of the one-to-one transform (see the toy sketch after this list).
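Here is my own toy rendering of these three steps in R (not the authors’ code), for the hypothetical model y1,…,yn ~ N(θ,1) with the sample mean as summary and a N(0,10) prior, where the transform from θ to the summary is linear and its Jacobian equals one:

set.seed(2)
n <- 50
s_obs <- 0.8                                  # hypothetical observed summary
prior <- function(theta) dnorm(theta, 0, sqrt(10))

reverse_draw <- function() {
  eps <- rnorm(n)                             # step 1: fix the randomness
  s_sim <- function(theta) theta + mean(eps)  # one-to-one map from theta to the summary
  # step 2: theta minimising the distance to the observed summary
  theta_star <- optimize(function(t) abs(s_sim(t) - s_obs), c(-10, 10))$minimum
  jac <- 1                                    # step 3: Jacobian of the map (constant here)
  c(theta=theta_star, weight=prior(theta_star) * jac)
}

draws <- t(replicate(10^3, reverse_draw()))
weighted.mean(draws[,"theta"], draws[,"weight"])  # weighted "posterior" mean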

I have difficulty seeing why this sequence produces a weighted sample associated with the posterior, unless perhaps when the minimum of the distance is zero, in which case the scheme amounts to some inversion of the summary statistic (function). And even then, the role of the random bit ε is unclear, since there is no rejection. The inversion of the summary statistic seems hard to promote in practice, since the transform of the parameter θ into a (random) summary is most likely highly complex.

“The posterior mean of θ constructed from the reverse sampler is the same as the posterior mean of θ computed under the original ABC sampler.”

The authors also state (p.16) that the estimators derived by their reverse method are the same as those of the original ABC approach, but this only holds asymptotically in the sample size. And I am not even sure of this weaker statement, as the tolerance does not seem to play a role then, and because the authors later oppose ABC to their reverse sampler, as the latter produces iid draws from the posterior (p.25).

“The prior can be potentially used to further reduce bias, which is a feature of the ABC.”

As an aside, while the paper reviews extensively the literature on minimum distance estimators (called M-estimators in the statistics literature) and on ABC, the first quote misses the meaning of noisy ABC, which consists of a randomised version of ABC where the observed summary statistic is randomised at the same level as the simulated statistics. And the last quote does not sound right either, as the bias reduction should be seen as a feature of the Bayesian approach rather than of the ABC algorithm. The paper also attributes the paternity of ABC to Don Rubin’s 1984 paper, “who suggested that computational methods can be used to estimate the posterior distribution of interest even when a model is analytically intractable” (pp.7-8). This is incorrect in that Rubin uses ABC to explain the nature of Bayesian reasoning, but does not in the least address computational issues.
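For what it is worth, here is a self-contained toy version in R of noisy ABC as characterised above (my reading and my code, not the paper’s): the observed summary is itself jittered at the kernel scale before the usual accept/reject step.

set.seed(4)
theta <- runif(10^5, -5, 5)              # prior draws for a hypothetical N(theta,1) model
s_sim <- theta + rnorm(10^5) / sqrt(100) # simulated sample means, n=100
s_obs <- 1.7                             # hypothetical observed sample mean
eps <- 0.05                              # ABC tolerance / kernel bandwidth
s_noisy <- s_obs + eps * runif(1, -1, 1) # noisy ABC: randomise the observed summary once
keep <- abs(s_sim - s_noisy) < eps       # standard accept/reject against the noisy target
mean(theta[keep])                        # crude posterior mean under noisy ABC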


Filed under: Statistics, University life Tagged: ABC, Columbia University, consistency, indirect inference, noisy ABC
Categories: Bayesian Bloggers

Professor position at ENSAE, on the Paris Saclay campus

Xian's Og - Mon, 2015-03-09 08:25

There is an opening at the Statistics School ENSAE for an associate or full professor position in Statistics, starting in September 2015. Currently located on the south-western boundary of Paris, the school is soon to move to the mega-campus of Paris-Saclay, near École Polytechnique, along with a dozen other schools. See this description of the position. The deadline is very close: March 23!


Filed under: Statistics Tagged: academic position, École Polytechnique, CREST, ENSAE, France, INSEE, Malakoff, Paris, Paris-Saclay campus
Categories: Bayesian Bloggers

mixtures of mixtures

Xian's Og - Sun, 2015-03-08 19:15

And yet another arXival of a paper on mixtures! This one is written by Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, and Bettina Grün, from the Johannes Kepler University Linz and the Wirtschaftsuniversität Wien I visited last September, under the exact title of Identifying mixtures of mixtures using Bayesian estimation.

So, what is a mixture of mixtures if not a mixture?! Or if not only a mixture. The upper mixture level is associated with clusters, while the lower mixture level is used for modelling the distribution of a given cluster. Because each cluster needs to be real enough, the components of its mixture are assumed to be heavily overlapping. The paper thus spends a large amount of space on detailing the construction of the associated hierarchical prior, which in particular implies defining through the prior what a cluster means. The paper also connects with the overfitting-mixture idea of Rousseau and Mengersen (2011, Series B). At the cluster level, the Dirichlet hyperparameter is chosen to be very small, 0.001, which empties superfluous clusters but sounds rather arbitrary (which is the reason why we did not go for such small values in our testing/mixture modelling). By contrast, the lower-level mixture weights have a hyperparameter staying (far) away from zero. The MCMC implementation is based on a standard Gibbs sampler and the outcome is analysed and sorted by estimating the “true” number of clusters as the MAP and by selecting MCMC simulations conditional on that value. From there, clusters are identified via the point-process representation of the mixture posterior, using a standard k-means algorithm.
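As a quick numerical aside (my own illustration, not taken from the paper), one can check how a very small Dirichlet hyperparameter concentrates the prior weights on a handful of components, which is what empties the superfluous clusters:

set.seed(3)
K <- 20                                   # number of potential upper-level clusters
rdirichlet1 <- function(K, e0) {          # one Dirichlet(e0,...,e0) draw via normalised Gammas
  g <- rgamma(K, e0)
  while (sum(g) == 0) g <- rgamma(K, e0)  # guard against numerical underflow for tiny e0
  g / sum(g)
}
active <- function(e0, thresh=0.01)       # average number of weights above a token threshold
  mean(replicate(5000, sum(rdirichlet1(K, e0) > thresh)))
c(e0_small=active(0.001), e0_unit=active(1))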

The remainder of the paper illustrates the approach on simulated and real datasets, recovering in those small-dimension setups the number of clusters used in the simulations or found in other studies. As noted in the conclusion, using solely a Gibbs sampler with such a large number of components is rather perilous, since it may get stuck close to suboptimal configurations, especially with very small Dirichlet hyperparameters.


Filed under: pictures, Statistics, University life Tagged: arXiv, Austria, clustering, k-mean clustering algorithm, Linz, map, MCMC, mixture, overfitting, Wien
Categories: Bayesian Bloggers

Assyrian art

Xian's Og - Sun, 2015-03-08 08:36
Categories: Bayesian Bloggers

Le Monde puzzle [#902]

Xian's Og - Sat, 2015-03-07 19:15

Another arithmetics Le Monde mathematical puzzle:

Is it possible to partition the set of the integers between 1 and 15 into two subsets in such a way that the product of the terms in the first subset is equal to the sum of the members of the second? Can this be generalised to an arbitrary set {1,2,…,n}? What happens if instead we only consider the odd integers in those sets?

I used brute force by looking at random for a solution,

pb <- txtProgressBar(min = 0, max = 100, style = 3)
for (N in 5:100){
  sol=FALSE
  while (!sol){
    # pick the size k of the "product" subset, favouring middling sizes
    k=sample(1:N,1,prob=(1:N)*(N-(1:N)))
    pro=sample(1:N,k)
    # check whether the product of the chosen subset equals the sum of the rest
    sol=(prod(pro)==sum((1:N)[-pro]))
  }
  setTxtProgressBar(pb, N)
}
close(pb)

and while it took a while to run the R code, it eventually got out of the loop, meaning there was at least one solution for every n between 5 and 100. (It does not work for n=1,2,3,4, for obvious reasons.) For instance, when n=15, the integers in the product part are either {3,5,7}, {1,7,14}, or {1,9,11}. Jean-Louis Fouley sent me an explanation: when n is odd, n=2p+1, one solution is (1,p,2p), while when n is even, n=2p, one solution is (1,p-1,2p).
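Fouley’s closed-form solutions are easy to check numerically (my own little verification, not part of the original code):

check <- function(n) {
  p <- n %/% 2
  pick <- if (n %% 2) c(1, p, 2*p) else c(1, p-1, 2*p)   # (1,p,2p) if n odd, (1,p-1,2p) if n even
  prod(pick) == sum(setdiff(1:n, pick))
}
all(sapply(5:100, check))   # TRUE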

A side remark on the R code: thanks to a Cross Validated question by Paulo Marques, on which I thought I had commented on this blog, I learned about the progress bar function in R, setTxtProgressBar(), which makes running R code with loops much nicer!

For the second question, I just adapted the R code to exclude even integers:

while (!sol){
  # size of the odd "product" subset, capped at the number of odd integers below N
  k=min(1+trunc(sample(1:N,1)/2), ceiling(N/2))
  pro=sample(seq(1,N,by=2),k)
  cum=(1:N)[-pro]
  # compare the product of the chosen odd integers with the sum of the remaining odd ones
  sol=(prod(pro)==sum(cum[cum%%2==1]))
}

and found a solution for n=15, namely 1,3,15 versus 5,7,9,11,13. However, there does not seem to be a solution for all n’s: I found solutions for n=15,21,23,31,39,41,47,49,55,59,63,71,75,79,87,95…


Filed under: Books, Kids, Statistics, University life Tagged: Chib's approximation, Le Monde, mathematical puzzle, mixture estimation, progress bar, R, txtProgressBar
Categories: Bayesian Bloggers

Domaine de Mortiès [in the New York Times]

Xian's Og - Fri, 2015-03-06 19:15

“I’m not sure how we found Domaine de Mortiès, an organic winery at the foothills of Pic St. Loup, but it was the kind of unplanned, delightful discovery our previous trips to Montpellier never allowed.”

Last year, I had the opportunity to visit and sample (!) from Domaine de Mortiès, an organic Pic Saint-Loup vineyard and winemaker. I have not yet opened the bottle of Jamais Content I bought then. Today I spotted in The New York Times a travel article about a visit to the in-laws in Montpellier that takes the author to Domaine de Mortiès, Pic Saint-Loup, Saint-Guilhem-du-Désert, and other nice places, away from the overcrowded centre of town and the rather bland beach-town of Carnon, where she usually stays when visiting. And where we almost finished our Bayesian Essentials with R! To quote from the article, “Montpellier, France’s eighth-largest city, is blessed with a Mediterranean sun and a beautiful, walkable historic centre, a tourist destination in its own right, but because it is my husband’s home city, a trip there never felt like a vacation to me.” And when the author mentions the owner of Domaine de Mortiès, she states that “Mme. Moustiés looked about as enthused as a teenager working the checkout at Rite Aid”, which is not how I remember her from last year. Anyway, it is fun to see that visitors from New York City can unexpectedly come upon this excellent vineyard!


Filed under: Mountains, Travel, Wines Tagged: carignan, Carnon, Domaine Mortiès, French wines, grenache, Languedoc wines, Méditerranée, Montpellier, mourvèdre, New York city, Pic Saint Loup, Syrah, The New York Times, vineyard
Categories: Bayesian Bloggers

mixture models with a prior on the number of components

Xian's Og - Thu, 2015-03-05 19:15

“From a Bayesian perspective, perhaps the most natural approach is to treat the number of components like any other unknown parameter and put a prior on it.”

Another mixture paper on arXiv! Indeed, Jeffrey Miller and Matthew Harrison recently arXived a paper on estimating the number of components in a mixture model, comparing the parametric and the non-parametric Dirichlet prior approaches, since priors can be chosen to bring the two into agreement. This is an obviously interesting issue, as the two approaches are often opposed in modelling debates. The graph in the paper shows a crystal-clear agreement between finite-component mixture modelling and Dirichlet process modelling; the same happens for classification. However, Dirichlet process priors do not return an estimate of the number of components, which may be considered a drawback if one considers this an identifiable quantity in a mixture model… But the paper stresses that the number of estimated clusters under Dirichlet process modelling tends to be larger than the number of components in the finite case, hence that Dirichlet process mixture modelling is not consistent in that respect, producing parasite extra clusters…

In the parametric modelling, the authors assume the same scale is used in all Dirichlet priors, that is, for all values of k, the number of components, which means an incoherence when marginalising from k to (k-p) components. A mild incoherence, in fact, as the parameters of the different models do not have to share the same priors. And, as shown by Proposition 3.3 in the paper, this does not prevent coherence in the marginal distribution of the latent variables. The authors also draw a comparison between the distribution of the partition in the finite mixture case and the Chinese restaurant process associated with the partition in the infinite case. A further analogy is that the finite case allows for a stick-breaking representation. A noteworthy difference between the two modellings lies in the size of the partitions, homogeneous in the finite case and extreme in the infinite case.
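To give a concrete feel for this homogeneous-versus-extreme contrast (my own toy simulation in R, with hypothetical settings, not the authors’), one can compare the ordered cluster sizes of a partition drawn from a Chinese restaurant process with those of a partition drawn from a finite mixture with a symmetric Dirichlet prior on the weights:

set.seed(5)
n <- 200
crp_sizes <- function(n, alpha) {               # table sizes of a CRP(alpha) partition
  sizes <- c()
  for (i in 1:n) {
    probs <- c(sizes, alpha)
    j <- sample(length(probs), 1, prob=probs)
    if (j > length(sizes)) sizes <- c(sizes, 1) else sizes[j] <- sizes[j] + 1
  }
  sort(sizes, decreasing=TRUE)
}
finite_sizes <- function(n, K, gamma) {         # cluster sizes under a finite symmetric Dirichlet
  w <- rgamma(K, gamma); w <- w / sum(w)        # Dirichlet(gamma,...,gamma) weights
  sort(tabulate(sample(K, n, replace=TRUE, prob=w), nbins=K), decreasing=TRUE)
}
crp_sizes(n, alpha=1)                           # a few large tables and a tail of tiny ones
finite_sizes(n, K=5, gamma=5)                   # five clusters of comparable sizes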

An interesting entry into the connections between “regular” mixture modelling and Dirichlet mixture models. Maybe not ultimately surprising given the past studies by Peter Green and Sylvia Richardson of both approaches (1997 in Series B and 2001 in JASA).


Filed under: Books, Statistics, University life Tagged: Bayesian asymptotics, Bayesian non-parametrics, Chinese restaurant process, consistency, Dirichlet mixture priors, Dirichlet process, mixtures, reversible jump
Categories: Bayesian Bloggers

Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization

Zhihua Zhang, Jin Li.

Source: Bayesian Analysis, Volume 10, Number 2, 247--274.

Abstract:
In this paper we discuss Bayesian nonconvex penalization for sparse learning problems. We explore a nonparametric formulation for latent shrinkage parameters using subordinators which are one-dimensional Lévy processes. We particularly study a family of continuous compound Poisson subordinators and a family of discrete compound Poisson subordinators. We exemplify four specific subordinators: Gamma, Poisson, negative binomial and squared Bessel subordinators. The Laplace exponents of the subordinators are Bernstein functions, so they can be used as sparsity-inducing nonconvex penalty functions. We exploit these subordinators in regression problems, yielding a hierarchical model with multiple regularization parameters. We devise ECME (Expectation/Conditional Maximization Either) algorithms to simultaneously estimate regression coefficients and regularization parameters. The empirical evaluation of simulated data shows that our approach is feasible and effective in high-dimensional data analysis.

Categories: Bayesian Analysis

Dirichlet Process Hidden Markov Multiple Change-point Model

Stanley I. M. Ko, Terence T. L. Chong, Pulak Ghosh.

Source: Bayesian Analysis, Volume 10, Number 2, 275--296.

Abstract:
This paper proposes a new Bayesian multiple change-point model which is based on the hidden Markov approach. The Dirichlet process hidden Markov model does not require the specification of the number of change-points a priori. Hence our model is robust to model specification in contrast to the fully parametric Bayesian model. We propose a general Markov chain Monte Carlo algorithm which only needs to sample the states around change-points. Simulations for a normal mean-shift model with known and unknown variance demonstrate advantages of our approach. Two applications, namely the coal-mining disaster data and the real United States Gross Domestic Product growth, are provided. We detect a single change-point for both the disaster data and US GDP growth. All the change-point locations and posterior inferences of the two applications are in line with existing methods.

Categories: Bayesian Analysis

Two-sample Bayesian Nonparametric Hypothesis Testing

Chris C. Holmes, François Caron, Jim E. Griffin, David A. Stephens.

Source: Bayesian Analysis, Volume 10, Number 2, 297--320.

Abstract:
In this article we describe Bayesian nonparametric procedures for two-sample hypothesis testing. Namely, given two sets of samples $\mathbf{y}^{(1)}\stackrel{\text{iid}}{\sim}F^{(1)}$ and $\mathbf{y}^{(2)}\stackrel{\text{iid}}{\sim}F^{(2)}$, with $F^{(1)},F^{(2)}$ unknown, we wish to evaluate the evidence for the null hypothesis $H_{0}:F^{(1)}\equiv F^{(2)}$ versus the alternative $H_{1}:F^{(1)}\neq F^{(2)}$. Our method is based upon a nonparametric Pólya tree prior centered either subjectively or using an empirical procedure. We show that the Pólya tree prior leads to an analytic expression for the marginal likelihood under the two hypotheses and hence an explicit measure of the probability of the null $\mathrm{Pr}(H_{0}\mid\{\mathbf{y}^{(1)},\mathbf{y}^{(2)}\})$.

Categories: Bayesian Analysis

Sensitivity Analysis for Bayesian Hierarchical Models

Małgorzata Roos, Thiago G. Martins, Leonhard Held, Håvard Rue.

Source: Bayesian Analysis, Volume 10, Number 2, 321--349.

Abstract:
Prior sensitivity examination plays an important role in applied Bayesian analyses. This is especially true for Bayesian hierarchical models, where interpretability of the parameters within deeper layers in the hierarchy becomes challenging. In addition, lack of information together with identifiability issues may imply that the prior distributions for such models have an undesired influence on the posterior inference. Despite its importance, informal approaches to prior sensitivity analysis are currently used. They require repetitive re-fits of the model with ad-hoc modified base prior parameter values. Other formal approaches to prior sensitivity analysis suffer from a lack of popularity in practice, mainly due to their high computational cost and absence of software implementation. We propose a novel formal approach to prior sensitivity analysis, which is fast and accurate. It quantifies sensitivity without the need for a model re-fit. Through a series of examples we show how our approach can be used to detect high prior sensitivities of some parameters as well as identifiability issues in possibly over-parametrized Bayesian hierarchical models.

Categories: Bayesian Analysis

Scaling It Up: Stochastic Search Structure Learning in Graphical Models

Hao Wang.

Source: Bayesian Analysis, Volume 10, Number 2, 351--377.

Abstract:
Gaussian concentration graph models and covariance graph models are two classes of graphical models that are useful for uncovering latent dependence structures among multivariate variables. In the Bayesian literature, graphs are often determined through the use of priors over the space of positive definite matrices with fixed zeros, but these methods present daunting computational burdens in large problems. Motivated by the superior computational efficiency of continuous shrinkage priors for regression analysis, we propose a new framework for structure learning that is based on continuous spike and slab priors and uses latent variables to identify graphs. We discuss model specification, computation, and inference for both concentration and covariance graph models. The new approach produces reliable estimates of graphs and efficiently handles problems with hundreds of variables.

Categories: Bayesian Analysis

Predictions Based on the Clustering of Heterogeneous Functions via Shape and Subject-Specific Covariates

Garritt L. Page, Fernando A. Quintana.

Source: Bayesian Analysis, Volume 10, Number 2, 379--410.

Abstract:
We consider a study of players employed by teams who are members of the National Basketball Association where units of observation are functional curves that are realizations of production measurements taken through the course of one’s career. The observed functional output displays large amounts of between player heterogeneity in the sense that some individuals produce curves that are fairly smooth while others are (much) more erratic. We argue that this variability in curve shape is a feature that can be exploited to guide decision making, learn about processes under study and improve prediction. In this paper we develop a methodology that takes advantage of this feature when clustering functional curves. Individual curves are flexibly modeled using Bayesian penalized B-splines while a hierarchical structure allows the clustering to be guided by the smoothness of individual curves. In a sense, the hierarchical structure balances the desire to fit individual curves well while still producing meaningful clusters that are used to guide prediction. We seamlessly incorporate available covariate information to guide the clustering of curves non-parametrically through the use of a product partition model prior for a random partition of individuals. Clustering based on curve smoothness and subject-specific covariate information is particularly important in carrying out the two types of predictions that are of interest, those that complete a partially observed curve from an active player, and those that predict the entire career curve for a player yet to play in the National Basketball Association.

Categories: Bayesian Analysis

Approximate Bayesian Computation by Modelling Summary Statistics in a Quasi-likelihood Framework

Stefano Cabras, Maria Eugenia Castellanos Nueda, Erlis Ruli.

Source: Bayesian Analysis, Volume 10, Number 2, 411--439.

Abstract:
Approximate Bayesian Computation (ABC) is a useful class of methods for Bayesian inference when the likelihood function is computationally intractable. In practice, the basic ABC algorithm may be inefficient in the presence of discrepancy between prior and posterior. Therefore, more elaborate methods, such as ABC with the Markov chain Monte Carlo algorithm (ABC-MCMC), should be used. However, the elaboration of a proposal density for MCMC is a sensitive issue and very difficult in the ABC setting, where the likelihood is intractable. We discuss an automatic proposal distribution useful for ABC-MCMC algorithms. This proposal is inspired by the theory of quasi-likelihood (QL) functions and is obtained by modelling the distribution of the summary statistics as a function of the parameters. Essentially, given a real-valued vector of summary statistics, we reparametrize the model by means of a regression function of the statistics on parameters, obtained by sampling from the original model in a pilot-run simulation study. The QL theory is well established for a scalar parameter, and it is shown that when the conditional variance of the summary statistic is assumed constant, the QL has a closed-form normal density. This idea of constructing proposal distributions is extended to non constant variance and to real-valued parameter vectors. The method is illustrated by several examples and by an application to a real problem in population genetics.

Categories: Bayesian Analysis