## Bayesian News Feeds

### Ben Lawers, Perthshire

Filed under: Mountains, pictures, Running, Travel Tagged: An Stuc, Ben Lawers, munroes, Perthshire, Scotland, Highlands

### improved approximate-Bayesian model-choice method for estimating shared evolutionary history [reply from the author]

*[Here is a very kind and detailed reply from Jamie Oakes to the comments I made on his ABC paper a few days ago:]*

First of all, many thanks for your thorough review of my pre-print! It is very helpful and much appreciated. I just wanted to comment on a few things you address in your post.

I am a little confused about how my replacement of continuous uniform probability distributions with gamma distributions for priors on several parameters introduces a potentially crippling number of hyperparameters. Both uniform and gamma distributions have two parameters. So, the new model only has one additional hyperparameter compared to the original msBayes model: the concentration parameter on the Dirichlet process prior on divergence models. Also, the new model offers a uniform prior over divergence models (though I don’t recommend it).

Your comment about there being no new ABC technique is 100% correct. The model is new, the ABC numerical machinery is not. Also, your intuition is correct, I do not use the divergence times to calculate summary statistics. I mention the divergence times in the description of the ABC algorithm with the hope of making it clear that the times are scaled (see Equation (12)) prior to the simulation of the data (from which the summary statistics are calculated). This scaling is simply to go from units proportional to time, to units that are proportional to the expected number of mutations. Clearly, my attempt at clarity only created unnecessary opacity. I’ll have to make some edits.

Regarding the reshuffling of the summary statistics calculated from different alignments of sequences, the statistics are not exchangeable. So, reshuffling them in a manner that is not consistent across all simulations and the observed data is not mathematically valid. Also, if elements are exchangeable, their order will not affect the likelihood (or the posterior, barring sampling error). Thus, if our goal is to approximate the likelihood, I would hope the reshuffling would also have little effect on the approximate posterior (otherwise my approximation is not so good?).

You are correct that my use of “bias” was not well defined in reference to the identity line of my plots of the estimated vs. true probability of the one-divergence model. I think we can agree that, ideally (when all assumptions are met), the estimated posterior probability of a model should estimate the probability that the model is correct. For large numbers of simulation replicates, the proportion of the replicates for which the one-divergence model is true will approximate the probability that the one-divergence model is correct. Thus, if the method has the desirable (albeit “frequentist”) behavior that the estimated posterior probability of the one-divergence model is an unbiased estimate of the probability that the one-divergence model is correct, the points should fall near the identity line. For example, let us say the method estimates a posterior probability of 0.90 for the one-divergence model for 1000 simulated datasets. If the method is accurately estimating the probability that the one-divergence model is the correct model, then the one-divergence model should be the true model for approximately 900 of the 1000 datasets. Any trend away from the identity line indicates the method is biased in the (frequentist) sense that it is not correctly estimating the probability that the one-divergence model is the correct model. I agree this measure of “bias” is frequentist in nature. However, it seems like a worthwhile goal for Bayesian model-choice methods to have good frequentist properties. If a method strongly deviates from the identity line, it is much more difficult to interpret the posterior probabilities that it estimates. Going back to my example of the posterior probability of 0.90 for 1000 replicates, I would be alarmed if the model was true in only 100 of the replicates.
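The identity-line check described above is easy to reproduce on a toy problem of my own devising (not the msBayes setting): two equally likely models, x ~ N(-1,1) under M1 and x ~ N(1,1) under M2, where the posterior probability of M1 given one observation is available in closed form. Binning replicates by the estimated posterior probability, the empirical frequency with which M1 is true should track the bin average:

```python
import numpy as np

rng = np.random.default_rng(1)
n_rep = 100_000

# Toy model choice: M1: x ~ N(-1,1), M2: x ~ N(1,1), with P(M1) = P(M2) = 1/2.
m1_true = rng.random(n_rep) < 0.5
x = rng.normal(np.where(m1_true, -1.0, 1.0), 1.0)

# Exact posterior probability of M1 given a single observation x:
# log p(x|M1) - log p(x|M2) = -2x, so P(M1|x) = 1 / (1 + exp(2x)).
p1 = 1.0 / (1.0 + np.exp(2.0 * x))

# Calibration check: among replicates whose estimated P(M1|x) falls in a bin,
# M1 should be the true model a fraction of the time close to the bin's mean
# probability (i.e., the points should fall near the identity line).
for lo in np.arange(0.0, 1.0, 0.1):
    sel = (p1 >= lo) & (p1 < lo + 0.1)
    if sel.any():
        print(f"[{lo:.1f},{lo + 0.1:.1f}): mean estimate {p1[sel].mean():.3f}, "
              f"frequency M1 true {m1_true[sel].mean():.3f}")
```

Because the posterior probabilities here are exact, the two columns agree up to Monte Carlo error; a method whose estimates drifted systematically from the second column would be “biased” in exactly the sense discussed above.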

My apologies if my citation of your PNAS paper seemed misleading. The citation was intended to be limited to the context of ABC methods that use summary statistics that are insufficient across the models under comparison (like msBayes and the method I present in the paper). I will definitely expand on this sentence to make this clearer in revisions. Thanks!

Lastly, my concluding remarks in the paper about full-likelihood methods in this domain are not as lofty as you might think. The likelihood function of the msBayes model is tractable, and, in fact, has already been derived and implemented via reversible-jump MCMC (albeit not readily available yet). Also, there are plenty of examples of rich, Kingman-coalescent models implemented in full-likelihood Bayesian frameworks. Too many to list, but a lot of them are implemented in the BEAST software package. One noteworthy example is the work of Bryant et al. (2012, Molecular Biology and Evolution, 29(8), 1917–32), which analytically integrates over all gene trees for biallelic markers under the coalescent.

Filed under: Books, Statistics, University life Tagged: ABC, Bayesian statistics, consistence, Dirichlet process, exchangeability, frequency properties, Kingman's coalescent, Molecular Biology and Evolution, Monte Carlo Statistical Methods, reversible jump, sufficiency, summary statistics, taxon

### 5 Munros, enough for a day…

**T**aking advantage of cheap [early] Sunday morning flights to Edinburgh, I managed to fit a good hiking day (and three new Munros) into my trip to Scotland. I decided on the hike on the plane, picking the Lawers group as one of the closest to Edinburgh… The fair sequence of Munros in the group (5!) made it quite appealing [for a Munro-bagger], until I realised I would have to walk 6km on a narrow road with no sidewalk to complete the loop. Hence I decided to turn back after the third peak (An Stuc, recently promoted to Munro fame!), which meant re-climbing the first two Munros from the “other” side, with a significant addition to the total elevation gain (+1500m). The weather was traditionally Scottish, with plenty of clouds, gales and gusts, a few patches of blue sky, and a pleasant drizzle for the last hour. It did not seem to bother the numerous walkers I passed on the first part of the trail. As usual, an additional reward of hiking or climbing in Scotland is that one can be back in town (i.e., Edinburgh) in time for the evening curry! Even when leaving from Paris in the morning.

Filed under: Mountains, pictures, Running, University life Tagged: Ben Lawers, curry, Edinburgh, ICMS, munroes, Paris, Scotland

### [h]it figures

**J**ust a few figures from wordpress about the ‘Og:

- **2,845** posts;
- **1,009,428** views;
- **5,115** comments;
- **5,095** tags;
- **470,427** spam comments;
- **1,001** spams in the past 24 hours;
- and… only **5** amazon orders in the past month!

Filed under: Books, pictures Tagged: Amazon, blog, comments, New York Subway, spams, tags

### Le Monde puzzle [#868]

**A**nother permutation-based Le Monde mathematical puzzle:

*Given the integers 1,…,n, a “perfect” combination is a pair (i,j) of integers such that no other pair enjoys the same sum. For n=33, what is the maximum number of perfect combinations one can build? And for n=2014?*

**A** rather straightforward problem, or so it seemed: take the pairs (2m,2m+1), their sums all differ, and we get the maximal possible number of sums, ⌊n/2⌋… However, I did not read the question properly (!) and the constraint is on the sum (i+j), namely

*How many mutually exclusive pairs (i,j) can be found with different sums all bounded by n=33? n=2014?*

**I**n which case, the previous and obvious proposal no longer works… The dumb brute-force search leads to a solution of

```r
> sol
[1] 12
> laperm
 [1]  6  9  1 24 13 20  4  7 21 14 17  3 16 11 19 25 23 18 12 26 15  2  5 10 22
[26]  8
> unique(apply(matrix(laperm,ncol=2),1,sum))
 [1] 17 28 26 47 31 32 30 22 23 19 27 25 24
```

which is close to the solution sol=13 proposed in Le Monde… It is obviously hopeless for a sum bounded by 2014. A light attempt at simulated annealing did not help either.
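The dumb search can be sketched as follows (a Python re-implementation for illustration; the code behind the output above was R): shuffle 1,…,n, pair off consecutive entries, keep only pairs whose sums are distinct and bounded by n, and retain the best configuration over many shuffles. A quick counting argument also explains Le Monde’s 13: the 2k integers used by k disjoint pairs sum to at least 1+2+…+2k = k(2k+1), while k distinct sums bounded by n total at most n+(n−1)+…+(n−k+1) = kn − k(k−1)/2; equating the two totals gives 5k ≤ 2n−1, i.e. k ≤ 13 for n=33 (and k ≤ 805 for n=2014).

```python
import random

def search_pairs(n, n_iter=500, seed=42):
    """Random restarts: shuffle 1..n, greedily pair off consecutive entries,
    keeping only pairs whose sums are distinct and bounded by n."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        vals = list(range(1, n + 1))
        rng.shuffle(vals)
        pairs, sums = [], set()
        for i, j in zip(vals[::2], vals[1::2]):
            if i + j <= n and i + j not in sums:
                pairs.append((i, j))
                sums.add(i + j)
        if len(pairs) > len(best):
            best = pairs
    return best

best = search_pairs(33)
# Counting bound: 5k <= 2n - 1, hence at most 13 pairs for n = 33.
print(len(best), best)
```

As with the R run above, the random search typically stalls one or two pairs short of the theoretical maximum, which is why a smarter (e.g. annealed) exploration would be needed for n=2014.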

Filed under: Books, Kids, Statistics Tagged: Le Monde, mathematical puzzle, permutations, simulated annealing

### de.activated!

### computational methods for statistical mechanics

**N**ext week (hopefully not a weak one!), I will have the pleasure of visiting Scotland again! Indeed, I have been invited to take part in an ICMS workshop on the above topic, located “at the interface between mathematical statistics and molecular simulation”. A wonderful opportunity to meet researchers in computational physics, if challenging because of the different notations and focus, as already experienced in Hamburg. And to talk about some of my most current MCMC research, if I have time to modify my talk and complete a submission to NIPS… All this in the great environment of ICMS (International Centre for Mathematical Sciences). And I forecast a pleasant time in Edinburgh, on Arthur’s Seat, and hopefully in the Scottish Highlands.

Filed under: Mountains, pictures, Running, Statistics, Travel, University life Tagged: ABC, Arthur's Seat, computational physics, Edinburgh, Hamburg, Highlands, ICMS, MCMC, molecular simulation, Monte Carlo Statistical Methods, munroes, NIPS 2014, Scotland

### model selection by likelihood-free Bayesian methods

**J**ust glanced at the introduction of this arXived paper over breakfast, back from my morning run: the exact title is *“Model Selection for Likelihood-free Bayesian Methods Based on Moment Conditions: Theory and Numerical Examples”* by Cheng Li and Wenxin Jiang. (The paper is 81 pages long.) I selected the paper for its title, as it connected with a question of ours on how to extend our empirical likelihood [A]BC work to model choice. We looked at this issue with Kerrie Mengersen and Judith Rousseau the last time Kerrie visited Paris, but could not spot a satisfying entry… The current paper is of a theoretical nature, considering a moment-defined model

where D denotes the data, as the dimension p of the parameter θ grows with n, the sample size. The approximate model is derived from a prior on the parameter θ and a Gaussian quasi-likelihood on the moment estimating function g(D,θ). Examples include single-index longitudinal data, quantile regression, and partial correlation selection. The model selection setting is one of variable selection, resulting in 2^p models to compare, with p growing to infinity… Which makes the practical implementation rather delicate to conceive. And hitting the right model with probability one is a fairly asymptotic concept. (At least after a cursory read from my breakfast table!)

Filed under: Books, pictures, Running, Statistics, University life Tagged: ABC, ABC model choice, Bayesian asymptotics, ducks, empirical likelihood, likelihood-free methods, Parc de Sceaux

### estimating normalising constants [mea culpa?!]

*“The basic idea is to estimate the parameters by learning to discriminate between the data x and some artificially generated noise y.”*

**I**n the sequel to this popular earlier post of mine on [not] estimating normalising constants, Simon Barthelmé and Nicolas Chopin pointed me to recent papers by Michael Gutmann and Aapo Hyvärinen on this topic, one published in the proceedings of AISTATS 2010 in Sardinia and one in the proceedings of the 2013 Workshop on Information Theoretic Methods in Science and Engineering (WITMSE2013), in Tokyo. Which led me to reconsider my perspective on this issue…

**J**ust like Larry, Gutmann and Hyvärinen consider the normalising constant associated with an unnormalised density,

as *an extra parameter*. They then add to the actual sample from the unnormalised density an artificial sample of identical size from a fixed distribution g, and eventually proceed to run a logistic regression on the model index (p *versus* g) based on those merged datasets. A logistic regression parameterised by the difference of the log-densities:

With the actual sample corresponding to the first modality and the artificial sample to the second modality. While the resulting estimator is different, this approach reminds me of the proposal we made in our nested sampling paper of 2009 with Nicolas, esp. Section 6.3 where we also introduce an artificial mixture to estimate the normalising constant (and obtain an alternative version of bridge sampling). The difference is that Gutmann and Hyvärinen estimate both Z and α by logistic regression. And without imposing the integration constraint that would turn Z into a superfluous “parameter”.

**N**ow, if we return to the original debate, does this new estimation approach close it? And if so, is it to my defeat (hence the title)?! Obviously, Gutmann and Hyvärinen use both a statistical technique and a statistical model to estimate the constant Z(α). They produce an extra artificial sample from g but exploit the current sample from p and no other. The estimator of the normalising constant converges with the sample size. However, I do remain puzzled by the addition of the normalising constant to the parameter vector. The data comes from a probability distribution, hence the normalising constraint holds. Relaxing the constraint leads to a minimisation framework that can be interpreted as either statistics or numerics. Which keeps open my original question of what information about the constant Z(α) is contained in the sample per se… (But not questioning the potential of this method for providing an estimate of the constant.)
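To make the scheme concrete, here is a minimal Python sketch on a toy example of my own (not taken from the papers): the unnormalised target is exp(−x²/2), whose true normalising constant is √(2π) ≈ 2.507, the noise distribution g is a N(0,2²), and the single “regression” parameter c = −log Z is fitted by maximising the logistic log-likelihood, here by plain grid search since the problem is one-dimensional:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised target: ptilde(x) = exp(-x^2/2); true Z = sqrt(2*pi) ≈ 2.507.
log_ptilde = lambda u: -0.5 * u**2

# Noise distribution g: N(0, 2^2), with fully known density.
sigma_g = 2.0
log_g = lambda u: -0.5 * (u / sigma_g)**2 - np.log(sigma_g * np.sqrt(2 * np.pi))

n = 20_000
x = rng.normal(0.0, 1.0, n)        # sample from the normalised target
y = rng.normal(0.0, sigma_g, n)    # artificial sample from g, of identical size

# Treat c = -log Z as the lone regression parameter: the log-odds of a point
# u being data rather than noise is h(u) = log ptilde(u) + c - log g(u).
def logistic_loglik(c):
    hx = log_ptilde(x) + c - log_g(x)   # data terms, label 1
    hy = log_ptilde(y) + c - log_g(y)   # noise terms, label 0
    return -np.log1p(np.exp(-hx)).sum() - np.log1p(np.exp(hy)).sum()

grid = np.linspace(-3.0, 1.0, 4001)
c_hat = grid[np.argmax([logistic_loglik(c) for c in grid])]
Z_hat = np.exp(-c_hat)
print(f"estimated Z = {Z_hat:.3f} (true value {np.sqrt(2 * np.pi):.3f})")
```

With samples of this size the estimate lands within a few percent of √(2π), which illustrates the point above: the artificial sample from g, combined with the sample from p, does carry enough information to pin down the constant.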

Filed under: Statistics

### Bayesian Analysis, Volume 9, Number 2 (2014)

Contents:

**Zhihua Zhang**, **Dakan Wang**, **Guang Dai**, **Michael I. Jordan**. Matrix-Variate Dirichlet Process Priors with Applications. 259--286.

**Nammam Ali Azadi**, **Paul Fearnhead**, **Gareth Ridall**, **Joleen H. Blok**. Bayesian Sequential Experimental Design for Binary Response Data with Application to Electromyographic Experiments. 287--306.

**Juhee Lee**, **Steven N. MacEachern**, **Yiling Lu**, **Gordon B. Mills**. Local-Mass Preserving Prior Distributions for Nonparametric Bayesian Models. 307--330.

**Ruitao Liu**, **Arijit Chakrabarti**, **Tapas Samanta**, **Jayanta K. Ghosh**, **Malay Ghosh**. On Divergence Measures Leading to Jeffreys and Other Reference Priors. 331--370.

**Xin-Yuan Song**, **Jing-Heng Cai**, **Xiang-Nan Feng**, **Xue-Jun Jiang**. Bayesian Analysis of the Functional-Coefficient Autoregressive Heteroscedastic Model. 371--396.

**Yu Ryan Yue**, **Daniel Simpson**, **Finn Lindgren**, **Håvard Rue**. Bayesian Adaptive Smoothing Splines Using Stochastic Differential Equations. 397--424.

**Jaakko Riihimäki**, **Aki Vehtari**. Laplace Approximation for Logistic Gaussian Process Density Estimation and Regression. 425--448.

**Fei Liu**, **Sounak Chakraborty**, **Fan Li**, **Yan Liu**, **Aurelie C. Lozano**. Bayesian Regularization via Graph Laplacian. 449--474.

**Catia Scricciolo**. Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures. 475--520.

### Matrix-Variate Dirichlet Process Priors with Applications

**Zhihua Zhang**,

**Dakan Wang**,

**Guang Dai**,

**Michael I. Jordan**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 259--286.

**Abstract:**

In this paper we propose a matrix-variate Dirichlet process (MATDP) for modeling the joint prior of a set of random matrices. Our approach is able to share statistical strength among regression coefficient matrices due to the clustering property of the Dirichlet process. Moreover, since the base probability measure is defined as a matrix-variate distribution, the dependence among the elements of each random matrix is described via the matrix-variate distribution. We apply MATDP to multivariate supervised learning problems. In particular, we devise a nonparametric discriminative model and a nonparametric latent factor model. The interest is in considering correlations both across response variables (or covariates) and across response vectors. We derive Markov chain Monte Carlo algorithms for posterior inference and prediction, and illustrate the application of the models to multivariate regression, multi-class classification and multi-label prediction problems.

### Bayesian Sequential Experimental Design for Binary Response Data with Application to Electromyographic Experiments

**Nammam Ali Azadi**,

**Paul Fearnhead**,

**Gareth Ridall**,

**Joleen H. Blok**.

**Source:** Bayesian Analysis, Volume 9, Number 2, 287--306.

**Abstract:**

We develop a sequential Monte Carlo approach for Bayesian analysis of the experimental design for binary response data. Our work is motivated by surface electromyographic (SEMG) experiments, which can be used to provide information about the functionality of subjects’ motor units. These experiments involve a series of stimuli being applied to a motor unit, with whether or not the motor unit fires for each stimulus being recorded. The aim is to learn about how the probability of firing depends on the applied stimulus (the so-called stimulus-response curve). One such excitability parameter is an estimate of the stimulus level for which the motor unit has a 50% chance of firing. Within such an experiment we are able to choose the next stimulus level based on the past observations. We show how sequential Monte Carlo can be used to analyse such data in an online manner. We then use the current estimate of the posterior distribution in order to choose the next stimulus level. The aim is to select a stimulus level that minimises the expected loss of estimating a quantity, or quantities, of interest. We will apply this loss function to the estimates of target quantiles from the stimulus-response curve. Through simulation we show that this approach is more efficient than existing sequential design methods in terms of estimating the quantile(s) of interest. If applied in practice, it could reduce the length of SEMG experiments by a factor of three.