Bayesian Bloggers

another instance of ABC?

Xian's Og - Mon, 2014-12-01 18:14

“These characteristics are (1) likelihood is not available; (2) prior information is available; (3) a portion of the prior information is expressed in terms of functionals of the model that cannot be converted into an analytic prior on model parameters; (4) the model can be simulated. Our approach depends on an assumption that (5) an adequate statistical model for the data are available.”

A 2009 JASA paper by Ron Gallant and Rob McCulloch, entitled “On the Determination of General Scientific Models With Application to Asset Pricing”, may have or may not have connection with ABC, to wit the above quote, but I have trouble checking whether or not this is the case.

The true (scientific) model parametrised by θ is replaced with a (statistical) substitute that is available in closed form. And parametrised by g(θ). [If you can get access to the paper, I’d welcome opinions about Assumption 1 therein which states that the intractable density is equal to a closed-form density.] And the latter is over-parametrised when compared with the scientific model. As in, e.g., a N(θ,θ²) scientific model versus a N(μ,σ²) statistical model. In addition, the prior information is only available on θ. However, this does not seem to matter that much since (a) the Bayesian analysis is operated on θ only and (b) the Metropolis approach adopted by the authors involves simulating a massive number of pseudo-observations, given the current value of the parameter θ and the scientific model, so that the transform g(θ) can be estimated by maximum likelihood over the statistical model. The paper suggests using a secondary Markov chain algorithm to find this MLE. Which is claimed to be a simulated annealing resolution (p.121) although I do not see the temperature decreasing. The pseudo-model is then used in a primary MCMC step.

Hence, not truly an ABC algorithm. In the same setting, ABC would use a simulated dataset the same size as the observed dataset, compute the MLEs for both and compare them. Faster if less accurate when Assumption 1 [that the statistical model holds for a restricted parametrisation] does not stand.

Another interesting aspect of the paper is about creating and using a prior distribution around the manifold η=g(θ). This clearly relates to my earlier query about simulating on measure zero sets. The paper does not bring a definitive answer, as it never simulates exactly on the manifold, but this constitutes another entry on this challenging problem…

Filed under: Statistics Tagged: ABC, ABC-MCMC, Kullback-Leibler divergence, measure theory, Metropolis-Hastings algorithms, pseudo-likelihood
Categories: Bayesian Bloggers

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [discussion]

Xian's Og - Sun, 2014-11-30 18:14

[Following my earlier reflections on Ron Gallant’s paper, here is a more condensed set of questions towards my discussion of next Friday.]

“If one specifies a set of moment functions collected together into a vector m(x,θ) of dimension M, regards θ as random and asserts that some transformation Z(x,θ) has distribution ψ then what is required to use this information and then possibly a prior to make valid inference?” (p.4)

The central question in the paper is whether or not given a set of moment equations

(where both the Xi‘s and θ are random), one can derive a likelihood function and a prior distribution compatible with those. It sounds to me like a highly complex question since it implies the integral equation

must have a solution for all n’s. A related question that was also remanent with fiducial distributions is how on Earth (or Middle Earth) the concept of a random theta could arise outside Bayesian analysis. And another one is how could the equations make sense outside the existence of the pair (prior,likelihood). A question that may exhibit my ignorance of structural models. But which may also relate to the inconsistency of Zellner’s (1996) Bayesian method of moments as exposed by Geisser and Seidenfeld (1999).

For instance, the paper starts (why?) with the Fisherian example of the t distribution of

which is truly is a t variable when θ is fixed at the true mean value. Now, if we assume that the joint distribution of the Xi‘s and θ is such that this projection is a t variable, is there any other case than the Dirac mass on θ? For all (large enough) sample sizes n? I cannot tell and the paper does not bring [me] an answer either.

When I look at the analysis made in the abstraction part of the paper, I am puzzled by the starting point (17), where

since the lhs and rhs operate on different spaces. In Fisher’s example, x is an n-dimensional vector, while Z is unidimensional. If I apply blindly the formula on this example, the t density does not integrate against the Lebesgue measure in the n-dimension Euclidean space… If a change of measure allows for this representation, I do not see so much appeal in using this new measure and anyway wonder in which sense this defines a likelihood function, i.e. the product of n densities of the Xi‘s conditional on θ. To me this is the central issue, which remains unsolved by the paper.

Filed under: Books, Statistics, University life Tagged: Arnold Zellner, empirical likelihood, fiducial distribution, measure theory, method of moments, R.A. Fisher, structural model
Categories: Bayesian Bloggers

Moon over Soho [book review]

Xian's Og - Fri, 2014-11-28 18:14

A book from the pile I brought back from Gainesville. And the first I read, mostly during the trip back to Paris. Both because I was eager to see the sequel to Rivers of London and because it was short and easy to carry in a pocket.

“From the figures I have, I believe that two to three jazz musicians have died within twenty-four hours of playing a gig in the Greater London area in the last year.”
“I take it that’s statistically significant?

Moon over Soho is the second installment in the Peter Grant series by Ben Aaronovitch. It would not read well on its own as it takes over when Rivers of London stopped. Even though it reintroduces most of the rules of this magical universe. Most characters are back (except for the hostaged Beverly) and they are trying to cope with what happened in the first installment. The story is even more centred on jazz than in the first volume, with as a corollary, Peter Grant’s parents taking a more important part in the book. The recovering Leslie is hardly seen (for obvious reasons) and heard, which leaves a convenient hole in Grant’s sentimental life! The book also introduces a major magical villein who will undoubtedly figures in the incoming books. Another great story, even though the central plot has a highly predictable ending, and even more end of the ending, and some parts sound like repetitions of similar parts in the first volume. But the tone, the pace, the style, the humour, the luv’ of Lundun, all are there and so it is all that matters! (I again bemoan the missing map of London!)

Filed under: Books, Kids, Travel Tagged: Ben Aaronnovitch, book review, England, jazz, London, Moon over Soho, Peter Grant series, WW II
Categories: Bayesian Bloggers

Le Monde puzzle [#887quater]

Xian's Og - Thu, 2014-11-27 18:14

And yet another resolution of this combinatorics Le Monde mathematical puzzle: that puzzle puzzled many more people than usual! This solution is by Marco F, using a travelling salesman representation and existing TSP software.

N is a golden number if the sequence {1,2,…,N} can be reordered so that the sum of any consecutive pair is a perfect square. What are the golden numbers between 1 and 25?

For instance, take n=199, you should first calculate the “friends”. Save them on a symmetric square matrix:

m1 <- matrix(Inf, nrow=199, ncol=199) diag(m1) <- 0 for (i in 1:199) m1[i,friends[i]] <- 1

Export the distance matrix to a file (in TSPlib format):

library(TSP) tsp <- TSP(m1) tsp image(tsp) write_TSPLIB(tsp, "f199.TSPLIB")

And use a solver to obtain the results. The best solver for TSP is Concorde. There are online versions where you can submit jobs:

0 2 1000000 2 96 1000000 96 191 1000000 191 168 1000000 ...

The numbers of the solution are in the second column (2, 96, 191, 168…). And they are 0-indexed, so you have to add 1 to them:

3 97 192 169 155 101 188 136 120 49 176 148 108 181 143 113 112 84 37 63 18 31 33 88168 193 96 160 129 127 162 199 90 79 177 147 78 22 122 167 194 130 39 157 99 190 13491 198 58 23 41 128 196 60 21 100 189 172 152 73 183 106 38 131 125 164 197 59 110 146178 111 145 80 20 61 135 121 75 6 94 195166 123 133 156 69 52 144 81 40 9 72 184 12 24 57 87 82 62 19 45 76 180 109 116 173 151 74 26 95 161 163 126 43 153 17154 27 117 139 30 70 11 89 107 118 138 186103 66 159 165 124 132 93 28 8 17 32 45 44 77 179 182 142 83 86 14 50 175 114 55 141 115 29 92 104 185 71 10 15 34 27 42 154 170 191 98 158 67 102 187 137 119 25 56 65 35 46 150 174 51 13 68 53 47 149 140 85 36 64 105 16 48
Filed under: Books, Kids, R, Statistics, University life Tagged: Le Monde, mathematical puzzle, travelling salesman Concorde
Categories: Bayesian Bloggers

Le Monde puzzle [#887ter]

Xian's Og - Wed, 2014-11-26 18:14

Here is a graph solution to the recent combinatorics Le Monde mathematical puzzle, proposed by John Shonder:

N is a golden number if the sequence {1,2,…,N} can be reordered so that the sum of any consecutive pair is a perfect square. What are the golden numbers between 1 and 25?

Consider an undirected graph GN with N vertices labelled 1 through N. Draw an edge between vertices i and j if and only if i + j is a perfect square. Then N is golden if GN contains a Hamiltonian path — that is, if there is a connected path that visits all of the vertices exactly once.I wrote a program (using Mathematica, though I’m sure there must be an R library with similar functionality) that builds up G sequentially and checks at each step whether the graph contains a Hamiltonian path. The program starts with G1 — a single vertex and no edges. Then it adds vertex 2. G2 has no edges, so 2 isn’t golden.

Adding vertex 3, there is an edge between 1 and 3. But vertex 2 is unconnected, so we’re still not golden.

The results are identical to yours, but I imagine my program runs a bit faster. Mathematica contains a built-in function to test for the existence of a Hamiltonian path.

Some of the graphs are interesting. I include representations of G25 and G36. Note that G36 contains a Hamiltonian cycle, so you could arrange the integers 1 … 36 on a roulette wheel such that each consecutive pair adds to a perfect square.

A somewhat similar problem:

Call N a “leaden” number if the sequence {1,2, …, N} can be reordered so that the sum of any consecutive pair is a prime number. What are the leaden numbers between 1 and 100? What about an arrangement such that the absolute value of the difference between any two consecutive numbers is prime?

[The determination of the leaden numbers was discussed in a previous Le Monde puzzle post.]

Filed under: Books, Kids, Statistics, University life Tagged: graph theory, Hamiltonian path, Le Monde, Mathematica, mathematical puzzle
Categories: Bayesian Bloggers

Methodological developments in evolutionary genomic [3 years postdoc in Montpellier]

Xian's Og - Wed, 2014-11-26 08:18

[Here is a call for a post-doctoral position in Montpellier, South of France, not Montpelier, Vermont!, in a population genetics group with whom I am working. Highly recommended if you are currently looking for a postdoc!]

Three-year post-doctoral position at the Institute of Computational Biology (IBC), Montpellier (France) :
Methodological developments in evolutionary genomics.

One young investigator position opens immediately at the Institute for Computational Biology (IBC) of Montpellier (France) to work on the development of innovative inference methods and software in population genomics or phylogenetics to analyze large-scale genomic data in the fields of health, agronomy and environment (Work Package 2 « evolutionary genomics » of the IBC). The candidate will develop its own research on some of the following topics : selective processes, demographic history, spatial genetic processes, very large phylogenies reconstruction, gene/species tree reconciliation, using maximum likelihood, Bayesian and simulation-based inference. We are seeking a candidate with a strong background in mathematical and computational evolutionary biology, with interest in applications and software development. The successfull candidate will work on his own project, build in collaboration with any researcher involved in the WP2 project and working at the IBC labs (AGAP, CBGP, ISEM, I3M, LIRMM, MIVEGEC).

IBC hires young investigators, typically with a PhD plus some post-doc experience, a high level of publishing, strong communication abilities, and a taste for multidisciplinary research. Working full-time at IBC, these young researchers will play a key role in Institute life. Most of their time will be devoted to scientific projects. In addition, they are expected to actively participate in the coordination of workpackages, in the hosting of foreign researchers and in the organization of seminars and events (summer schools, conferences…). In exchange, these young researchers will benefit from an exceptional environment thanks to the presence of numerous leading international researchers, not to mention significant autonomy for their work. Montpellier hosts one of the most vibrant communities of biodiversity research in Europe with several research centers of excellence in the field. This positions is open for up to 3 years with a salary well above the French post-doc standards. Starting date is open to discussion.

 The application deadline is January 31, 2015.

Living in Montpellier:


Contacts at WP2 « Evolutionary Genetics » :


Jean-Michel Marin :

François Rousset :

Vincent Ranwez :

Olivier Gascuel :

Submit my application :

Filed under: pictures, Statistics, Travel, University life, Wines Tagged: academic position, Bayesian statistics, biodiversity, computational biology, France, Institut de Biologie Computationelle, Montpellier, phylogenetic models, position, postdoctoral position
Categories: Bayesian Bloggers

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [refleXions]

Xian's Og - Tue, 2014-11-25 18:14

“The main finding is that if the moment functions have one of the properties of a pivotal, then the assertion of a distribution on moment functions coupled with a proper prior does permit Bayesian inference. Without the semi-pivotal condition, the assertion of a distribution for moment functions either partially or completely specifies the prior.” (p.1)

Ron Gallant will present this paper at the Conference in honour of Christian Gouréroux held next week at Dauphine and I have been asked to discuss it. What follows is a collection of notes I made while reading the paper , rather than a coherent discussion, to come later. Hopefully prior to the conference.

The difficulty I have with the approach presented therein stands as much with the presentation as with the contents. I find it difficult to grasp the assumptions behind the model(s) and the motivations for only considering a moment and its distribution. Does it all come down to linking fiducial distributions with Bayesian approaches? In which case I am as usual sceptical about the ability to impose an arbitrary distribution on an arbitrary transform of the pair (x,θ), where x denotes the data. Rather than a genuine prior x likelihood construct. But I bet this is mostly linked with my lack of understanding of the notion of structural models.

“We are concerned with situations where the structural model does not imply exogeneity of θ, or one prefers not to rely on an assumption of exogeneity, or one cannot construct a likelihood at all due to the complexity of the model, or one does not trust the numerical approximations needed to construct a likelihood.” (p.4)

As often with econometrics papers, this notion of structural model sets me astray: does this mean any latent variable model or an incompletely defined model, and if so why is it incompletely defined? From a frequentist perspective anything random is not a parameter. The term exogeneity also hints at this notion of the parameter being not truly a parameter, but including latent variables and maybe random effects. Reading further (p.7) drives me to understand the structural model as defined by a moment condition, in the sense that

has a unique solution in θ under the true model. However the focus then seems to make a major switch as Gallant considers the distribution of a pivotal quantity like

as induced by the joint distribution on (x,θ), hence conversely inducing constraints on this joint, as well as an associated conditional. Which is something I have trouble understanding, First, where does this assumed distribution on Z stem from? And, second, exchanging randomness of terms in a random variable as if it was a linear equation is a pretty sure way to produce paradoxes and measure theoretic difficulties.

The purely mathematical problem itself is puzzling: if one knows the distribution of the transform Z=Z(X,Λ), what does that imply on the joint distribution of (X,Λ)? It seems unlikely this will induce a single prior and/or a single likelihood… It is actually more probable that the distribution one arbitrarily selects on m(x,θ) is incompatible with a joint on (x,θ), isn’t it?

“The usual computational method is MCMC (Markov chain Monte Carlo) for which the best known reference in econometrics is Chernozhukov and Hong (2003).” (p.6)

While I never heard of this reference before, it looks like a 50 page survey and may be sufficient for an introduction to MCMC methods for econometricians. What I do not get though is the connection between this reference to MCMC and the overall discussion of constructing priors (or not) out of fiducial distributions. The author also suggests using MCMC to produce the MAP estimate but this always stroke me as inefficient (unless one uses our SAME algorithm of course).

“One can also compute the marginal likelihood from the chain (Newton and Raftery (1994)), which is used for Bayesian model comparison.” (p.22)

Not the best solution to rely on harmonic means for marginal likelihoods…. Definitely not. While the author actually uses the stabilised version (15) of Newton and Raftery (1994) estimator, which in retrospect looks much like a bridge sampling estimator of sorts, it remains dangerously close to the original [harmonic mean solution] especially for a vague prior. And it only works when the likelihood is available in closed form.

“The MCMC chains were comprised of 100,000 draws well past the point where transients died off.” (p.22)

I wonder if the second statement (with a very nice image of those dying transients!) is intended as a consequence of the first one or independently.

“A common situation that requires consideration of the notions that follow is that deriving the likelihood from a structural model is analytically intractable and one cannot verify that the numerical approximations one would have to make to circumvent the intractability are sufficiently accurate.” (p.7)

This then is a completely different business, namely that defining a joint distribution by mean of moment equations prevents regular Bayesian inference because the likelihood is not available. This is more exciting because (i) there are alternative available! From ABC to INLA (maybe) to EP to variational Bayes (maybe). And beyond. In particular, the moment equations are strongly and even insistently suggesting that empirical likelihood techniques could be well-suited to this setting. And (ii) it is no longer a mathematical worry: there exist a joint distribution on m(x,θ), induced by a (or many) joint distribution on (x,θ). So the question of finding whether or not it induces a single proper prior on θ becomes relevant. But, if I want to use ABC, being given the distribution of m(x,θ) seems to mean I can only generate new values of this transform while missing a natural distance between observations and pseudo-observations. Still, I entertain lingering doubts that this is the meaning of the study. Where does the joint distribution come from..?!

“Typically C is coarse in the sense that it does not contain all the Borel sets (…)  The probability space cannot be used for Bayesian inference”

My understanding of that part is that defining a joint on m(x,θ) is not always enough to deduce a (unique) posterior on θ, which is fine and correct, but rather anticlimactic. This sounds to be what Gallant calls a “partial specification of the prior” (p.9).

Overall, after this linear read, I remain very much puzzled by the statistical (or Bayesian) implications of the paper . The fact that the moment conditions are central to the approach would once again induce me to check the properties of an alternative approach like empirical likelihood.

Filed under: Statistics, University life Tagged: ABC, compatible conditional distributions, empirical likelihood, expectation-propagation, harmonic mean estimator, INLA, latent variable, MCMC, prior distributions, structural model, variational Bayes methods
Categories: Bayesian Bloggers

prayers and chi-square

Xian's Og - Mon, 2014-11-24 18:14

One study I spotted in Richard Dawkins’ The God delusion this summer by the lake is a study of the (im)possible impact of prayer over patient’s recovery. As a coincidence, my daughter got this problem in her statistics class of last week (my translation):

1802 patients in 6 US hospitals have been divided into three groups. Members in group A was told that unspecified religious communities would pray for them nominally, while patients in groups B and C did not know if anyone prayed for them. Those in group B had communities praying for them while those in group C did not. After 14 days of prayer, the conditions of the patients were as follows:

  • out of 604 patients in group A, the condition of 249 had significantly worsened;
  • out of 601 patients in group B, the condition of 289 had significantly worsened;
  • out of 597 patients in group C, the condition of 293 had significantly worsened.

 Use a chi-square procedure to test the homogeneity between the three groups, a significant impact of prayers, and a placebo effect of prayer.

This may sound a wee bit weird for a school test, but she is in medical school after all so it is a good way to enforce rational thinking while learning about the chi-square test! (Answers: [even though the data is too sparse to clearly support a decision, esp. when using the chi-square test!] homogeneity and placebo effect are acceptable assumptions at level 5%, while the prayer effect is not [if barely].)

Filed under: Books, Kids, Statistics, University life Tagged: binomial distribution, chi-square test, exercises, medical school, prayer, Richard Dawkins, The God Delusion
Categories: Bayesian Bloggers

an ABC experiment

Xian's Og - Sun, 2014-11-23 18:14


In a cross-validated forum exchange, I used the code below to illustrate the working of an ABC algorithm:

#normal data with 100 observations n=100 x=rnorm(n) #observed summaries sumx=c(median(x),mad(x)) #normal x gamma prior priori=function(N){ return(cbind(rnorm(N,sd=10), 1/sqrt(rgamma(N,shape=2,scale=5)))) } ABC=function(N,alpha=.05){ prior=priori(N) #reference table #pseudo-data summ=matrix(0,N,2) for (i in 1:N){ xi=rnorm(n)*prior[i,2]+prior[i,1] summ[i,]=c(median(xi),mad(xi)) #summaries } #normalisation factor for the distance mads=c(mad(summ[,1]),mad(summ[,2])) #distance dist=(abs(sumx[1]-summ[,1])/mads[1])+ (abs(sumx[2]-summ[,2])/mads[2]) #selection posterior=prior[dist<quantile(dist,alpha),]}

Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than when using the mean and the variance as summary statistics. This is not completely surprising in that the latter are sufficient, while the former are not. Still, the (-10,10) range on the mean is way larger… The second reason for surprise is that the true posterior distribution cannot be derived since the joint density of med and mad is unavailable.

After thinking about this for a while, I went back to my workbench to check the difference with using mean and variance. To my greater surprise, I found hardly any difference! Using the almost exact ABC with 10⁶ simulations and a 5% subsampling rate returns exactly the same outcome. (The first row above is for the sufficient statistics (mean,standard deviation) while the second row is for the (median,mad) pair.) Playing with the distance does not help. The genuine posterior output is quite different, as exposed on the last row of the above, using a basic Gibbs sampler since the posterior is not truly conjugate.

Filed under: Books, pictures, R, Statistics, University life Tagged: ABC, Gibbs sampling, MCMC, mean, median, median absolute deviation, Monte Carlo Statistical Methods, normal model, summary statistics
Categories: Bayesian Bloggers

Challis Lectures

Xian's Og - Sat, 2014-11-22 18:14


I had a great time during this short visit in the Department of Statistics, University of Florida, Gainesville. First, it was a major honour to be the 2014 recipient of the George H. Challis Award and I considerably enjoyed delivering my lectures on mixtures and on ABC with random forests, And chatting with members of the audience about the contents afterwards. Here is the physical award I brought back to my office:

More as a piece of trivia, here is the amount of information about the George H. Challis Award I found on the UF website:

This fund was established in 2000 by Jack M. and Linda Challis Gill and the Gill Foundation of Texas, in memory of Linda’s father, to support faculty and student conference travel awards and the George Challis Biostatistics Lecture Series. George H. Challis was born on December 8, 1911 and was raised in Italy and Indiana. He was the first cousin of Indiana composer Cole Porter. George earned a degree in 1933 from the School of Business at Indiana University in Bloomington. George passed away on May 6, 2000. His wife, Madeline, passed away on December 14, 2009.

Cole Porter, indeed!

On top of this lecturing activity, I had a full academic agenda, discussing with most faculty members and PhD students of the Department, on our respective research themes over the two days I was there and it felt like there was not enough time! And then, during the few remaining hours where I did not try to stay on French time (!), I had a great time with my friends Jim and Maria in Gainesville, tasting a fantastic local IPA beer from Cigar City Brewery and several great (non-local) red wines… Adding to that a pile of new books, a smooth trip both ways, and a chance encounter with Alicia in Atlanta airport, it was a brilliant extended weekend!

Filed under: Books, pictures, Statistics, Travel, University life, Wines Tagged: ABC, Cigar City Brewery, Cole Porter, finite mixtures, Florida, Gainesville, George H. Challis Award, random forests
Categories: Bayesian Bloggers

a pile of new books

Xian's Og - Fri, 2014-11-21 18:14

I took the opportunity of my weekend trip to Gainesville to order a pile of books on amazon, thanks to my amazon associate account (and hence thanks to all Og’s readers doubling as amazon customers!). The picture above is missing two  Rivers of London volumes by Ben Aaraonovitch that I already read and left at the office. And reviewed in incoming posts. Among those,

(Obviously, all “locals” sharing my taste in books are welcome to borrow those in a very near future!)

Filed under: Books, Travel, University life Tagged: amazon associates, book reviews, Booker Prize, Florida, Gainesville, Hugo Awards, John Scalzi, Robin Hobb, The Name of the Wind, Walter Miller
Categories: Bayesian Bloggers

some LaTeX tricks

Xian's Og - Fri, 2014-11-21 08:18

Here are a few LaTeX tricks I learned or rediscovered when working on several papers the past week:

  1. I am always forgetting how to make aligned equations with a single equation number, so I found this solution on the TeX forum of stackexchange, Namely use the equation environment and then an aligned environment inside. Or the split environment. But it does not always work…
  2. Another frustrating black hole is how to deal with integral signs that do not adapt to the integrand. Too bad we cannot use \left\int, really! Another stackexchange question led me to the bigints package. Not perfect though.
  3. Pierre Pudlo also showed me the commands \graphicspath{{dir1}{dir2}} and \DeclareGraphicsExtensions{.pdf,.png,.jpg} to avoid coding the entire path to each image and to put an order on the extension type, respectively. The second one is fairly handy when working on drafts. The first one does not seem to work with symbolic links, though…

Filed under: Books, Kids, Statistics, University life Tagged: bigint, graphical extension, LaTeX, mathematical equations, StackExchange
Categories: Bayesian Bloggers

not converging to London for an [extra]ordinary Read Paper

Xian's Og - Thu, 2014-11-20 18:14

On December 10, I will alas not travel to London to attend the Read Paper on sequential quasi-Monte Carlo presented by Mathieu Gerber and Nicolas Chopin to The Society, as I fly instead to Montréal for the NIPS workshops… I am quite sorry to miss this event, as this is a major paper which brings quasi-Monte Carlo methods into mainstream statistics. I will most certainly write a discussion and remind Og’s readers that contributed (800 words) discussions are welcome from everyone, the deadline for submission being January 02.

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: discussion paper, London, MCQMC, Nicolas Chopin, NIPS 2014, Read paper, Royal Statistical Society, sequential Monte Carlo
Categories: Bayesian Bloggers

Bayesian evidence and model selection

Xian's Og - Wed, 2014-11-19 18:14

Another arXived paper with a topic close to my interests, posted by Knuth et al. today. Namely, Bayesian model selection. However, after reading the paper in Gainesville, I am rather uncertain about its prospects, besides providing an entry to the issue (for physicists?). Indeed, the description of (Bayesian) evidence is concentrating on rough approximations, in a physics perspective, with a notion of Occam’s factor that measures the divergence to the maximum likelihood. (As usual when reading physics literature, I am uncertain as to why one should consider always approximations.) The numerical part mentions the tools of importance sampling and Laplace approximations, path sampling and nested sampling. The main part of the paper consists in applying those tools to signal processing models. One of them is a mixture example where nested sampling is used to evaluate the most likely number of components. Using uniform priors over non-specified hypercubes. In an example about photometric signal from an exoplanet, two models are distinguished by evidences of 37,764 and 37,765, with another one at 37,748. It seems to me that this very proximity simply prevents the comparison of those models, even without accounting for the Monte Carlo variability. And does not suffice to conclude about a scientific theory (“effectively characterize exoplanetary systems”). Which leads to my current thinking, already expressed on that blog, that Bayes factors and posterior probabilities should be replaced with an alternative, including uncertainty about the very Bayes factor (or evidence).

Filed under: Statistics
Categories: Bayesian Bloggers

differences between Bayes factors and normalised maximum likelihood

Xian's Og - Tue, 2014-11-18 18:14

A recent arXival by Heck, Wagenmaker and Morey attracted my attention: Three Qualitative Differences Between Bayes Factors and Normalized Maximum Likelihood, as it provides an analysis of the differences between Bayesian analysis and Rissanen’s Optimal Estimation of Parameters that I reviewed a while ago. As detailed in this review, I had difficulties with considering the normalised likelihood

as the relevant quantity. One reason being that the distribution does not make experimental sense: for instance, how can one simulate from this distribution? [I mean, when considering only the original distribution.] Working with the simple binomial B(n,θ) model, the authors show the quantity corresponding to the posterior probability may be constant for most of the data values, produces a different upper bound and hence a different penalty of model complexity, and may differ in conclusion for some observations. Which means that the apparent proximity to using a Jeffreys prior and Rissanen’s alternative does not go all the way. While it is a short note and only focussed on producing an illustration in the Binomial case, I find it interesting that researchers investigate the Bayesian nature (vs. artifice!) of this approach…

Filed under: Books, Kids, Statistics, University life Tagged: Bayes factors, binomial distribution, model complexity, normalised maximum likelihood, Rissanen
Categories: Bayesian Bloggers

importance sampling schemes for evidence approximation [revised]

Xian's Og - Mon, 2014-11-17 18:14

After a rather intense period of new simulations and versions, Juong Een (Kate) Lee and I have now resubmitted our paper on (some) importance sampling schemes for evidence approximation in mixture models to Bayesian Analysis. There is no fundamental change in the new version but rather a more detailed description of what those importance schemes mean in practice. The original idea in the paper is to improve upon the Rao-Blackwellisation solution proposed by Berkoff et al. (2002) and later by Marin et al. (2005) to avoid the impact of label switching on Chib’s formula. The Rao-Blackwellisation consists in averaging over all permutations of the labels while the improvement relies on the elimination of useless permutations, namely those that produce a negligible conditional density in Chib’s (candidate’s) formula. While the improvement implies truncated the overall sum and hence induces a potential bias (which was the concern of one referee), the determination of the irrelevant permutations after relabelling next to a single mode does not appear to cause any bias, while reducing the computational overload. Referees also made us aware of many recent proposals that conduct to different evidence approximations, albeit not directly related with our purpose. (One was Rodrigues and Walker, 2014, discussed and commented in a recent post.)

Filed under: Statistics, University life Tagged: Andrew Gelman, candidate approximation, Chib's approximation, evidence, finite mixtures, label switching, permutations, Rao-Blackwellisation
Categories: Bayesian Bloggers

a probabilistic proof to a quasi-Monte Carlo lemma

Xian's Og - Sun, 2014-11-16 18:14

As I was reading in the Paris métro a new textbook on Quasi-Monte Carlo methods, Introduction to Quasi-Monte Carlo Integration and Applications, written by Gunther Leobacher and Friedrich Pillichshammer, I came upon the lemma that, given two sequences on (0,1) such that, for all i’s,

and the geometric bound made me wonder if there was an easy probabilistic proof to this inequality. Rather than the algebraic proof contained in the book. Unsurprisingly, there is one based on associating with each pair (u,v) a pair of independent events (A,B) such that, for all i’s,

and representing

Obviously, there is no visible consequence to this remark, but it was a good way to switch off the métro hassle for a while! (The book is under review and the review will hopefully be posted on the ‘Og as soon as it is completed.)

Filed under: Books, Statistics, Travel, University life Tagged: book review, inequalities, métro, Paris, probability basics, quasi-Monte Carlo methods
Categories: Bayesian Bloggers