Bayesian News Feeds

off to Montréal [NIPS workshops]

Xian's Og - Tue, 2014-12-09 08:18

On Thursday, I will travel to Montréal for the two days of NIPS workshops there. On Friday, there is the ABC in Montréal workshop that I cannot but attend! (First occurrence of an “ABC in…” in North America! Sponsored by ISBA as well.) And on Saturday, there is the 3rd NIPS Workshop on Probabilistic Programming where I am invited to give a talk on… ABC! And maybe I will manage to sneak a look at the nearby workshop on Advances in variational inference… (On a very personal side, I wonder if the weather will remain warm enough to go running in the early morning.)

Filed under: Statistics, Travel, University life Tagged: ABC in Montréal, ISBA, ISBA@NIPS, Montréal, NIPS, probabilistic progamming, variational Bayes methods
Categories: Bayesian Bloggers

amazonish thanks (& repeated warning)

Xian's Og - Mon, 2014-12-08 18:14

As in previous years, at about this time, I want to (re)warn unaware ‘Og readers that all links to Amazon.com and more rarely to Amazon.fr found on this blog are actually susceptible to earn me an advertising percentage if a purchase is made by the reader in the 24 hours following the entry on Amazon through this link, thanks to the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com. Unlike last year, I did not benefit as much from the new edition of Andrew’s book, and the link he copied from my blog entry… Here are some of the most Og-unrelated purchases:

Once again, books I reviewed, positively or negatively, were among the top purchases… Like a dozen copies of Monte Carlo simulation and resampling methods for social science, and a few copies of Naked Statistics. And again a few of The Cartoon Introduction to Statistics. (Despite a most critical review.) Thanks to all of you using those links and feeding further my book addiction, with the drawback of inducing even more fantasy book reviews.

Filed under: Books, Kids, R, Statistics Tagged: Amazon, amazon associates, book reviews, dog life jacket, Monte Carlo Statistical Methods, Og
Categories: Bayesian Bloggers

the demise of the Bayes factor

Xian's Og - Sun, 2014-12-07 20:20

With Kaniav Kamary, Kerrie Mengersen, and Judith Rousseau, we have just arXived (and submitted) a paper entitled “Testing hypotheses via a mixture model”. (We actually presented some earlier version of this work in Cancún, Vienna, and Gainesville, so you may have heard of it already.) The notion we advocate in this paper is to replace the posterior probability of a model or a hypothesis with the posterior distribution of the weights of a mixture of the models under comparison. That is, given two models under comparison,

$$M_1: x \sim f_1(x|\theta_1) \quad\text{versus}\quad M_2: x \sim f_2(x|\theta_2),$$

we propose to estimate the (artificial) mixture model

$$M_\alpha: x \sim \alpha f_1(x|\theta_1) + (1-\alpha) f_2(x|\theta_2)$$

and in particular derive the posterior distribution of α. One may object that the mixture model is neither of the two models under comparison but this is the case at the boundary, i.e., when α=0,1. Thus, if we use prior distributions on α that favour the neighbourhoods of 0 and 1, we should be able to see the posterior concentrate near 0 or 1, depending on which model is true. And indeed this is the case: for any given Beta prior on α, we observe a higher and higher concentration at the right boundary as the sample size increases. And establish a convergence result to this effect. Furthermore, the mixture approach offers numerous advantages, among which [verbatim from the paper]:

  • relying on a Bayesian estimator of the weight α rather than on the posterior probability of the corresponding model does remove the need of overwhelmingly artificial prior probabilities on model indices;
  • the interpretation of this estimator is at least as natural as handling the posterior probability, while avoiding the caricaturesque zero-one loss setting. The quantity α and its posterior distribution provide a measure of proximity to both models for the data at hand, while being also interpretable as a propensity of the data to stand with (or to stem from) one of the two models. This representation further allows for alternative perspectives on testing and model choices, through the notions of predictive tools, cross-validation, and information indices like WAIC;
  • the highly problematic computation of the marginal likelihoods is bypassed, standard algorithms being available for Bayesian mixture estimation;
  • the extension to a finite collection of models to be compared is straightforward, as this simply involves a larger number of components. This approach further allows to consider all models at once rather than engaging in pairwise costly comparisons and thus to eliminate the least likely models by simulation, those being not explored by the corresponding algorithm;
  • the (simultaneously conceptual and computational) difficulty of “label switching” that plagues both Bayesian estimation and Bayesian computation for most mixture models completely vanishes in this particular context, since components are no longer exchangeable. In particular, we compute neither a Bayes factor nor a posterior probability related with the substitute mixture model and we hence avoid the difficulty of recovering the modes of the posterior distribution. Our perspective is solely centred on estimating the parameters of a mixture model where both components are always identifiable;
  • the posterior distribution of α evaluates more thoroughly the strength of the support for a given model than the single figure outcome of a Bayes factor or of a posterior probability. The variability of the posterior distribution on α allows for a more thorough assessment of the strength of the support of one model against the other;
  • an additional feature missing from traditional Bayesian answers is that a mixture model also acknowledges the possibility that, for a finite dataset, both models or none could be acceptable.
  • while standard (proper and informative) prior modelling can be painlessly reproduced in this novel setting, non-informative (improper) priors now are manageable therein, provided both models under comparison are first reparametrised towards common-meaning and shared parameters, as for instance with location and scale parameters. In the special case when all parameters can be made common to both models [While this may sound like an extremely restrictive requirement in a traditional mixture model, let us stress here that the presence of common parameters becomes quite natural within a testing setting. To wit, when comparing two different models for the same data, moments are defined in terms of the observed data and hence should be the same for both models. Reparametrising the models in terms of those common meaning moments does lead to a mixture model with some and maybe all common parameters. We thus advise the use of a common parametrisation, whenever possible.] the mixture model reads as

    $$x \sim \alpha f_1(x|\theta) + (1-\alpha) f_2(x|\theta).$$

    For instance, if θ is a location parameter, a flat prior can be used with no foundational difficulty, in opposition to the testing case;

  • continuing from the previous argument, using the same parameters or some identical parameters on both components is an essential feature of this reformulation of Bayesian testing, as it highlights the fact that the opposition between the two components of the mixture is not an issue of enjoying different parameters, but quite the opposite. As further stressed below, this or even those common parameter(s) is (are) nuisance parameters that need be integrated out (as they also are in the traditional Bayesian approach through the computation of the marginal likelihoods);
  • the choice of the prior model probabilities is rarely discussed in a classical Bayesian approach, even though those probabilities linearly impact the posterior probabilities and can be argued to promote the alternative of using the Bayes factor instead. In the mixture estimation setting, prior modelling only involves selecting a prior on α, for instance a Beta B(a,a) distribution, with a wide range of acceptable values for the hyperparameter a. While the value of a impacts the posterior distribution of α, it can be argued that (a) it nonetheless leads to an accumulation of the mass near 1 or 0, i.e., to favour the most favourable or the true model over the other one, and (b) a sensitivity analysis on the impact of a is straightforward to carry on;
  • in most settings, this approach can furthermore be easily calibrated by a parametric bootstrap experiment providing a posterior distribution of α under each of the models under comparison. The prior predictive error can therefore be directly estimated and can drive the choice of the hyperparameter a, if need be.
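
As a rough illustration of the concentration of the posterior on α near a boundary, here is a minimal R sketch. It is not the implementation from the paper: it assumes both components are fully specified (a N(0,1) density against a N(2,1) density, with no unknown θ), a Beta B(a,a) prior with a=0.5, and a plain data-augmentation Gibbs sampler. The name `gibbs_alpha` and all numerical settings are hypothetical choices made for the example.

```r
# Gibbs sampler for the weight alpha of the mixture alpha*f1 + (1-alpha)*f2,
# assuming f1 and f2 are fully known densities (no component parameters).
gibbs_alpha <- function(x, f1, f2, a = 0.5, niter = 2000) {
  n <- length(x)
  d1 <- f1(x)                       # component densities, computed once
  d2 <- f2(x)
  alpha <- numeric(niter)
  cur <- 0.5                        # arbitrary starting value
  for (t in seq_len(niter)) {
    # allocation step: probability that each observation comes from f1
    p1 <- cur * d1 / (cur * d1 + (1 - cur) * d2)
    n1 <- sum(rbinom(n, 1, p1))
    # weight step: conjugate Beta update under the B(a, a) prior
    cur <- rbeta(1, a + n1, a + n - n1)
    alpha[t] <- cur
  }
  alpha
}

set.seed(1)
x <- rnorm(200)                     # data simulated from the first model
draws <- gibbs_alpha(x, dnorm, function(y) dnorm(y, mean = 2))
post_mean <- mean(draws[-(1:500)])  # posterior mean of alpha after burn-in
```

With data generated from the first component, the retained draws of α should pile up close to 1; rerunning with other values of `a` gives a crude version of the sensitivity analysis mentioned in the list above.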

Filed under: Books, Kids, Statistics, Travel, University life Tagged: Bayes factor, Bayesian hypothesis testing, component of a mixture, consistency, hyperparameter, model posterior probabilities, posterior, prior, testing as mixture estimation
Categories: Bayesian Bloggers

Statistics slides (5)

Xian's Og - Sat, 2014-12-06 18:14

Here is the fifth and last set of slides for my third year statistics course, trying to introduce Bayesian statistics in the most natural way and hence starting with… Rasmus’ socks and ABC!!! This is an interesting experiment as I have no idea how my students will react. Either they will see the point besides the anecdotal story or they’ll miss it (being quite unhappy so far about the lack of mathematical rigour in my course and exercises…). We only have two weeks left so I am afraid the concept will not have time to seep through!

Filed under: Books, Kids, Statistics, University life Tagged: Bayesian statistics, Don Rubin, HPD region, map, Paris, Université Paris Dauphine
Categories: Bayesian Bloggers

Whispers underground [book review]

Xian's Og - Fri, 2014-12-05 18:14

“Dr. Walid said that normal human variations were wide enough that you’d need samples of hundreds of subjects to test that. Thousands if you wanted a statistically significant answer.
Low sample size—one of the reasons why magic and science are hard to reconcile.”

This is the third volume in the Rivers of London series, brought back from Gainesville, and possibly the least successful (in my opinion). It indeed takes place underground and not only in the Underground and the underground sewers of London. Which is this literary trick that always irks me in fantasy novels, namely the sudden appearance of a massive underground complex with unsuspected societies that are large and evolved enough to reach the Industrial Age. (Sorry if this is too much of a spoiler!)

“It was the various probability calculations that stuffed me—they always do. I’d have been a bad scientist.”

Not that everything is bad in this novel: I still like the massive infodump about London, the style and humour, the return of PC Lesley trying to get over the (literal) loss of her face, and the appearance of new characters. But the story itself, revolving around a murder investigation, is rather shallow and the (compulsory?) English policeman versus American cop competition is too contrived to be funny. Most of the major plot is hidden from this volume, unless there are clues I missed. (For instance, one death from a previous volume which seemed to get ignored at that time is finally explained here.) Definitely not the book to read on its own, as it still relates to and borrows much from the previous volumes, but presumably one to read nonetheless as the next instalment, Broken homes.

Filed under: Books, pictures, Travel Tagged: Gainesville, Isaac Newton, London, PC Peter Grant, Thames, Underground
Categories: Bayesian Bloggers

nested sampling with a test

Xian's Og - Thu, 2014-12-04 18:14

On my way back from Warwick, I read through a couple of preprints, including this statistical test for nested sampling algorithms by Johannes Buchner. As it happens, I had already read and commented on it in July! However, without the slightest memory of it (sad, isn’t it?!), I focussed this time much more on the modification proposed to MultiNest than on the test itself, which is in fact a Kolmogorov-Smirnov test applied to a specific target function.

Indeed, when reading the proposed modification of Buchner, I thought of a modification to the modification that sounded more appealing. Without getting back to defining nested sampling in detail, this algorithm follows a swarm of N particles within upper-level sets of the likelihood surface, each step requiring a new simulation above the current value of the likelihood. The remark that set me on this time was that we should exploit the fact that (N-1) particles were already available within this level set. And uniformly distributed therein. Therefore this particle cloud should be exploited as much as possible to return yet another particle distributed just as uniformly as the other ones (!). Buchner proposes an alternative to MultiNest based on a randomised version of the maximal distance to a neighbour and a ball centre picked at random (but not uniformly). But it would be just as feasible to draw a distance from the empirical cdf of the distances to the nearest neighbours or to the k-nearest neighbours. With some possible calibration of k. And somewhat more accurate, because this distribution represents the repartition of the particles within the upper-level set. Although I looked at it briefly in the [sluggish] metro from Roissy airport, I could not figure out a way to account for the additional point to be included in the (N-1) existing particles. That is, how to deform the empirical cdf of those distances to account for an additional point. Unless one included the just-removed particle, which is at the boundary of this upper-level set. (Or rather, which defines the boundary of this upper-level set.) I have no clear intuition as to whether or not this would amount to a uniform generation over the true upper-level set. But simulating from the distance distribution would remove (I think) the clustering effect mentioned by Buchner.

“Other priors can be mapped [into the uniform prior over the unit hypercube] using the inverse of the cumulative prior distribution.”

Hence another illustration of the addictive features of nested sampling! Each time I get back to this notion, a new understanding or reinterpretation comes to mind. In any case, an equally endless source of projects for Master students. (Not that I agree with the above quote, mind you!)
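
For readers who want the basic mechanism recalled above in executable form, here is a minimal R sketch of plain nested sampling. It implements neither MultiNest nor Buchner's proposal nor the nearest-neighbour idea: the new particle is obtained by brute-force rejection sampling from the prior. The uniform prior on (0,1), the Gaussian likelihood, the function name `nested_sampling` and the settings N=100 and 500 iterations are all assumptions made for the example.

```r
# Plain nested sampling for a U(0,1) prior; returns an estimate of the evidence Z.
# loglik must be a vectorised log-likelihood function.
nested_sampling <- function(loglik, N = 100, niter = 500) {
  theta <- runif(N)                 # live particles drawn from the prior
  ll <- loglik(theta)
  Z <- 0
  X_prev <- 1                       # prior mass of the current upper-level set
  for (i in seq_len(niter)) {
    worst <- which.min(ll)          # particle defining the new likelihood level
    X_cur <- exp(-i / N)            # deterministic approximation of the shrinkage
    Z <- Z + exp(ll[worst]) * (X_prev - X_cur)
    repeat {                        # rejection step: simulate above the current level
      prop <- runif(1)
      lp <- loglik(prop)
      if (lp > ll[worst]) break
    }
    theta[worst] <- prop
    ll[worst] <- lp
    X_prev <- X_cur
  }
  Z + mean(exp(ll)) * X_prev        # contribution of the remaining live particles
}

set.seed(42)
Z_hat <- nested_sampling(function(t) dnorm(t, mean = 0.5, sd = 0.1, log = TRUE))
```

In this toy case the true evidence is essentially 1, so `Z_hat` should land in its vicinity. The rejection step is where the cost explodes as the level set shrinks, which is precisely what proposals built on the (N-1) remaining particles try to avoid.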

Filed under: Books, pictures, Statistics, Travel, University life Tagged: hypercube, Kolmogorov-Smirnov distance, Multinest, nested sampling, uniform distribution, University of Warwick
Categories: Bayesian Bloggers

reading classics (#1,2)

Xian's Og - Wed, 2014-12-03 18:14

Today was the second session of our Reading Classics Seminar for the academic year 2014-2015. I have not reported on this seminar so far because it has had starting problems, namely hardly any student present at the first classes and therefore several re-starts until we reached a small group of interested students. Actually, this is the final year for my TSI Master at Paris-Dauphine, as it will become integrated within the new MASH Master next year. The latter started this year and drew away half of our potential applicants, presumably because of the wider spectrum between machine-learning, optimisation, programming and a tiny bit of statistics… If we manage to salvage [within the new Master] our speciality of offering the only Bayesian Statistics training in France, this will not be a complete disaster!

Anyway, the first seminar was about the great 1939 Biometrika paper by Pitman about the best invariant estimator appearing magically as a Bayes estimator! Alas, the student did not grasp the invariance part and hence focussed on less relevant technical parts, which was not a great experience (and therefore led me to abstain from posting the slides here). The second paper was not on my list but was proposed by another student as of yesterday when he realised he was to present today! This paper, entitled “The Counter-intuitive Non-informative Prior for the Bernoulli Family”, was published in the Journal of Statistics Education in 2004 by Zhu and Lu. I had not heard of the paper (or of the journal) previously and I do not think it is worth advertising any further as it gives a very poor entry to non-informative priors in the simplest of settings, namely for Bernoulli B(p) observations. Indeed, the stance of the paper is to define a non-informative prior as one returning the MLE of p as its posterior expectation (missing altogether the facts that such a definition is not parameterisation-invariant and that, given the modal nature of the MLE, a posterior mode would be much more appropriate, leading to the uniform prior on p as a solution) and to conclude that the corresponding prior is made of two Dirac masses at 0 and 1! Which again misses several key points like properly defining convergence in a space of probability distributions and using an improper prior differently from a proper prior. Esp. since in the next section, the authors switch to Haldane’s prior being the Be(0,0) distribution..! A prior that cannot be used since the posterior is not defined when all the observations are identical. Certainly not a paper to make it to the list! (My student simply pasted pages from this paper as his slides and so I see again no point in reposting them here.)
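
The arithmetic behind these remarks fits in a few lines of R. This is only a sketch with hypothetical function names: with s successes out of n Bernoulli trials and a Be(a,b) prior, the posterior is Be(a+s, b+n-s), so its mean tends to the MLE s/n as a and b go to zero (Haldane's limit), its mode equals s/n under the uniform Be(1,1) prior, and the Be(0,0) posterior is improper whenever s=0 or s=n.

```r
# Posterior mean and mode of p under a Be(a, b) prior, given s successes in n trials
post_mean <- function(s, n, a, b) (a + s) / (a + b + n)
post_mode <- function(s, n, a, b) (a + s - 1) / (a + b + n - 2)  # needs a+s > 1 and b+n-s > 1

# Haldane's Be(0,0) prior: the posterior Be(s, n-s) is proper only if 0 < s < n
haldane_proper <- function(s, n) s > 0 && s < n

s <- 3; n <- 10
mle <- s / n
near_haldane_mean <- post_mean(s, n, a = 1e-8, b = 1e-8)  # approaches the MLE
uniform_mode <- post_mode(s, n, a = 1, b = 1)             # equal to the MLE
all_identical <- haldane_proper(10, 10)                   # FALSE: improper posterior
```

Both routes recover s/n, but only the mode-based one does so with a proper prior, which is the point made in the parenthesis above.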

Filed under: Books, Kids, Statistics, University life Tagged: Bernoulli distribution, classics, invariance, non-informative priors, Pitman, seminar, Université Paris Dauphine
Categories: Bayesian Bloggers

the Grumble distribution and an ODE

Xian's Og - Tue, 2014-12-02 18:14

As ‘Og’s readers may have noticed, I paid some recent visits to Cross Validated (although I find this too addictive to be sustainable on a long term basis!, and as already reported a few years ago frustrating at several levels from questions asked without any preliminary personal effort, to a lack of background material to understand hints towards the answer, to not even considering answers [once the homework due date was past?], &tc.). Anyway, some questions are nonetheless great puzzles, to wit this one about the possible transformation of a random variable R with density

into a Gumbel distribution. While the better answer is that it translates into a power law,


I thought using the S=R² transform could work but obtained a wrong sign in the pseudo-Gumbel density

and then went into seeking another transform into a Gumbel rv T, which amounted to solve the differential equation

As I could not solve analytically the ODE, I programmed a simple Runge-Kutta numerical resolution as follows:

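    solvR=function(prec=10^3,maxz=1){
      # forward integration on [1,maxz], starting from t(1)=1
      z=seq(1,maxz,le=prec)
      t=rep(1,prec) #t(1)=1
      for (i in 2:prec)  # explicit one-step (Euler-type) update
        t[i]=t[i-1]+(z[i]-z[i-1])*exp(-z[i-1]+
          exp(-z[i-1])+t[i-1]+exp(-t[i-1]))
      zold=z
      # backward integration on [.1/maxz,1], again from t(1)=1
      z=seq(.1/maxz,1,le=prec)
      t=c(t[-prec],t)
      for (i in (prec-1):1)
        t[i]=t[i+1]+(z[i]-z[i+1])*exp(-z[i+1]+
          exp(-z[i+1])+t[i+1]+exp(-t[i+1]))
      # two columns: the grid of z values and the corresponding t(z)
      return(cbind(c(z[-prec],zold),t))
    }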

Which shows that [the increasing] t(w) quickly gets too large for the function to be depicted. But this is a fairly useless result in that a transform of the original variable and of its parameter into an arbitrary distribution is always possible, given that W above has a fixed distribution… Hence the pun on Gumbel in the title.
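
The “always possible” part of this remark is nothing but the probability integral transform, as in the following minimal R sketch. The Exp(1) variable is merely a stand-in for a variable with a fixed, known continuous cdf, and the function names are hypothetical.

```r
gumbel_q <- function(u) -log(-log(u))    # quantile function of the standard Gumbel
gumbel_p <- function(t) exp(-exp(-t))    # cdf of the standard Gumbel

# Map a variable with known continuous cdf onto the Gumbel scale
to_gumbel <- function(w, cdf) gumbel_q(cdf(w))

set.seed(1)
w <- rexp(1000)                          # stand-in for a variable with a fixed distribution
t <- to_gumbel(w, pexp)                  # cdf(w) is uniform, hence t is Gumbel distributed
```

By construction `gumbel_p(t)` returns `pexp(w)`, which is uniform on (0,1), so no differential equation is needed once the cdf is available.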

Filed under: Books, Kids, R, Statistics, University life Tagged: cross validated, differential equation, forum, Gumble distribution, probability distribution, Runge-Kutta, StackExchange
Categories: Bayesian Bloggers

another instance of ABC?

Xian's Og - Mon, 2014-12-01 18:14

“These characteristics are (1) likelihood is not available; (2) prior information is available; (3) a portion of the prior information is expressed in terms of functionals of the model that cannot be converted into an analytic prior on model parameters; (4) the model can be simulated. Our approach depends on an assumption that (5) an adequate statistical model for the data are available.”

A 2009 JASA paper by Ron Gallant and Rob McCulloch, entitled “On the Determination of General Scientific Models With Application to Asset Pricing”, may or may not have a connection with ABC, to wit the above quote, but I have trouble checking whether or not this is the case.

The true (scientific) model parametrised by θ is replaced with a (statistical) substitute that is available in closed form. And parametrised by g(θ). [If you can get access to the paper, I’d welcome opinions about Assumption 1 therein which states that the intractable density is equal to a closed-form density.] And the latter is over-parametrised when compared with the scientific model. As in, e.g., a N(θ,θ²) scientific model versus a N(μ,σ²) statistical model. In addition, the prior information is only available on θ. However, this does not seem to matter that much since (a) the Bayesian analysis is operated on θ only and (b) the Metropolis approach adopted by the authors involves simulating a massive number of pseudo-observations, given the current value of the parameter θ and the scientific model, so that the transform g(θ) can be estimated by maximum likelihood over the statistical model. The paper suggests using a secondary Markov chain algorithm to find this MLE. Which is claimed to be a simulated annealing resolution (p.121) although I do not see the temperature decreasing. The pseudo-model is then used in a primary MCMC step.

Hence, not truly an ABC algorithm. In the same setting, ABC would use a simulated dataset the same size as the observed dataset, compute the MLEs for both and compare them. Faster if less accurate when Assumption 1 [that the statistical model holds for a restricted parametrisation] does not stand.
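
To fix ideas on what the ABC alternative would look like, here is a minimal R sketch of an ABC rejection sampler on the N(θ,θ²) versus N(μ,σ²) example mentioned above, using the MLE of (μ,σ) as summary statistic. This is not the algorithm of the paper: the uniform prior on (0,5), the Euclidean distance, the 1% acceptance quantile and the function name `abc_mle` are all assumptions made for the example.

```r
# ABC rejection sampler: simulate pseudo-data of the same size as the observed data
# from the scientific model N(theta, theta^2) and compare MLEs of the statistical
# model N(mu, sigma^2) between pseudo-data and observed data.
abc_mle <- function(y, nsim = 1e4, keep = 0.01,
                    rprior = function(m) runif(m, 0, 5)) {
  n <- length(y)
  mle <- function(x) c(mean(x), sqrt(mean((x - mean(x))^2)))  # MLE of (mu, sigma)
  s_obs <- mle(y)
  theta <- rprior(nsim)
  dist <- vapply(theta, function(th) {
    x <- rnorm(n, mean = th, sd = th)     # pseudo-data from the scientific model
    sqrt(sum((mle(x) - s_obs)^2))         # distance between the two MLEs
  }, numeric(1))
  theta[dist <= quantile(dist, keep)]     # keep the closest simulations
}

set.seed(1)
y <- rnorm(50, mean = 2, sd = 2)          # observed data, with true theta = 2
post <- abc_mle(y)
```

The retained values in `post` form an approximate posterior sample for θ, at the cost of one MLE computation per simulated dataset rather than one secondary Markov chain per Metropolis step.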

Another interesting aspect of the paper is about creating and using a prior distribution around the manifold η=g(θ). This clearly relates to my earlier query about simulating on measure zero sets. The paper does not bring a definitive answer, as it never simulates exactly on the manifold, but this constitutes another entry on this challenging problem…

Filed under: Statistics Tagged: ABC, ABC-MCMC, Kullback-Leibler divergence, measure theory, Metropolis-Hastings algorithms, pseudo-likelihood
Categories: Bayesian Bloggers

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [discussion]

Xian's Og - Sun, 2014-11-30 18:14

[Following my earlier reflections on Ron Gallant’s paper, here is a more condensed set of questions towards my discussion of next Friday.]

“If one specifies a set of moment functions collected together into a vector m(x,θ) of dimension M, regards θ as random and asserts that some transformation Z(x,θ) has distribution ψ then what is required to use this information and then possibly a prior to make valid inference?” (p.4)

The central question in the paper is whether or not given a set of moment equations

(where both the X_i’s and θ are random), one can derive a likelihood function and a prior distribution compatible with those. It sounds to me like a highly complex question since it implies the integral equation

must have a solution for all n’s. A related question that was also remanent with fiducial distributions is how on Earth (or Middle Earth) the concept of a random theta could arise outside Bayesian analysis. And another one is how could the equations make sense outside the existence of the pair (prior,likelihood). A question that may exhibit my ignorance of structural models. But which may also relate to the inconsistency of Zellner’s (1996) Bayesian method of moments as exposed by Geisser and Seidenfeld (1999).

For instance, the paper starts (why?) with the Fisherian example of the t distribution of

$$\sqrt{n}\,(\bar{x}_n-\theta)/s_n$$

which truly is a t variable when θ is fixed at the true mean value. Now, if we assume that the joint distribution of the X_i’s and θ is such that this projection is a t variable, is there any other case than the Dirac mass on θ? For all (large enough) sample sizes n? I cannot tell and the paper does not bring [me] an answer either.

When I look at the analysis made in the abstraction part of the paper, I am puzzled by the starting point (17), where

since the lhs and rhs operate on different spaces. In Fisher’s example, x is an n-dimensional vector, while Z is unidimensional. If I apply blindly the formula on this example, the t density does not integrate against the Lebesgue measure in the n-dimensional Euclidean space… If a change of measure allows for this representation, I do not see so much appeal in using this new measure and anyway wonder in which sense this defines a likelihood function, i.e. the product of n densities of the X_i’s conditional on θ. To me this is the central issue, which remains unsolved by the paper.

Filed under: Books, Statistics, University life Tagged: Arnold Zellner, empirical likelihood, fiducial distribution, measure theory, method of moments, R.A. Fisher, structural model
Categories: Bayesian Bloggers

Moon over Soho [book review]

Xian's Og - Fri, 2014-11-28 18:14

A book from the pile I brought back from Gainesville. And the first I read, mostly during the trip back to Paris. Both because I was eager to see the sequel to Rivers of London and because it was short and easy to carry in a pocket.

“From the figures I have, I believe that two to three jazz musicians have died within twenty-four hours of playing a gig in the Greater London area in the last year.”
“I take it that’s statistically significant?”

Moon over Soho is the second installment in the Peter Grant series by Ben Aaronovitch. It would not read well on its own as it takes over where Rivers of London stopped. Even though it reintroduces most of the rules of this magical universe. Most characters are back (except for the hostaged Beverly) and they are trying to cope with what happened in the first installment. The story is even more centred on jazz than in the first volume, with as a corollary, Peter Grant’s parents taking a more important part in the book. The recovering Lesley is hardly seen (for obvious reasons) and heard, which leaves a convenient hole in Grant’s sentimental life! The book also introduces a major magical villain who will undoubtedly figure in the incoming books. Another great story, even though the central plot has a highly predictable ending, and even more end of the ending, and some parts sound like repetitions of similar parts in the first volume. But the tone, the pace, the style, the humour, the luv’ of Lundun, all are there and so it is all that matters! (I again bemoan the missing map of London!)

Filed under: Books, Kids, Travel Tagged: Ben Aaronnovitch, book review, England, jazz, London, Moon over Soho, Peter Grant series, WW II
Categories: Bayesian Bloggers

Le Monde puzzle [#887quater]

Xian's Og - Thu, 2014-11-27 18:14

And yet another resolution of this combinatorics Le Monde mathematical puzzle: that puzzle puzzled many more people than usual! This solution is by Marco F, using a travelling salesman representation and existing TSP software.

N is a golden number if the sequence {1,2,…,N} can be reordered so that the sum of any consecutive pair is a perfect square. What are the golden numbers between 1 and 25?

For instance, take n=199, you should first calculate the “friends”. Save them on a symmetric square matrix:

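    # "friends" of i: all j != i such that i + j is a perfect square (assumed definition)
    friends <- lapply(1:199, function(i) setdiff(which(sqrt(i + 1:199) %% 1 == 0), i))
    m1 <- matrix(Inf, nrow=199, ncol=199)
    diag(m1) <- 0
    for (i in 1:199) m1[i, friends[[i]]] <- 1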

Export the distance matrix to a file (in TSPlib format):

    library(TSP)
    tsp <- TSP(m1)
    tsp
    image(tsp)
    write_TSPLIB(tsp, "f199.TSPLIB")

And use a solver to obtain the results. The best solver for TSP is Concorde. There are online versions where you can submit jobs:

    0 2 1000000
    2 96 1000000
    96 191 1000000
    191 168 1000000
    ...

The numbers of the solution are in the second column (2, 96, 191, 168…). And they are 0-indexed, so you have to add 1 to them:

3 97 192 169 155 101 188 136 120 49 176 148 108 181 143 113 112 84 37 63 18 31 33 88 168 193 96 160 129 127 162 199 90 79 177 147 78 22 122 167 194 130 39 157 99 190 134 91 198 58 23 41 128 196 60 21 100 189 172 152 73 183 106 38 131 125 164 197 59 110 146 178 111 145 80 20 61 135 121 75 6 94 195 166 123 133 156 69 52 144 81 40 9 72 184 12 24 57 87 82 62 19 45 76 180 109 116 173 151 74 26 95 161 163 126 43 153 171 54 27 117 139 30 70 11 89 107 118 138 186 103 66 159 165 124 132 93 28 8 17 32 4 5 44 77 179 182 142 83 86 14 50 175 114 55 141 115 29 92 104 185 71 10 15 34 2 7 42 154 170 191 98 158 67 102 187 137 119 25 56 65 35 46 150 174 51 13 68 53 47 149 140 85 36 64 105 16 48

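
A solver output of this kind is easy to check, as in the short R sketch below. The function name `is_square_chain` is hypothetical, and the 15-term ordering is a standard solution for N=15 used here as a small test case, not part of Marco's output.

```r
# TRUE when every pair of consecutive entries sums to a perfect square
is_square_chain <- function(x) {
  s <- x[-1] + x[-length(x)]
  all(round(sqrt(s))^2 == s)
}

chain15 <- c(8, 1, 15, 10, 6, 3, 13, 12, 4, 5, 11, 14, 2, 7, 9)
ok15 <- setequal(chain15, 1:15) && is_square_chain(chain15)  # valid reordering of 1..15
ok199_start <- is_square_chain(c(1, 3, 97, 192, 169, 155))   # first terms of the n=199 tour
```

Pasting the full n=199 sequence (preceded by 1, the 0-indexed starting node) into `is_square_chain` checks the whole tour in the same way.
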
Filed under: Books, Kids, R, Statistics, University life Tagged: Le Monde, mathematical puzzle, travelling salesman Concorde
Categories: Bayesian Bloggers

Le Monde puzzle [#887ter]

Xian's Og - Wed, 2014-11-26 18:14

Here is a graph solution to the recent combinatorics Le Monde mathematical puzzle, proposed by John Shonder:

N is a golden number if the sequence {1,2,…,N} can be reordered so that the sum of any consecutive pair is a perfect square. What are the golden numbers between 1 and 25?

Consider an undirected graph GN with N vertices labelled 1 through N. Draw an edge between vertices i and j if and only if i + j is a perfect square. Then N is golden if GN contains a Hamiltonian path, that is, if there is a connected path that visits all of the vertices exactly once. I wrote a program (using Mathematica, though I’m sure there must be an R library with similar functionality) that builds up G sequentially and checks at each step whether the graph contains a Hamiltonian path. The program starts with G1, a single vertex and no edges. Then it adds vertex 2. G2 has no edges, so 2 isn’t golden.

Adding vertex 3, there is an edge between 1 and 3. But vertex 2 is unconnected, so we’re still not golden.

The results are identical to yours, but I imagine my program runs a bit faster. Mathematica contains a built-in function to test for the existence of a Hamiltonian path.

Some of the graphs are interesting. I include representations of G25 and G36. Note that G36 contains a Hamiltonian cycle, so you could arrange the integers 1 … 36 on a roulette wheel such that each consecutive pair adds to a perfect square.
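
For those without Mathematica, the same check can be sketched in base R with a naive backtracking search for a Hamiltonian path. This is not John's program: `is_golden` is a hypothetical name and the exhaustive search is only meant for small N such as the range 1 to 25 of the puzzle.

```r
# Does the graph on 1..N, with an edge i-j whenever i + j is a perfect square,
# contain a Hamiltonian path? Naive depth-first search with backtracking.
is_golden <- function(N) {
  if (N == 1) return(TRUE)                 # single vertex: trivial path
  nb <- lapply(1:N, function(i) {          # adjacency lists
    j <- setdiff(1:N, i)
    j[sqrt(i + j) %% 1 == 0]
  })
  extend <- function(path, used) {
    if (length(path) == N) return(TRUE)    # all vertices visited exactly once
    for (j in nb[[path[length(path)]]]) {
      if (!used[j]) {
        used[j] <- TRUE
        if (extend(c(path, j), used)) return(TRUE)
        used[j] <- FALSE                   # backtrack
      }
    }
    FALSE
  }
  for (s in 1:N) {                         # try every possible starting vertex
    used <- logical(N)
    used[s] <- TRUE
    if (extend(s, used)) return(TRUE)
  }
  FALSE
}

golden <- Filter(is_golden, 2:25)
```

On this range the search returns 15, 16, 17, 23 and 25, and it remains fast because each vertex has only a handful of neighbours.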

A somewhat similar problem:

Call N a “leaden” number if the sequence {1,2, …, N} can be reordered so that the sum of any consecutive pair is a prime number. What are the leaden numbers between 1 and 100? What about an arrangement such that the absolute value of the difference between any two consecutive numbers is prime?

[The determination of the leaden numbers was discussed in a previous Le Monde puzzle post.]

Filed under: Books, Kids, Statistics, University life Tagged: graph theory, Hamiltonian path, Le Monde, Mathematica, mathematical puzzle
Categories: Bayesian Bloggers

Methodological developments in evolutionary genomics [3 years postdoc in Montpellier]

Xian's Og - Wed, 2014-11-26 08:18

[Here is a call for a post-doctoral position in Montpellier, South of France, not Montpelier, Vermont!, in a population genetics group with whom I am working. Highly recommended if you are currently looking for a postdoc!]

Three-year post-doctoral position at the Institute of Computational Biology (IBC), Montpellier (France) :
Methodological developments in evolutionary genomics.

One young investigator position opens immediately at the Institute for Computational Biology (IBC) of Montpellier (France) to work on the development of innovative inference methods and software in population genomics or phylogenetics to analyze large-scale genomic data in the fields of health, agronomy and environment (Work Package 2 « evolutionary genomics » of the IBC). The candidate will develop their own research on some of the following topics : selective processes, demographic history, spatial genetic processes, very large phylogenies reconstruction, gene/species tree reconciliation, using maximum likelihood, Bayesian and simulation-based inference. We are seeking a candidate with a strong background in mathematical and computational evolutionary biology, with interest in applications and software development. The successful candidate will work on their own project, built in collaboration with any researcher involved in the WP2 project and working at the IBC labs (AGAP, CBGP, ISEM, I3M, LIRMM, MIVEGEC).

IBC hires young investigators, typically with a PhD plus some post-doc experience, a high level of publishing, strong communication abilities, and a taste for multidisciplinary research. Working full-time at IBC, these young researchers will play a key role in Institute life. Most of their time will be devoted to scientific projects. In addition, they are expected to actively participate in the coordination of workpackages, in the hosting of foreign researchers and in the organization of seminars and events (summer schools, conferences…). In exchange, these young researchers will benefit from an exceptional environment thanks to the presence of numerous leading international researchers, not to mention significant autonomy for their work. Montpellier hosts one of the most vibrant communities of biodiversity research in Europe with several research centers of excellence in the field. This position is open for up to 3 years with a salary well above the French post-doc standards. Starting date is open to discussion.

 The application deadline is January 31, 2015.

Living in Montpellier:


Contacts at WP2 « Evolutionary Genetics » :


Jean-Michel Marin :

François Rousset :

Vincent Ranwez :

Olivier Gascuel :

Submit my application :

Filed under: pictures, Statistics, Travel, University life, Wines Tagged: academic position, Bayesian statistics, biodiversity, computational biology, France, Institut de Biologie Computationelle, Montpellier, phylogenetic models, position, postdoctoral position
Categories: Bayesian Bloggers

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [refleXions]

Xian's Og - Tue, 2014-11-25 18:14

“The main finding is that if the moment functions have one of the properties of a pivotal, then the assertion of a distribution on moment functions coupled with a proper prior does permit Bayesian inference. Without the semi-pivotal condition, the assertion of a distribution for moment functions either partially or completely specifies the prior.” (p.1)

Ron Gallant will present this paper at the Conference in honour of Christian Gouriéroux held next week at Dauphine and I have been asked to discuss it. What follows is a collection of notes I made while reading the paper, rather than a coherent discussion, to come later. Hopefully prior to the conference.

The difficulty I have with the approach presented therein lies as much with the presentation as with the contents. I find it difficult to grasp the assumptions behind the model(s) and the motivations for only considering a moment and its distribution. Does it all come down to linking fiducial distributions with Bayesian approaches? In which case I am as usual sceptical about the ability to impose an arbitrary distribution on an arbitrary transform of the pair (x,θ), where x denotes the data. Rather than a genuine prior × likelihood construct. But I bet this is mostly linked with my lack of understanding of the notion of structural models.

“We are concerned with situations where the structural model does not imply exogeneity of θ, or one prefers not to rely on an assumption of exogeneity, or one cannot construct a likelihood at all due to the complexity of the model, or one does not trust the numerical approximations needed to construct a likelihood.” (p.4)

As often with econometrics papers, this notion of structural model sets me astray: does this mean any latent variable model or an incompletely defined model, and if so why is it incompletely defined? From a frequentist perspective anything random is not a parameter. The term exogeneity also hints at this notion of the parameter being not truly a parameter, but including latent variables and maybe random effects. Reading further (p.7) drives me to understand the structural model as defined by a moment condition, in the sense that

has a unique solution in θ under the true model. However the focus then seems to make a major switch as Gallant considers the distribution of a pivotal quantity like

as induced by the joint distribution on (x,θ), hence conversely inducing constraints on this joint, as well as an associated conditional. Which is something I have trouble understanding. First, where does this assumed distribution on Z stem from? And, second, exchanging randomness of terms in a random variable as if it were a linear equation is a pretty sure way to produce paradoxes and measure-theoretic difficulties.
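In generic method-of-moments notation, the moment condition and a pivotal-type quantity built from it would read as below. This is only a sketch: the weighting matrix W and the √n scaling are assumptions on the notation, not quotations from the paper.

```latex
% moment condition with a unique solution in \theta under the true model
\mathbb{E}\left[ m(x,\theta) \right] = 0
% pivotal-type quantity; W(x,\theta) is an assumed scaling (weighting) matrix
Z = \sqrt{n}\, W(x,\theta)^{-1/2}\, m(x,\theta)
```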

The purely mathematical problem itself is puzzling: if one knows the distribution of the transform Z=Z(X,Λ), what does that imply on the joint distribution of (X,Λ)? It seems unlikely this will induce a single prior and/or a single likelihood… It is actually more probable that the distribution one arbitrarily selects on m(x,θ) is incompatible with a joint on (x,θ), isn’t it?

“The usual computational method is MCMC (Markov chain Monte Carlo) for which the best known reference in econometrics is Chernozhukov and Hong (2003).” (p.6)

While I never heard of this reference before, it looks like a 50 page survey and may be sufficient for an introduction to MCMC methods for econometricians. What I do not get though is the connection between this reference to MCMC and the overall discussion of constructing priors (or not) out of fiducial distributions. The author also suggests using MCMC to produce the MAP estimate but this has always struck me as inefficient (unless one uses our SAME algorithm of course).

“One can also compute the marginal likelihood from the chain (Newton and Raftery (1994)), which is used for Bayesian model comparison.” (p.22)

Not the best solution to rely on harmonic means for marginal likelihoods…. Definitely not. While the author actually uses the stabilised version (15) of Newton and Raftery (1994) estimator, which in retrospect looks much like a bridge sampling estimator of sorts, it remains dangerously close to the original [harmonic mean solution] especially for a vague prior. And it only works when the likelihood is available in closed form.
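To make the worry concrete, here is a minimal sketch of the original harmonic mean estimator. It rests on assumptions of my own choosing: a toy conjugate normal model (not Gallant's), exact posterior draws instead of an MCMC chain, a vague N(0,10²) prior, and the original estimator rather than the stabilised version (15). The names `log_marg` and `harmonic` are made up for the illustration.

```r
set.seed(1)
n <- 20
tau <- 10                       # prior sd on theta: a vague prior
x <- rnorm(n)                   # data from N(theta, 1) with theta = 0

# exact log marginal likelihood when x_i ~ N(theta, 1), theta ~ N(0, tau^2)
log_marg <- function(x, tau) {
  n <- length(x)
  -0.5 * n * log(2 * pi) - 0.5 * log(1 + n * tau^2) -
    0.5 * (sum(x^2) - tau^2 * sum(x)^2 / (1 + n * tau^2))
}

# log-likelihood of the sample, vectorised in theta
loglik <- function(theta)
  -0.5 * n * log(2 * pi) - 0.5 * (sum(x^2) - 2 * theta * sum(x) + n * theta^2)

# harmonic mean estimate of the log marginal, from exact posterior draws
harmonic <- function(nsim) {
  v <- tau^2 / (1 + n * tau^2)            # posterior variance of theta
  theta <- rnorm(nsim, mean = v * sum(x), sd = sqrt(v))
  nl <- -loglik(theta)
  # log of the average of 1/likelihood, computed stably, then negated
  -(max(nl) + log(mean(exp(nl - max(nl)))))
}

print(log_marg(x, tau))
print(replicate(5, harmonic(1e4)))
```

Comparing the replicated estimates with the exact value, and rerunning with other values of `tau`, gives a feel for how far the estimator can sit from the truth under a vague prior.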

“The MCMC chains were comprised of 100,000 draws well past the point where transients died off.” (p.22)

I wonder if the second statement (with a very nice image of those dying transients!) is intended as a consequence of the first one or independently.

“A common situation that requires consideration of the notions that follow is that deriving the likelihood from a structural model is analytically intractable and one cannot verify that the numerical approximations one would have to make to circumvent the intractability are sufficiently accurate.” (p.7)

This then is a completely different business, namely that defining a joint distribution by means of moment equations prevents regular Bayesian inference because the likelihood is not available. This is more exciting because (i) there are alternatives available! From ABC to INLA (maybe) to EP to variational Bayes (maybe). And beyond. In particular, the moment equations are strongly and even insistently suggesting that empirical likelihood techniques could be well-suited to this setting. And (ii) it is no longer a mathematical worry: there exists a joint distribution on m(x,θ), induced by a (or many) joint distribution on (x,θ). So the question of finding whether or not it induces a single proper prior on θ becomes relevant. But, if I want to use ABC, being given the distribution of m(x,θ) seems to mean I can only generate new values of this transform while missing a natural distance between observations and pseudo-observations. Still, I entertain lingering doubts that this is the meaning of the study. Where does the joint distribution come from..?!

“Typically C is coarse in the sense that it does not contain all the Borel sets (…)  The probability space cannot be used for Bayesian inference”

My understanding of that part is that defining a joint on m(x,θ) is not always enough to deduce a (unique) posterior on θ, which is fine and correct, but rather anticlimactic. This seems to be what Gallant calls a “partial specification of the prior” (p.9).

Overall, after this linear read, I remain very much puzzled by the statistical (or Bayesian) implications of the paper. The fact that the moment conditions are central to the approach would once again induce me to check the properties of an alternative approach like empirical likelihood.

Filed under: Statistics, University life Tagged: ABC, compatible conditional distributions, empirical likelihood, expectation-propagation, harmonic mean estimator, INLA, latent variable, MCMC, prior distributions, structural model, variational Bayes methods
Categories: Bayesian Bloggers

prayers and chi-square

Xian's Og - Mon, 2014-11-24 18:14

One study I spotted in Richard Dawkins’ The God Delusion this summer by the lake is a study of the (im)possible impact of prayer on patients’ recovery. As a coincidence, my daughter got this problem in her statistics class last week (my translation):

1802 patients in 6 US hospitals have been divided into three groups. Members of group A were told that unspecified religious communities would pray for them by name, while patients in groups B and C did not know if anyone prayed for them. Those in group B had communities praying for them while those in group C did not. After 14 days of prayer, the conditions of the patients were as follows:

  • out of 604 patients in group A, the condition of 249 had significantly worsened;
  • out of 601 patients in group B, the condition of 289 had significantly worsened;
  • out of 597 patients in group C, the condition of 293 had significantly worsened.

 Use a chi-square procedure to test the homogeneity between the three groups, a significant impact of prayers, and a placebo effect of prayer.

This may sound a wee bit weird for a school test, but she is in medical school after all so it is a good way to enforce rational thinking while learning about the chi-square test! (Answers: [even though the data is too sparse to clearly support a decision, esp. when using the chi-square test!] homogeneity and placebo effect are acceptable assumptions at level 5%, while the prayer effect is not [if barely].)
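For readers who want to run the three tests themselves, a minimal sketch using base R's `chisq.test` follows. The pairings are my reading of the exercise rather than part of its statement: the prayer effect compares B with C (both unaware, only B prayed for) and the placebo effect compares A with B (both prayed for, only A aware). The continuity correction is switched off in the 2×2 tables, which is also a choice of mine.

```r
# counts from the exercise: patients whose condition worsened, by group
worsened <- c(A = 249, B = 289, C = 293)
total    <- c(A = 604, B = 601, C = 597)

# 3 x 2 contingency table (rows: groups; columns: worsened / not worsened)
tab <- cbind(worsened = worsened, other = total - worsened)

# homogeneity of the three groups (2 degrees of freedom)
hom <- chisq.test(tab)

# prayer effect: B (prayed for, unaware) versus C (not prayed for, unaware)
prayer <- chisq.test(tab[c("B", "C"), ], correct = FALSE)

# placebo effect: A (prayed for, aware) versus B (prayed for, unaware)
placebo <- chisq.test(tab[c("A", "B"), ], correct = FALSE)

print(c(homogeneity = hom$p.value,
        prayer      = prayer$p.value,
        placebo     = placebo$p.value))
```

The printed p-values can then be set against the 5% level used in the answers above.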

Filed under: Books, Kids, Statistics, University life Tagged: binomial distribution, chi-square test, exercises, medical school, prayer, Richard Dawkins, The God Delusion
Categories: Bayesian Bloggers

an ABC experiment

Xian's Og - Sun, 2014-11-23 18:14


In a Cross Validated forum exchange, I used the code below to illustrate the working of an ABC algorithm:

#normal data with 100 observations
n=100
x=rnorm(n)
#observed summaries
sumx=c(median(x),mad(x))
#normal x gamma prior
priori=function(N){
  return(cbind(rnorm(N,sd=10),
    1/sqrt(rgamma(N,shape=2,scale=5))))
}
ABC=function(N,alpha=.05){
  prior=priori(N) #reference table
  #pseudo-data
  summ=matrix(0,N,2)
  for (i in 1:N){
    xi=rnorm(n)*prior[i,2]+prior[i,1]
    summ[i,]=c(median(xi),mad(xi)) #summaries
  }
  #normalisation factor for the distance
  mads=c(mad(summ[,1]),mad(summ[,2]))
  #distance
  dist=(abs(sumx[1]-summ[,1])/mads[1])+
    (abs(sumx[2]-summ[,2])/mads[2])
  #selection
  posterior=prior[dist<quantile(dist,alpha),]}

Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than when using the mean and the variance as summary statistics. This is not completely surprising in that the latter are sufficient, while the former are not. Still, the (-10,10) range on the mean is way larger… The second reason for surprise is that the true posterior distribution cannot be derived since the joint density of med and mad is unavailable.
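To compare the two choices of summaries on the same footing, the rejection step can be rewritten with the summary function as an argument. This is a sketch rather than the code used for the post: `abc_reject` is a made-up name, the prior and the scaled L1 distance mimic the code above, and the number of simulations is kept small so that it runs quickly.

```r
# ABC rejection sampler with a user-supplied pair of summary statistics
abc_reject <- function(x, summary_fun, N = 1e4, alpha = 0.05) {
  n <- length(x)
  obs <- summary_fun(x)                       # observed summaries
  mu <- rnorm(N, sd = 10)                     # prior draws for the mean
  sigma <- 1 / sqrt(rgamma(N, shape = 2, scale = 5))  # prior draws for the sd
  # one pseudo-sample per prior draw, reduced to its two summaries
  summ <- t(sapply(seq_len(N),
                   function(i) summary_fun(rnorm(n, mu[i], sigma[i]))))
  scl <- apply(summ, 2, mad)                  # normalisation of each summary
  dist <- abs(summ[, 1] - obs[1]) / scl[1] + abs(summ[, 2] - obs[2]) / scl[2]
  keep <- dist < quantile(dist, alpha)        # keep the closest alpha fraction
  cbind(mu = mu[keep], sigma = sigma[keep])
}

set.seed(3)
x <- rnorm(100)
robust <- abc_reject(x, function(y) c(median(y), mad(y)))
suff   <- abc_reject(x, function(y) c(mean(y), sd(y)))

# 5% and 95% quantiles of the accepted values of mu under both summaries
print(rbind(robust = quantile(robust[, "mu"], c(.05, .95)),
            suff   = quantile(suff[, "mu"], c(.05, .95))))
```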

After thinking about this for a while, I went back to my workbench to check the difference with using mean and variance. To my greater surprise, I found hardly any difference! Using the almost exact ABC with 10⁶ simulations and a 5% subsampling rate returns exactly the same outcome. (The first row above is for the sufficient statistics (mean,standard deviation) while the second row is for the (median,mad) pair.) Playing with the distance does not help. The genuine posterior output is quite different, as exposed on the last row of the above, using a basic Gibbs sampler since the posterior is not truly conjugate.
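For reference, a basic Gibbs sampler for this model can be sketched as follows, assuming the same prior as in the ABC code, namely μ ~ N(0,10²) and σ⁻² ~ Gamma(shape 2, scale 5). The function name `gibbs` and the starting value are illustrative choices, not a copy of what produced the output discussed here.

```r
# Gibbs sampler for x_i ~ N(mu, sigma^2), with mu ~ N(0, mu0sd^2)
# and precision 1/sigma^2 ~ Gamma(shape, scale)
gibbs <- function(x, niter = 1e4, mu0sd = 10, shape = 2, scale = 5) {
  n <- length(x)
  mu <- numeric(niter)
  sigma <- numeric(niter)
  tau <- 1 / var(x)                     # starting value for the precision
  for (t in 1:niter) {
    # mu | tau, x is normal with precision n * tau + 1 / mu0sd^2
    prec <- n * tau + 1 / mu0sd^2
    mu[t] <- rnorm(1, mean = tau * sum(x) / prec, sd = 1 / sqrt(prec))
    # tau | mu, x is gamma with updated shape and rate
    tau <- rgamma(1, shape = shape + n / 2,
                  rate = 1 / scale + sum((x - mu[t])^2) / 2)
    sigma[t] <- 1 / sqrt(tau)
  }
  cbind(mu = mu, sigma = sigma)
}

set.seed(42)
x <- rnorm(100)
post <- gibbs(x)
print(apply(post, 2, quantile, probs = c(.05, .5, .95)))
```

Both full conditionals are available in closed form even though the joint posterior is not conjugate, which is why two-step Gibbs sampling is enough here.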

Filed under: Books, pictures, R, Statistics, University life Tagged: ABC, Gibbs sampling, MCMC, mean, median, median absolute deviation, Monte Carlo Statistical Methods, normal model, summary statistics
Categories: Bayesian Bloggers

Challis Lectures

Xian's Og - Sat, 2014-11-22 18:14


I had a great time during this short visit to the Department of Statistics, University of Florida, Gainesville. First, it was a major honour to be the 2014 recipient of the George H. Challis Award and I considerably enjoyed delivering my lectures on mixtures and on ABC with random forests, and chatting with members of the audience about the contents afterwards. Here is the physical award I brought back to my office:

More as a piece of trivia, here is the amount of information about the George H. Challis Award I found on the UF website:

This fund was established in 2000 by Jack M. and Linda Challis Gill and the Gill Foundation of Texas, in memory of Linda’s father, to support faculty and student conference travel awards and the George Challis Biostatistics Lecture Series. George H. Challis was born on December 8, 1911 and was raised in Italy and Indiana. He was the first cousin of Indiana composer Cole Porter. George earned a degree in 1933 from the School of Business at Indiana University in Bloomington. George passed away on May 6, 2000. His wife, Madeline, passed away on December 14, 2009.

Cole Porter, indeed!

On top of this lecturing activity, I had a full academic agenda, discussing with most faculty members and PhD students of the Department, on our respective research themes over the two days I was there and it felt like there was not enough time! And then, during the few remaining hours where I did not try to stay on French time (!), I had a great time with my friends Jim and Maria in Gainesville, tasting a fantastic local IPA beer from Cigar City Brewery and several great (non-local) red wines… Adding to that a pile of new books, a smooth trip both ways, and a chance encounter with Alicia in Atlanta airport, it was a brilliant extended weekend!

Filed under: Books, pictures, Statistics, Travel, University life, Wines Tagged: ABC, Cigar City Brewery, Cole Porter, finite mixtures, Florida, Gainesville, George H. Challis Award, random forests
Categories: Bayesian Bloggers