When playing with Peter Rossi’s bayesm R package during a visit of Jean-Michel Marin to Paris, last week, we came up with the above Gibbs outcome. The setting is a Gaussian mixture model with three components in dimension 5 and the prior distributions are standard conjugate. In this case, with 500 observations and 5000 Gibbs iterations, the Markov chain (for one component of one mean of the mixture) has two highly distinct regimes: one that revolves around the true value of the parameter, 2.5, and one that explores a much broader area (which is associated with a much smaller value of the component weight). What we found amazing is the Gibbs ability to entertain both regimes, simultaneously.
Filed under: Books, pictures, R, Statistics, University life Tagged: bayesm, convergence assessment, Gibbs sampler, Jean-Michel Marin, Markov chain Monte Carlo, mixtures, R
Filed under: Mountains, pictures, Travel Tagged: Bohemia, Czech Republic, Giant Mountains, Krkonošská magistrála, sunset, vacations, Špindlerův Mlýn
When preparing my OxWaSP projects a few weeks ago, I came perchance on a set of slides, entitled “Hierarchical models are not Bayesian“, written by Brian Dennis (University of Idaho), where the author argues against Bayesian inference in hierarchical models in ecology, much in relation with the previously discussed paper of Subhash Lele. The argument is the same, namely a possibly major impact of the prior modelling on the resulting inference, in particular when some parameters are hardly identifiable, the more when the model is complex and when there are many parameters. And that “data cloning” being available since 2007, frequentist methods have “caught up” with Bayesian computational abilities.
Let me remind the reader that “data cloning” means constructing a sequence of Bayes estimators corresponding to the data being duplicated (or cloned) once, twice, &tc., until the point estimator stabilises. Since this corresponds to using increasing powers of the likelihood, the posteriors concentrate more and more around the maximum likelihood estimator. And even recover the Hessian matrix. This technique is actually older than 2007 since I proposed it in the early 1990’s under the name of prior feedback, with earlier occurrences in the literature like D’Epifanio (1989) and even the discussion of Aitkin (1991). A more efficient version of this approach is the SAME algorithm we developed in 2002 with Arnaud Doucet and Simon Godsill where the power of the likelihood is increased during iterations in a simulated annealing version (with a preliminary version found in Duflo, 1996).
I completely agree with the author that a hierarchical model does not have to be Bayesian: when the random parameters in the model are analysed as sources of additional variations, as for instance in animal breeding or ecology, and integrated out, the resulting model can be analysed by any statistical method. Even though one may wonder at the motivations for selecting this particular randomness structure in the model. And at an increasing blurring between what is prior modelling and what is sampling modelling as the number of levels in the hierarchy goes up. This rather amusing set of slides somewhat misses a few points, in particular the ability of data cloning to overcome identifiability and multimodality issues. Indeed, as with all simulated annealing techniques, there is a practical difficulty in avoiding the fatal attraction of a local mode using MCMC techniques. There are thus high chances data cloning ends up in the “wrong” mode. Moreover, when the likelihood is multimodal, it is a general issue to decide which of the modes is most relevant for inference. In which sense is the MLE more objective than a Bayes estimate, then? Further, the impact of a prior on some aspects of the posterior distribution can be tested by re-running a Bayesian analysis with different priors, including empirical Bayes versions or, why not?!, data cloning, in order to understand where and why huge discrepancies occur. This is part of model building, in the end.
Filed under: Books, Kids, Statistics, University life Tagged: Bayes estimators, Bayesian foundations, data cloning, Idaho, maximum likelihood estimation, prior feedback, SAME algorithm, simulated annealing
Filed under: Mountains, Travel Tagged: Czech Republic, Giant Mountains, Krkonošská magistrála, ski resorts, sunset, vacations, Špindlerův Mlýn
Bayesian optimization for likelihood-free inference of simulator-based statistical models [guest post]
Here are some comments on the paper of Gutmann and Corander. My brief skim read through this concentrated on the second half of the paper, the applied methodology. So my comments should be quite complementary to Christian’s on the theoretical part!
ABC algorithms generally follow the template of proposing parameter values, simulating datasets and accepting/rejecting/weighting the results based on similarity to the observations. The output is a Monte Carlo sample from a target distribution, an approximation to the posterior. The most naive proposal distribution for the parameters is simply the prior, but this is inefficient if the prior is highly diffuse compared to the posterior. MCMC and SMC methods can be used to provide better proposal distributions. Nevertheless they often still seem quite inefficient, requiring repeated simulations in parts of parameter space which have already been well explored.
The strategy of this paper is to instead attempt to fit a non-parametric model to the target distribution (or in fact to a slight variation of it). Hopefully this will require many fewer simulations. This approach is quite similar to Richard Wilkinson’s recent paper. Richard fitted a Gaussian process to the ABC analogue of the log-likelihood. Gutmann and Corander introduce two main novelties:
- They model the expected discrepancy (i.e. distance) Δθ between the simulated and observed summary statistics. This is then transformed to estimate the likelihood. This is in contrast to Richard who transformed the discrepancy before modelling. This is the standard ABC approach of weighting the discrepancy depending on how close to 0 it is. The drawback of the latter approach is it requires picking a tuning parameter (the ABC acceptance threshold or bandwidth) in advance of the algorithm. The new approach still requires a tuning parameter but its choice can be delayed until the transformation is performed.
- They generate the θ values on-line using “Bayesian optimisation”. The idea is to pick θ to concentrate on the region near the minimum of the objective function, and also to reduce uncertainty in the Gaussian process. Thus well explored regions can usually be neglected. This is in contrast to Richard who chose θs using space filling design prior to performing any simulations.
I didn’t read the paper’s theory closely enough to decide whether (1) is a good idea. Certainly the results for the paper’s examples look convincing. Also, one issue with Richard‘s approach was that because the log-likelihood varied over such a wide variety of magnitudes, he needed to fit several “waves” of GPs. It would be nice to know if the approach of modelling the discrepancy has removed this problem, or if a single GP is still sometimes an insufficiently flexible model.
Novelty (2) is a very nice and natural approach to take here. I did wonder why the particular criterion in Equation (45) was used to decide on the next θ. Does this correspond to optimising some information theoretic quantity? Other practical questions were whether it’s possible to parallelise the method (I seem to remember talking to Michael Gutmann about this at NIPS but can’t remember his answer!), and how well the approach scales up with the dimension of the parameters.
Filed under: Books, Statistics, University life Tagged: ABC, arXiv, Dennis Prangle, dimension curse, Gaussian processes, guest post, NIPS, nonparametric probability density estimation
Filed under: Kids, pictures, Travel Tagged: Apostles, astronomical clock, Czech Republic, mechanical clock, old town, Prague
I read this book by Albert Camus over my week in Oxford, having found it on my daughter’s bookshelf (as she had presumably read it in high school…). It is a very special book in that (a) Camus was working on it when he died in a car accident, (b) the manuscript was found among the wreckage, and (c) it differs very much from Camus’ other books. Indeed, the book is partly autobiographical and written with an unsentimental realism that is raw and brutal. It describes the youth of Jacques, the son of French colons in Algiers, whose father had died in the first days of WW I and whose family lives in the uttermost poverty, with both his mother and grandmother doing menial jobs to simply survive. Thanks to a supportive teacher, he manages to get a grant to attend secondary school. What is most moving about the book is how Camus describes the numbing effects of poverty, namely how his relatives see their universe shrinking so much that notions like the Mother Country (France) or books loose meaning for them. Without moving them towards or against native Algerians, who never penetrate the inner circles in the novel, moving behind a sort of glass screen. It is not that the tensions and horrors of the colonisation and of the resistance to colonisation are hidden, quite the opposite, but the narrator considers those with a sort of fatalism without questioning the colonisation itself. (The book reminded me very much of my grand-father‘s childhood, with a father also among the dead soldiers of WW I, being raised by a single mother in harsh conditions. With the major difference that my grandfather decided to stop school very early to become a gardener…) There are also obvious parallels with Pagnol’s autobiographical novels like My Father’s Glory, written at about the same time, from the boy friendship to the major role of the instituteur, to the hunting party, to the funny uncle, but everything opposes the two authors, from Pagnol light truculence to Camus’ tragic depiction. Pagnol’s books are great teen books (and I still remember my mother buying the first one on a vacation road trip) but nothing more. Camus’ book could have been his greatest book, had he survived the car accident of January 1960.
Filed under: Books, Kids, pictures, Travel Tagged: Albert Camus, Algeria, Algerian colons, Algiers, book review, Carlos Sampayo, hunting, José Muñoz, Marcel Pagnol, The First Man, WW I
In the common room of the Department of Mathematics at the University of Warwick [same building as the Department of Statistics], there is a box for book exchanges and I usually take a look at each visit for a possible exchange. In October, I thus picked Jo Nesbø’s The Redbreast in exchange for maybe The Rogue Male. However, it stood on my office bookcase for another three months before I found time to read this early (2000) instalment in the Harry Hole series. With connections with the earliest Redeemer.
This is a fairly good if not perfect book, with a large opening into Norway’s WW II history and the volunteers who joined Nazi Germany to fight on the Eastern Front. And the collaborationist government of Vidkin Quissling. I found most interesting this entry into this period and the many parallels with French history at the same time. (To the point that quisling is now a synonym for collaborator, similar to pétainiste in French.) This historical background has some similarities with Camilla Lackberg‘s Hidden Child I read a while ago but on a larger and broader scale. Reminiscences and episodes from 1940-1944 take a large part of the book. And rightly so, as the story during WW II explains a lot of the current plot. While this may sound like an easy story-line, the plot also dwells a lot on skinheads and neo-Nazis in Olso. While Hole’s recurrent alcoholism irks me in the long run (more than Rebus‘ own alcohol problem, for some reason!), the construction of the character is quite well-done, along with a reasonable police force, even though both Hole’s inquest and the central crime of the story are stretching on and beyond belief, with too many coincidences. And a fatal shot by the police leads to very little noise and investigation, in a country where the murder rate is one of the lowest in the World and police officers do not carry guns. Except in Nesbø’s novels! Still, I did like the novel to the point of spending most of a Sunday afternoon on it, with the additional appeal of most of it taking place in Oslo. Definitely a page turner.
Filed under: Books, Travel, University life Tagged: book rev, Norway, Oslo, Pétain, pétainiste, Quissling, Rødstrupe, WW II
In what could have been the most expensive raclette ever, I almost get rid of my oven! Last weekend, to fight the ongoing cold wave, we decided to have a raclette with mountain cheese and potatoes, but the raclette machine (mostly a resistance to melt the cheese) had an electric issue and kept blowing the meter. We then decided to use the over to melt the cheese but, while giving all signs of working, it would not heat. Rather than a cold raclette, we managed with the microwave (!), but I though the oven had blown as well. The next morning, I still checked on the web for similar accidents and found the explanation: by pressing the proper combination of buttons, we had succeeded to switch the over into the demo mode, used by shops to run the oven with no heating. The insane part of this little [very little] story is that nowhere in the manual appeared any indication of an existing demo mode and of a way of getting back to normal! After pushing combinations of buttons at random, I eventually got the solution and the oven is again working, instead of standing in the recycling bin.
Filed under: Kids, Wines Tagged: cooking, electronics, kitchen, manual, oven, raclette
“En fait, il y a deux pays qui ont envahi la Russie, c’est la France et l’Allemagne…”
Hartig et al. published a while ago (2011) a paper in Ecology Letters entitled “Statistical inference for stochastic simulation models – theory and application”, which is mostly about ABC. (Florian Hartig pointed out the paper to me in a recent blog comment. about my discussion of the early parts of Guttman and Corander’s paper.) The paper is largely a tutorial and it reminds the reader about related methods like indirect inference and methods of moments. The authors also insist on presenting ABC as a particular case of likelihood approximation, whether non-parametric or parametric. Making connections with pseudo-likelihood and pseudo-marginal approaches. And including a discussion of the possible misfit of the assumed model, handled by an external error model. And also introducing the notion of informal likelihood (which could have been nicely linked with empirical likelihood). A last class of approximations presented therein is called rejection filters and reminds me very much of Ollie Ratman’s papers.
“Our general aim is to find sufficient statistics that are as close to minimal sufficiency as possible.” (p.819)
As in other ABC papers, and as often reported on this blog, I find the stress on sufficiency a wee bit too heavy as those models calling for approximation almost invariably do not allow for any form of useful sufficiency. Hence the mathematical statistics notion of sufficiency is mostly useless in such settings.
“A basic requirement is that the expectation value of the point-wise approximation of p(Sobs|φ) must be unbiased” (p.823)
As stated above the paper is mostly in tutorial mode, for instance explaining what MCMC and SMC methods are. As illustrated by the above figure. There is however a final and interesting discussion section on the impact of estimating the likelihood function at different values of the parameter. However, the authors seem to focus solely on pseudo-marginal results to validate this approximation, hence on unbiasedness, which does not work for most ABC approaches that I know. And for the approximations listed in the survey. Actually, it would be quite beneficial to devise a cheap tool to assess the bias or extra-variation due to the use of approximative techniques like ABC… A sort of 21st Century bootstrap?!
Filed under: Books, Statistics, University life Tagged: ABC, ABC validation, Bayesian optimisation, non-parametrics, sufficiency, synthetic likelihood
Subhash Lele recently arXived a short paper entitled “Is non-informative Bayesian analysis appropriate for wildlife management: survival of San Joaquin Kit fox and declines in amphibian populations”. (Lele has been mentioned several times on this blog in connection with his data-cloning approach that mostly clones our own SAME algorithm.)
“The most commonly used non-informative priors are either the uniform priors or the priors with very large variances spreading the probability mass almost uniformly over the entire parameter space.”
The main goal of the paper is to warn, even better “to disabuse the ecologists of the notion that there is no difference between non-informative Bayesian inference and likelihood-based inference and that the philosophical underpinnings of statistical inference are irrelevant to practice.” The argument advanced by Lele is simply that two different parametrisations should lead to two compatible priors and that, if they do not not, this exhibits an unacceptable impact of the prior modelling on the resulting inference, while likelihood-based inference [obviously] does not depend on parametrisation.
The first example in the paper is a dynamic linear model of a fox population series when using a uniform U(0,1) prior on a parameter b against a Ga(100,100) prior on -a/b. (The normal prior a is the same on both.) I do not find the opposition between the two posteriors in the least surprising as the modelling starts by assuming different supports on the parameter b. And both are highly “informative” in that there is no intrinsic constraint on b that could justify the (0,1) support, as illustrated by the second choice when b is unconstrained, varying on (-15,15) or (-0.0015,0.0015) depending on how the Ga(100,100) prior is parametrised.
and the paper opposes a uniform prior on p,q to a normal N(0,10^3) prior on the logit transforms of p and q. [With an obvious typo at the top of page 10.] As shown on the above graph, the two priors on p are immensely different, so should lead to different posteriors in a weakly informative setting as a Bernoulli experiment. Even with a few hundred individuals. A somewhat funny aspect of this study is that Lele opposes the uniform prior to the Jeffreys Be(.5,.5) prior as being “nowhere close to looking like what one would consider a non-informative prior”, without noticing that the logit parametrisation normal prior leads to an even more peaked prior…
“Even when Jeffreys prior can be computed, it will be difficult to sell this prior as an objective prior to the jurors or the senators on the committee. The construction of Jeffreys and other objective priors for multi-parameter models poses substantial mathematical difficulties.”
I find it rather surprising that a paper can be dedicated to the comparison of two arbitrary prior distributions on two fairly simplistic models towards the global conclusion that “non-informative priors neither ‘let the data speak’ nor do they correspond (even roughly) to likelihood analysis.” In this regard, the earlier critical analysis of Seaman et al., to which my PhD student Kaniav Kamary and I replied, had a broader scope.
Filed under: Books, pictures, Statistics, University life Tagged: data cloning, non-informative priors, SAME algorithm
A question on Cross Validated led me to realise I had never truly considered the issue of periodic Gibbs samplers! In MCMC, non-aperiodic chains are a minor nuisance in that the skeleton trick of randomly subsampling the Markov chain leads to a aperiodic Markov chain. (The picture relates to the skeleton!) Intuitively, while the systematic Gibbs sampler has a tendency to non-reversibility, it seems difficult to imagine a sequence of full conditionals that would force the chain away from the current value..!In the discrete case, given that the current state of the Markov chain has positive probability for the target distribution, the conditional probabilities are all positive as well and hence the Markov chain can stay at its current value after one Gibbs cycle, with positive probabilities, which means strong aperiodicity. In the continuous case, a similar argument applies by considering a neighbourhood of the current value. (Incidentally, the same person asked a question about the absolute continuity of the Gibbs kernel. Being confused by our chapter on the topic!!!)
Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: aperiodicity, convergence, cross validated, Gibbs sampler, Markov chain, MCMC algorithms, Monte Carlo Statistical Methods, skeleton chain
A study [re]published three days ago in both The New York Times and the BBC The Guardian reproduced the conclusion of an article in the Journal of the American College of Cardiology that strenuous and long-distance jogging (or more appropriately running) could have a negative impact on longevity! And that the best pace is around 8km/h, just above a brisk walk! Quite depressing… However, this was quickly followed by other articles, including this one in The New York Times, pointing out the lack of statistical validation in the study and the ridiculously small number of runners in the study. I am already feeling better (and ready for my long run tomorrow morning!), but appalled all the same by the lack of standards of journals publishing statistically void studies. I know, nothing new there…
Filed under: Running, Statistics Tagged: long distance running, medical studies, running injury, statistical significance
The University of Warwick is one of the five UK Universities (Cambridge, Edinburgh, Oxford, Warwick and UCL) to be part of the new Alan Turing Institute.To quote from the University press release, “The Institute will build on the UK’s existing academic strengths and help position the country as a world leader in the analysis and application of big data and algorithm research. Its headquarters will be based at the British Library at the centre of London’s Knowledge Quarter.” The Institute will gather researchers from mathematics, statistics, computer sciences, and connected fields towards collegial and focussed research , which means in particular that it will hire a fairly large number of researchers in stats and machine-learning in the coming months. The Department of Statistics at Warwick was strongly involved in answering the call for the Institute and my friend and colleague Mark Girolami will the University leading figure at the Institute, alas meaning that we will meet even less frequently! Note that the call for the Chair of the Alan Turing Institute is now open, with deadline on March 15. [As a personal aside, I find the recognition that Alan Turing’s genius played a pivotal role in cracking the codes that helped us win the Second World War. It is therefore only right that our country’s top universities are chosen to lead this new institute named in his honour. by the Business Secretary does not absolve the legal system that drove Turing to suicide….]
Filed under: Books, pictures, Running, Statistics, University life Tagged: Alan Turing, Alan Turing Institute, British Library, London, UCL, United Kingdom, University of Cambridge, University of Edinburgh, University of Oxford, University of Warwick
This (early) summer, a conference on missing data will be organised in Rennes, Brittany, with the support of the French Statistical Society [SFDS]. (Check the website if interested, Rennes is a mere two hours from Paris by fast train.)
Filed under: R, Statistics, Travel, University life Tagged: Brittany, conference, France, missing data, Rennes, Roderick Little, TGV
I just arXived my comments about A. Ronald Gallant’s “Reflections on the Probability Space Induced by Moment Conditions with Implications for Bayesian Inference”, capitalising on the three posts I wrote around the discussion talk I gave at the 6th French Econometrics conference last year. Nothing new there, except that I may get a response from Ron Gallant as this is submitted as a discussion of his related paper in Journal of Financial Econometrics. While my conclusion is rather negative, I find the issue of setting prior and model based on a limited amount of information of much interest, with obvious links with ABC, empirical likelihood and other approximation methods.
Filed under: pictures, Statistics, University life Tagged: 6th French Econometrics conference, ABC, empirical likelihood, limited information inference, measure theory, moment prior, Ron Gallant
An arithmetics Le Monde mathematical puzzle:
For which n’s are the averages of the first n squared integers integers? Among those, which ones are perfect squares?
An easy R code, for instancen=10^3 car=as.integer(as.integer(1:n)^2) sumcar=as.integer((cumsum(car)%/%as.integer(1:n))) diff=as.integer(as.integer(cumsum(car))-as.integer(1:n)*sumcar) print((1:n)[diff==00])
which produces 333 values 1 5 7 11 13 17 19 23 25 29 31 35 37 41 43 47 49 53  55 59 61 65 67 71 73 77 79 83 85 89 91 95 97 101 103 107  109 113 115 119 121 125 127 131 133 137 139 143 145 149 151 155 157 161  163 167 169 173 175 179 181 185 187 191 193 197 199 203 205 209 211 215  217 221 223 227 229 233 235 239 241 245 247 251 253 257 259 263 265 269  271 275 277 281 283 287 289 293 295 299 301 305 307 311 313 317 319 323  325 329 331 335 337 341 343 347 349 353 355 359 361 365 367 371 373 377  379 383 385 389 391 395 397 401 403 407 409 413 415 419 421 425 427 431  433 437 439 443 445 449 451 455 457 461 463 467 469 473 475 479 481 485  487 491 493 497 499 503 505 509 511 515 517 521 523 527 529 533 535 539  541 545 547 551 553 557 559 563 565 569 571 575 577 581 583 587 589 593  595 599 601 605 607 611 613 617 619 623 625 629 631 635 637 641 643 647  649 653 655 659 661 665 667 671 673 677 679 683 685 689 691 695 697 701  703 707 709 713 715 719 721 725 727 731 733 737 739 743 745 749 751 755  757 761 763 767 769 773 775 779 781 785 787 791 793 797 799 803 805 809  811 815 817 821 823 827 829 833 835 839 841 845 847 851 853 857 859 863  865 869 871 875 877 881 883 887 889 893 895 899 901 905 907 911 913 917  919 923 925 929 931 935 937 941 943 947 949 953 955 959 961 965 967 971  973 977 979 983 985 989 991 995 997
which are made of all odd integers that are not multiple of 3. (I could have guessed the exclusion of even numbers since the numerator is always odd. Why are the triplets excluded, now?! Jean-Louis Fouley gave me the answer: the sum of squares is such that
and hence m must be odd and 2m+1 a multiple of 3, which excludes multiples of 3.)
with the final result> sum(scar==0)  2 > ((1:n)[diff==0])[scar==0]  1 337
since 38025=195² is a perfect square. (I wonder if there is a plain explanation for that result!)
Filed under: Books, Kids, Statistics, University life Tagged: arithmetics, Jean-Louis Fouley, Le Monde, mathematical puzzle, perfect square, R
[Warning: post of limited interest to most, about a local race I ran for another year!]
Once more, I managed to run my annual 5k in Malakof. And once again being (barely) there on the day of the race. Having landed a few hours earlier from Birmingham. Due to traffic and road closures, I arrived very later in Malakoff and could not warm up as usual, or even squeeze to the first rows on the starting line. Given those handicaps, I still managed in getting close to my best time of last year (18:40 vs. 18:36). I alas finished second in my V2 category, just a few meters behind the first V2 and definitely catching up on him! My INSEE Paris Club team won the company challenge for yet another year. Repeating a pattern of now many years.
Filed under: Running Tagged: 5K, groundhog day, Insee Paris Club, Malakoff, veteran (V2)