Bayesian News Feeds

Is non-informative Bayesian analysis dangerous for wildlife???

Xian's Og - Wed, 2015-02-11 18:15

Subhash Lele recently arXived a short paper entitled “Is non-informative Bayesian analysis appropriate for wildlife management: survival of San Joaquin Kit fox and declines in amphibian populations”. (Lele has been mentioned several times on this blog in connection with his data-cloning approach that mostly clones our own SAME algorithm.)

“The most commonly used non-informative priors are either the uniform priors or the priors with very large variances spreading the probability mass almost uniformly over the entire parameter space.”

The main goal of the paper is to warn, or even better “to disabuse the ecologists of the notion that there is no difference between non-informative Bayesian inference and likelihood-based inference and that the philosophical underpinnings of statistical inference are irrelevant to practice.” The argument advanced by Lele is simply that two different parametrisations should lead to two compatible priors and that, if they do not, this exhibits an unacceptable impact of the prior modelling on the resulting inference, while likelihood-based inference [obviously] does not depend on parametrisation.

The first example in the paper is a dynamic linear model of a fox population series, using either a uniform U(0,1) prior on a parameter b or a Ga(100,100) prior on -a/b. (The normal prior on a is the same in both cases.) I do not find the opposition between the two posteriors in the least surprising, as the modelling starts by assuming different supports for the parameter b. And both are highly “informative” in that there is no intrinsic constraint on b that could justify the (0,1) support, as illustrated by the second choice, where b is unconstrained and varies over (-15,15) or (-0.0015,0.0015) depending on how the Ga(100,100) prior is parametrised.
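A minimal R check of the parametrisation point, assuming (my reading, not something stated in the paper) that the two ranges for b come from reading Ga(100,100) with a rate or with a scale. Since b = -a/c with c the Gamma variable, multiplying c by 10^4 shrinks b by the same factor, which matches the ratio between (-15,15) and (-0.0015,0.0015).

```r
# Ga(100, 100) read as shape/rate (mean 1) or as shape/scale (mean 10^4)
q_rate  <- qgamma(c(0.001, 0.999), shape = 100, rate = 100)
q_scale <- qgamma(c(0.001, 0.999), shape = 100, scale = 100)
c(mean_rate = 100 / 100, mean_scale = 100 * 100)
# Both extreme quantiles are stretched by the same factor 10^4,
# hence b = -a/c is shrunk by 10^4 under the scale reading
q_scale / q_rate
```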

The second model is even simpler as it involves one Bernoulli probability p for the observations, plus a second Bernoulli driving replicates when the first Bernoulli variate is one, i.e.,

$$z_i\sim\mathcal{B}(p),\qquad y_{ij}\mid z_i\sim\mathcal{B}(z_i\,q),\quad j=1,\ldots,J,$$

and the paper opposes a uniform prior on p,q to a normal N(0,10^3) prior on the logit transforms of p and q. [With an obvious typo at the top of page 10.] As shown on the above graph, the two priors on p are immensely different, and so should lead to different posteriors in a weakly informative setting such as a Bernoulli experiment. Even with a few hundred individuals. A somewhat funny aspect of this study is that Lele opposes the uniform prior to the Jeffreys Be(.5,.5) prior as being “nowhere close to looking like what one would consider a non-informative prior”, without noticing that the normal prior on the logit parametrisation leads to an even more peaked prior…
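To quantify how peaked these priors are, here is a short R computation of the prior mass within 0.01 of the boundaries {0,1}, assuming the N(0,10^3) prior on logit(p) has variance 10^3 (reading 10^3 as a standard deviation would make it even more extreme). The masses come out at roughly 0.02 (uniform), 0.13 (Jeffreys) and 0.88 (logit-normal).

```r
eps <- 0.01
# P(p < eps or p > 1 - eps) under each prior (all three are symmetric around 1/2)
mass_uniform  <- 2 * eps
mass_jeffreys <- 2 * pbeta(eps, 0.5, 0.5)
# N(0, 10^3) on logit(p), assumed here to have variance 10^3
mass_logit <- 2 * pnorm(qlogis(eps), mean = 0, sd = sqrt(1e3))
round(c(uniform = mass_uniform, jeffreys = mass_jeffreys, logit_normal = mass_logit), 3)
```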

“Even when Jeffreys prior can be computed, it will be difficult to sell this prior as an objective prior to the jurors or the senators on the committee. The construction of Jeffreys and other objective priors for multi-parameter models poses substantial mathematical difficulties.”

I find it rather surprising that a paper can be dedicated to the comparison of two arbitrary prior distributions on two fairly simplistic models towards the global conclusion that “non-informative priors neither ‘let the data speak’ nor do they correspond (even roughly) to likelihood analysis.” In this regard, the earlier critical analysis of Seaman et al., to which my PhD student Kaniav Kamary and I replied, had a broader scope.

Filed under: Books, pictures, Statistics, University life Tagged: data cloning, non-informative priors, SAME algorithm
Categories: Bayesian Bloggers

aperiodic Gibbs sampler

Xian's Og - Tue, 2015-02-10 18:15

A question on Cross Validated led me to realise I had never truly considered the issue of periodic Gibbs samplers! In MCMC, non-aperiodic chains are a minor nuisance in that the skeleton trick of randomly subsampling the Markov chain leads to an aperiodic Markov chain. (The picture relates to the skeleton!) Intuitively, while the systematic Gibbs sampler has a tendency to non-reversibility, it seems difficult to imagine a sequence of full conditionals that would force the chain away from the current value..! In the discrete case, given that the current state of the Markov chain has positive probability under the target distribution, the conditional probabilities are all positive as well, and hence the Markov chain can stay at its current value after one Gibbs cycle with positive probability, which means strong aperiodicity. In the continuous case, a similar argument applies by considering a neighbourhood of the current value. (Incidentally, the same person asked a question about the absolute continuity of the Gibbs kernel. Being confused by our chapter on the topic!!!)
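The discrete argument can be checked numerically. The R sketch below builds the one-cycle transition matrix of a systematic-scan Gibbs sampler (x then y) on a hypothetical 3×3 grid with a strictly positive target, and checks that every diagonal entry, i.e., the probability of staying put after one cycle, is positive, while the target remains stationary.

```r
# Hypothetical strictly positive target on a 3 x 3 grid of states (x, y)
target <- matrix(1:9, nrow = 3)
target <- target / sum(target)
states <- expand.grid(x = 1:3, y = 1:3)    # 9 states, x varying fastest
K <- matrix(0, 9, 9)                        # one-cycle kernel of the systematic scan
for (i in 1:9) {
  for (j in 1:9) {
    x0 <- states$x[i]; y0 <- states$y[i]
    x1 <- states$x[j]; y1 <- states$y[j]
    p_x <- target[x1, y0] / sum(target[, y0])   # draw x1 from p(x | y = y0)
    p_y <- target[x1, y1] / sum(target[x1, ])   # then y1 from p(y | x = x1)
    K[i, j] <- p_x * p_y
  }
}
pi_vec <- target[cbind(states$x, states$y)]  # target in the same state order
diag(K)                                      # all positive: strong aperiodicity
```

The diagonal entry for state (x0, y0) is p(x0 | y0) p(y0 | x0), positive as soon as the target is positive there.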

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: aperiodicity, convergence, cross validated, Gibbs sampler, Markov chain, MCMC algorithms, Monte Carlo Statistical Methods, skeleton chain
Categories: Bayesian Bloggers

should I run less?!

Xian's Og - Tue, 2015-02-10 08:18

A study [re]published three days ago in The New York Times, the BBC and The Guardian reproduced the conclusion of an article in the Journal of the American College of Cardiology that strenuous and long-distance jogging (or more appropriately running) could have a negative impact on longevity! And that the best pace is around 8km/h, just above a brisk walk! Quite depressing… However, this was quickly followed by other articles, including this one in The New York Times, pointing out the lack of statistical validation in the study and the ridiculously small number of runners in the study. I am already feeling better (and ready for my long run tomorrow morning!), but appalled all the same by the lack of standards of journals publishing statistically void studies. I know, nothing new there…

Filed under: Running, Statistics Tagged: long distance running, medical studies, running injury, statistical significance
Categories: Bayesian Bloggers

Alan Turing Institute

Xian's Og - Mon, 2015-02-09 18:15


The University of Warwick is one of the five UK universities (Cambridge, Edinburgh, Oxford, Warwick and UCL) to be part of the new Alan Turing Institute. To quote from the University press release, “The Institute will build on the UK’s existing academic strengths and help position the country as a world leader in the analysis and application of big data and algorithm research. Its headquarters will be based at the British Library at the centre of London’s Knowledge Quarter.” The Institute will gather researchers from mathematics, statistics, computer sciences, and connected fields towards collegial and focussed research, which means in particular that it will hire a fairly large number of researchers in stats and machine-learning in the coming months. The Department of Statistics at Warwick was strongly involved in answering the call for the Institute and my friend and colleague Mark Girolami will be the University's leading figure at the Institute, alas meaning that we will meet even less frequently! Note that the call for the Chair of the Alan Turing Institute is now open, with a deadline on March 15. [As a personal aside, I find that the recognition by the Business Secretary that “Alan Turing’s genius played a pivotal role in cracking the codes that helped us win the Second World War. It is therefore only right that our country’s top universities are chosen to lead this new institute named in his honour” does not absolve the legal system that drove Turing to suicide….]

Filed under: Books, pictures, Running, Statistics, University life Tagged: Alan Turing, Alan Turing Institute, British Library, London, UCL, United Kingdom, University of Cambridge, University of Edinburgh, University of Oxford, University of Warwick
Categories: Bayesian Bloggers

MissData 2015 in Rennes [June 18-19]

Xian's Og - Mon, 2015-02-09 08:18

This (early) summer, a conference on missing data will be organised in Rennes, Brittany, with the support of the French Statistical Society [SFDS]. (Check the website if interested; Rennes is a mere two hours from Paris by fast train.)

Filed under: R, Statistics, Travel, University life Tagged: Brittany, conference, France, missing data, Rennes, Roderick Little, TGV
Categories: Bayesian Bloggers

comments on reflections

Xian's Og - Sun, 2015-02-08 18:15

I just arXived my comments about A. Ronald Gallant’s “Reflections on the Probability Space Induced by Moment Conditions with Implications for Bayesian Inference”, capitalising on the three posts I wrote around the discussion talk I gave at the 6th French Econometrics conference last year. Nothing new there, except that I may get a response from Ron Gallant as this is submitted as a discussion of his related paper in the Journal of Financial Econometrics. While my conclusion is rather negative, I find the issue of setting prior and model based on a limited amount of information of much interest, with obvious links with ABC, empirical likelihood and other approximation methods.

Filed under: pictures, Statistics, University life Tagged: 6th French Econometrics conference, ABC, empirical likelihood, limited information inference, measure theory, moment prior, Ron Gallant
Categories: Bayesian Bloggers

Le Monde puzzle [#899]

Xian's Og - Sat, 2015-02-07 18:15

An arithmetic Le Monde mathematical puzzle:

For which n is the average of the first n squared integers an integer? Among those, which ones are perfect squares?

An easy R code, for instance

```r
n <- 10^3
car <- as.integer((1:n)^2)              # squares 1^2, ..., n^2
sumcar <- cumsum(car) %/% (1:n)         # integer part of each running average
diff <- cumsum(car) - (1:n) * sumcar    # remainder of the m-th sum divided by m
print((1:n)[diff == 0])                 # m's with an integer average
```

which produces 333 values

```
  [1]   1   5   7  11  13  17  19  23  25  29  31  35  37  41  43  47  49  53
 [19]  55  59  61  65  67  71  73  77  79  83  85  89  91  95  97 101 103 107
 [37] 109 113 115 119 121 125 127 131 133 137 139 143 145 149 151 155 157 161
 [55] 163 167 169 173 175 179 181 185 187 191 193 197 199 203 205 209 211 215
 [73] 217 221 223 227 229 233 235 239 241 245 247 251 253 257 259 263 265 269
 [91] 271 275 277 281 283 287 289 293 295 299 301 305 307 311 313 317 319 323
[109] 325 329 331 335 337 341 343 347 349 353 355 359 361 365 367 371 373 377
[127] 379 383 385 389 391 395 397 401 403 407 409 413 415 419 421 425 427 431
[145] 433 437 439 443 445 449 451 455 457 461 463 467 469 473 475 479 481 485
[163] 487 491 493 497 499 503 505 509 511 515 517 521 523 527 529 533 535 539
[181] 541 545 547 551 553 557 559 563 565 569 571 575 577 581 583 587 589 593
[199] 595 599 601 605 607 611 613 617 619 623 625 629 631 635 637 641 643 647
[217] 649 653 655 659 661 665 667 671 673 677 679 683 685 689 691 695 697 701
[235] 703 707 709 713 715 719 721 725 727 731 733 737 739 743 745 749 751 755
[253] 757 761 763 767 769 773 775 779 781 785 787 791 793 797 799 803 805 809
[271] 811 815 817 821 823 827 829 833 835 839 841 845 847 851 853 857 859 863
[289] 865 869 871 875 877 881 883 887 889 893 895 899 901 905 907 911 913 917
[307] 919 923 925 929 931 935 937 941 943 947 949 953 955 959 961 965 967 971
[325] 973 977 979 983 985 989 991 995 997
```

which are exactly the odd integers that are not multiples of 3. (I could have guessed the exclusion of even numbers since the numerator of the average is then odd. Why are the multiples of 3 excluded, now?! Jean-Louis Fouley gave me the answer: the average of the first m squares is such that

$$\frac{1}{m}\sum_{i=1}^{m} i^2=\frac{(m+1)(2m+1)}{6}$$

and hence m must be odd and (m+1)(2m+1) a multiple of 3, which excludes multiples of 3, since both m+1 and 2m+1 are then congruent to 1 modulo 3.)
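Fouley's closed form can be checked directly against the brute-force output, as in this short R sketch: the m's for which (m+1)(2m+1)/6 is an integer coincide with the odd m's that are not multiples of 3, 333 of them up to 1000.

```r
m <- 1:1000
# average of the first m squares is (m + 1)(2m + 1) / 6
closed <- m[((m + 1) * (2 * m + 1)) %% 6 == 0]
coprime6 <- m[m %% 2 == 1 & m %% 3 != 0]   # odd and not a multiple of 3
identical(closed, coprime6)
length(closed)
```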

The second part is as simple:

```r
sole <- sumcar[(1:n)[diff == 0]]                     # the integer averages
scar <- as.integer(as.integer(sqrt(sole))^2) - sole  # zero iff the average is a perfect square
sum(scar == 0)
```

with the final result

```
> sum(scar==0)
[1] 2
> ((1:n)[diff==0])[scar==0]
[1] 1 337
```

since the average for n=337, namely 338×675/6=38025=195², is a perfect square. (I wonder if there is a plain explanation for that result!)
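The second part can likewise be run on the closed form alone, without cumulative sums, in this R sketch (restricted to n ≤ 1000, as in the code above):

```r
m <- 1:1000
avg <- (m + 1) * (2 * m + 1) / 6          # average of the first m squares
r <- round(sqrt(avg))
m[avg == round(avg) & r^2 == avg]         # integer averages that are perfect squares
```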

Filed under: Books, Kids, Statistics, University life Tagged: arithmetics, Jean-Louis Fouley, Le Monde, mathematical puzzle, perfect square, R
Categories: Bayesian Bloggers

41ièmes Foulées de Malakoff [5k, 7°C, 18:40, 40th & 2nd V2]

Xian's Og - Fri, 2015-02-06 18:15

[Warning: post of limited interest to most, about a local race I ran for another year!]

Once more, I managed to run my annual 5k in Malakoff. And once again I was (barely) there on the day of the race, having landed a few hours earlier from Birmingham. Due to traffic and road closures, I arrived very late in Malakoff and could not warm up as usual, or even squeeze into the first rows on the starting line. Given those handicaps, I still managed to get close to my best time of last year (18:40 vs. 18:36). I alas finished second in my V2 category, just a few meters behind the first V2 and definitely catching up on him! My INSEE Paris Club team won the company challenge for yet another year, repeating a pattern of now many years.

Filed under: Running Tagged: 5K, groundhog day, Insee Paris Club, Malakoff, veteran (V2)
Categories: Bayesian Bloggers

the ultimate argument

Xian's Og - Fri, 2015-02-06 13:18

In a tribune published on February 4 in Le Monde [under the vote-fishing argument that the National Front is not a threat for democracy], the former minister [and convicted member of fascist groups in the 1960’s] Gérard Longuet wrote this unforgettable sentence about the former and current heads of the National Front:

“Sa fille, elle, a compris, et d’ailleurs pourquoi serait-elle son père, alors que deux ou trois générations les séparent.”

[Translation: His daughter, for her part, has well understood, and in any case why should she be her father when there are two or three generations between them.]

Filed under: Statistics Tagged: French politics, Human Genetics, Le Monde, National Front
Categories: Bayesian Bloggers

Bayesian computation: fore and aft

Xian's Og - Thu, 2015-02-05 18:15

With my friends Peter Green (Bristol), Krzysztof Łatuszyński (Warwick) and Marcello Pereyra (Bristol), we just arXived the first version of “Bayesian computation: a perspective on the current state, and sampling backwards and forwards”, whose first title was the title of this post. This is a survey of our own perspective on Bayesian computation, from what occurred in the last 25 years [a lot!] to what could occur in the near future [a lot as well!]. Submitted to Statistics and Computing towards the special 25th anniversary issue, as announced in an earlier post. Pulling strength and breadth from each other’s opinion, we have certainly attained more than the sum of our initial respective contributions, but we welcome comments about bits and pieces of importance that we missed and even more about promising new directions that are not covered in this survey. (A warning that should go with most of my surveys is that my input in this paper will not differ by a large margin from ideas expressed here or in previous surveys.)

Filed under: Books, Statistics, University life Tagged: ABC, adaptive MCMC methods, Bayesian Analysis, Bayesian computation, Bayesian optimisation, expectation-propagation, MCMC algorithms, pseudo-marginal MCMC, Statistics and Computing, survey, University of Bristol, University of Warwick, variational Bayes methods
Categories: Bayesian Bloggers

relabelling mixtures (#2)

Xian's Og - Wed, 2015-02-04 18:15

Following the previous post, I went and had a (long) look at Puolamäki and Kaski’s paper. I must acknowledge that, despite several runs through the paper, I still have trouble with the approach… From what I understand, the authors use a Bernoulli mixture pseudo-model to reallocate the observations to components. That is, given an MCMC output with simulated allocation variables (a.k.a., hidden or latent variables), they create a (TxK)xn matrix of component binary indicators, e.g., for a three component mixture,

0 1 0 0 1 0…
1 0 0 0 0 0…
0 0 1 1 0 1…
0 1 0 0 1 1…

and estimate a probability to be in component j for each of the n observations, according to the (pseudo-)likelihood

It took me a few days, between morning runs and those wee hours when I cannot get back to sleep (!), to make some sense of this Bernoulli modelling. The allocation vectors are pooled to estimate the probabilities of being “in” component j. However, the data (which is the outcome of an MCMC simulation and de facto does not originate from that Bernoulli mixture) does not seem appropriate, both because it is produced by an MCMC simulation and because it is made of blocks of highly correlated rows [which sum up to one]. The Bernoulli likelihood above also defines a new model, with many more parameters than the original mixture model. And I fail to see why perfect, partial or non-existent label switching [in the MCMC sequence] is not going to impact the estimation of the Bernoulli mixture. Nor why an argument based on a fixed parameter value (Theorem 3) extends to an MCMC outcome where parameters themselves are subjected to some degree of label switching. Bemused, I remain…
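A small R sketch of how such a (T×K)×n indicator matrix can be built from an allocation output, on a hypothetical example where random allocations stand in for an actual MCMC run (sizes T=4, K=3, n=6 are arbitrary). It also makes visible that, within each iteration block, every column sums to one.

```r
set.seed(1)
n_iter <- 4; K <- 3; n <- 6   # hypothetical: T iterations, K components, n observations
# Stand-in for the simulated allocations: row t holds the components of the n observations
alloc <- matrix(sample(1:K, n_iter * n, replace = TRUE), nrow = n_iter)
# For iteration t, a K x n block whose row k flags the observations allocated to component k
blocks <- lapply(1:n_iter, function(t)
  t(sapply(1:K, function(k) as.integer(alloc[t, ] == k))))
ind <- do.call(rbind, blocks)  # the (T x K) x n binary matrix
ind
```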

Filed under: Statistics, Travel, University life Tagged: allocations, Bernoulli mixture, finite mixtures, label switching, MCMC algorithms, Monte Carlo Statistical Methods, permutations
Categories: Bayesian Bloggers

the latest Significance: Astrostats, black swans, and pregnant drivers [and zombies]

Xian's Og - Tue, 2015-02-03 18:15

Reading Significance is always an enjoyable moment, when I can find time to skim through the articles (before my wife gets hold of it!). This time, I lost my copy between my office and home, and borrowed it from Tom Nichols at Warwick, with four mornings to read it during breakfast. This December issue is definitely interesting, as it contains several introductory articles on astro- and cosmo-statistics! One thing I had not noticed before is how large a fraction of the papers is written by authors of books, giving a quick entry or interview about their book. For instance, I found out that Roberto Trotta had written a general public book called The Edge of the Sky (All You Need to Know About the All-There-Is), which exposes the fundamentals of cosmology through the 1000 most common words in the English language. So Universe is replaced with All-There-Is! I can understand and to some extent applaud the intention, but it nonetheless makes for a painful read, judging from the excerpt, when researcher and telescope are not part of the accepted vocabulary. Reading the corresponding article in Significance left me a bit bemused at the reason provided for the existence of a multiverse, i.e., of multiple replicas of our universe, all with different conditions: multiplying the universes makes ours more likely, while it sounds almost impossible on its own! This sounds like a very frequentist argument… and I am not even certain it would convince a frequentist. The other articles in this special astrostatistics section were of a more statistical nature, from estimating the number of galaxies to the chances of a big asteroid impact. I found the graphical representation of the meteorite impacts in the past century hard to read because of the impact drawing in the background. However, when I checked the link to Carlo Zapponi’s website, I found the picture was a still of a neat animation of meteorites falling since the first report.

“Taleb himself, once described as a philosopher, now self-identifies as a statistician. And, intrinsically, anti-fragility and statistical thinking are interrelated.” T. Bendell

Two rather superfluous [in my opinion] articles dealt with a regression of zombie Google entries associated with each U.S. state (written by Daniel Zelterman, in connection with his chapter in the book Mathematical Modelling of Zombies, where I discovered the unexpected name of Mark Girolami [as a writer, not as a zombie cyclist!]) and with something about X’mas crackers I have not read further than the title. Yet another entry related to a book was Tony Bendell’s discussion of his recent book on Building anti-fragile organisations, written in the wake of Taleb’s book, Antifragile (reviewed by Larry Wasserman on the now defunct Normal Deviate).

And I have not mentioned pregnant drivers yet: one entry was by two Canadian epidemiologists who studied the accident rate of pregnant women and concluded that the risk increases during pregnancy. I did not read the original paper so cannot make an informed comment, but still wonder about the possible impact of a higher tendency for pregnant women to be sent to hospital in case of a minor car accident. There could also be other confounding factors, like an increased mileage during pregnancy (certainly when compared with immediately after). And, since the study covers only women who completed their pregnancy and were still alive one year later, it excludes those who had severe or fatal crashes before starting a pregnancy or during their pregnancy. Another possible caveat is that, due to the rather limited length of the study, the years of observation may have an impact on the observed rise. This data is taken from Ontario, where winter may be rather fierce (!), and corrections for both seasonality and the general number of crashes should have been considered.

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: antifragile, astrostatistics, bolides, cosmology, multiverse, Nassim Taleb, pregnancy, Roberto Trotta, Significance, University of Warwick, zombies
Categories: Bayesian Bloggers

my week in a Tudor farmhouse

Xian's Og - Mon, 2015-02-02 18:15

As I could not book my “usual” maths house on the campus of the University of Warwick, I searched for another accommodation and discovered a nice shared house in the countryside (next to my standard running route), run by the Warwick Institute of Advanced Study, and called Cryfield Grange. As seen from the pictures, the building itself is impressive, even though there is not much left inside of its Tudor foundations, except some unexpected steps in the middle of some rooms and a few remaining black beams; it is also quite enjoyable for a week visit, with a large kitchen where I made rice pudding and pissaladière for the whole week, and a bike path to the University. I will definitely try to get there in the summer, as it must be even more enjoyable!

Filed under: pictures, Running, Travel, University life Tagged: cooking, Cryfield Grange, England, pissaladière, Tudor, University of Warwick, visiting accomodation
Categories: Bayesian Bloggers

minimaxity of a Bayes estimator

Xian's Og - Sun, 2015-02-01 18:15

Today, while in Warwick, I spotted on Cross Validated a question involving “minimax” in the title and hence could not help but look at it! The way I first understood the question (and immediately replied to it) was to check whether or not the standard Normal average (reduced to the single Normal observation by sufficiency considerations) is a minimax estimator of the normal mean under an interval zero-one loss defined by

$$\mathcal{L}(\mu,\delta)=\mathbb{I}\{|\delta-\mu|>L\}$$

where L is a positive tolerance bound. I had not seen this problem before, even though it sounds quite standard. In this setting, the identity estimator, i.e., the normal observation x, is indeed minimax as (a) it is a generalised Bayes estimator for this loss function under the constant prior (Bayes estimators under this loss are given by the centre of the interval of length 2L with the highest posterior probability) and (b) it can be shown to be a limit of proper Bayes estimators and its Bayes risk is also the limit of the corresponding Bayes risks. (This is a most traditional way of establishing minimaxity for a generalised Bayes estimator.) However, this was not the question asked on the forum, as the book by Zacks it referred to stated that the standard Normal average maximised the minimal coverage, which amounts to minimising the maximal risk under the above loss. With the strange inversion of parameter and estimator in the minimax risk:
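The constant-risk part of the argument is easy to check in R, assuming X ~ N(μ,1) and a hypothetical tolerance L = 1: the risk of δ(x)=x under the interval loss is 2Φ(-L) whatever μ, while a shrunk estimator δ(x)=0.9x (chosen here only for contrast, not taken from the question) does better at μ=0 but has a larger maximal risk.

```r
L <- 1   # hypothetical tolerance bound
# Risk of delta(x) = x under the loss 1{|delta - mu| > L}, X ~ N(mu, 1): constant in mu
risk_identity <- function(mu) rep(2 * pnorm(-L), length(mu))
# Risk of delta(x) = shrink * x, using shrink * X - mu ~ N((shrink - 1) * mu, shrink^2)
risk_shrink <- function(mu, shrink = 0.9) {
  m <- (shrink - 1) * mu
  1 - (pnorm((L - m) / shrink) - pnorm((-L - m) / shrink))
}
mus <- c(0, 1, 5, 20)
rbind(identity = risk_identity(mus), shrink = risk_shrink(mus))
```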

which makes the first bound equal to 0 by equating estimator and mean μ. Note however that I cannot access the whole book and hence may miss some restriction or other subtlety that would explain this unusual definition. (As an aside, note that Cross Validated has a protection against serial upvoting, so voting up or down at once a large chunk of my answers on that site does not impact my “reputation”!)

Filed under: Books, Kids, Statistics, University life Tagged: Bayes estimators, cross validated, generalised Bayes estimators, mathematical statistics, minimaxity, serial upvoting
Categories: Bayesian Bloggers

foxglove summer

Xian's Og - Sat, 2015-01-31 18:15

Here is the fifth instalment in the Peter Grant (or Rivers of London) series by Ben Aaronovitch, entitled Foxglove Summer, whose meaning only became clear (to me) by the end of the book. I found it in my mailbox upon arrival in Warwick last Sunday, and rushed through the book during evenings, insomnia breaks and even a few breakfasts!

“It’s observable but not reliably observable. It can have a quantifiable effects, but resists any attempt to apply mathematical principles to it – no wonder Newton kept magic under wraps. It must have driven him mental. Or maybe not.” (p.297)

Either because the author has run out of ideas to centre a fifth novel on a part or aspect of London (even though the parks, including the London Zoo, were not particularly used in the previous novels), or because he could not set this new type of supernatural in a city (no spoilers!), this sequel takes place in the Western Counties, close to the Welsh border (and not so far from Brother Cadfael’s Shrewsbury!). It is also an opportunity to introduce brand new (local) characters, who are enjoyable if a wee bit of a caricature! However, the inhabitants of the small village where the kidnapping investigation takes place are almost too sophisticated for Peter Grant, who has to handle the enquiry all by himself, as his mentor is immobilised in London by the defection of Peter’s close colleague, Lesley.

“We trooped off (…) down something that was not so much a path as a statistical variation in the density of the overgrowth.” (p.61)

As usual, the dialogues and monologues of Grant are the most enjoyable part of the story, along with a development of the long-in-the-coming love affair with the river goddess Beverley Brooks. And a much appreciated ambiguity in the attitude of Peter about the runaway Lesley… The story itself reflects the limitations of a small village where one quickly repeats over and over the same trips and the same relations, which gives a sensation of slow motion, even in the most exciting moments. The resolution of the enigma borrows too heavily from the fae and elves folklore, even though the final pages bring a few surprises. Nonetheless, the whole book was a page-turner for me, meaning I spent more time reading it this week than I intended or than was reasonable. No wonder for a series taking place in The Folly!

Filed under: Books, Kids, Travel Tagged: Ben Aaronnovitch, book review, England, foxglove, PC Peter Grant, Rivers of London, The Folly, unicorn, University of Warwick, Wales, Western Counties, Worcester
Categories: Bayesian Bloggers

icefalls on Ben Nevis

Xian's Og - Fri, 2015-01-30 18:15


The seminar invitation to Edinburgh gave me the opportunity and the excuse for a quick dash to Fort William for a day of ice-climbing on Ben Nevis. The ice conditions were perfect but there was alas too much snowdrift to attempt Point Five Gully, one of the mythical routes on the Ben. (Last time, the ice was not in good condition.) Instead, we did three pitches on three different routes, one iced rock-face near the CIC hut, the first pitch of Waterfall Gully on Carn Dearg Buttress, and the first pitch of The Curtain, again on Carn Dearg Buttress.

The most difficult climb was the first one, graded about V.5 in Scottish grade, maybe above that as the ice was rather rotten, forcing my guide Ali to place many screws. And forcing me to unscrew them! Then the difficulty got much lower, except for the V.5 start of the Waterfall, where I had to climb an ice pillar with my hands as the ice-picks would not get a good grip, breaking another large pillar in the process and fortunately mostly avoiding being hit. The final climb was quite easy, more of a steep snow slope than a true ice-climb. Too bad the second part of the route was blocked by two fellows who could not move! Anyway, it was another of those rare days on the ice, with enough choice not to worry about sharing with other teams, and a terrific guide! And a reasonable day for Scotland with little snow, no rain, plenty of wind and not that cold (except when belaying!).

Filed under: Mountains, pictures, Travel Tagged: Ben Nevis, Carn Dearg Buttress, Highlands, ice climbing, point five gully, Scotland, Scottish climbing grade, waterfall
Categories: Bayesian Bloggers

relabelling mixtures

Xian's Og - Thu, 2015-01-29 18:15

Another short paper about relabelling in mixtures was arXived last week by Pauli and Torelli. They refer rather extensively to a previous paper by Puolamäki and Kaski (2009) of which I was not aware, paper attempting to get an unswitching sampler that does not exhibit any label switching, a concept I find most curious as I see no rigorous way to state that a sampler is not switching! This would imply spotting low posterior probability regions that the chain would cross. But I should check the paper nonetheless.

Because the G component mixture posterior is invariant under the G! possible permutations, I am somewhat undecided as to what the authors of the current paper mean by estimating the difference between two means, like μ1-μ2, since they object to using the output of a perfectly mixing MCMC algorithm and seem to prefer the one associated with a non-switching chain. Or by estimating the probability that a given observation is from a given component, since this is exactly 1/G by the permutation invariance property. In order to identify a partition of the data, they introduce a loss function on the joint allocations of pairs of observations, a loss function that sounds quite similar to the one we used in our 2000 JASA paper on the label switching deficiencies of MCMC algorithms. (Which makes me wonder why this work of ours is not deemed relevant for the approach advocated in the paper!) Still, having read this paper, which I find rather poorly written, I have no clear understanding of how the authors give a precise meaning to a specific component of the mixture distribution, or of how the relabelling has to be conducted to avoid switching. That is, how the authors define their parameter space. Or their loss function. Unless one falls back onto the ordering of the means or the weights, which has the drawback of not connecting with the level sets of a particular mode of the posterior distribution, meaning that imposing the constraints results in a region that contains bits of several modes.
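The 1/G statement can be illustrated with a short R sketch on hypothetical allocations (G=3, five iterations, four observations): averaging the allocation frequencies over all G! relabellings, as a perfectly mixing sampler would do, returns exactly 1/G for every observation and every component.

```r
G <- 3
set.seed(2)
# Hypothetical allocations: 5 iterations (rows) for 4 observations (columns)
alloc <- matrix(sample(1:G, 5 * 4, replace = TRUE), nrow = 5)
perms <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
              c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))   # all G! relabellings
# P[i, j]: frequency of "observation i in component j", averaged over relabellings
P <- sapply(1:G, function(j)
  sapply(1:ncol(alloc), function(i)
    mean(sapply(perms, function(s) mean(s[alloc[, i]] == j)))))
P
```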

At some point the authors assume the data can be partitioned into K≤G groups such that there is a representative observation within each group never sharing a component (across MCMC iterations) with any of the other representatives. While this notion is label invariant, I wonder (a) whether this is possible on any MCMC outcome; (b) whether it indicates a positive or negative feature of the MCMC sampler; and (c) what prevents the representatives from switching in harmony from one component to the next while preserving their perfect mutual exclusion… This however constitutes the advance in the paper, namely that component dependent quantities are estimated as those associated with a particular representative. Note that the paper contains no illustration, hence that the method may prove hard to impossible to implement!

Filed under: Books, Statistics Tagged: arXiv, Bayesian estimation, finite mixtures, label switching, Matthew Stephens, pivot, University of Warwick
Categories: Bayesian Bloggers

Bayesian optimization for likelihood-free inference of simulator-based statistical models

Xian's Og - Wed, 2015-01-28 18:15

Michael Gutmann and Jukka Corander arXived this paper two weeks ago. I read part of it (mostly the extended introduction part) on the flight from Edinburgh to Birmingham this morning. I find the reflection it contains on the nature of the ABC approximation quite deep and thought-provoking. Indeed, the major theme of the paper is to view ABC (which is admittedly shorter than “likelihood-free inference of simulator-based statistical models”!) as a regular computational method based on an approximation of the likelihood function at the observed value, yobs. This includes for example Simon Wood’s synthetic likelihood (Simon Wood incidentally gave a talk on his method while I was in Oxford), as well as non-parametric versions. In both cases, the approximations are based on repeated simulations of pseudo-datasets for a given value of the parameter θ, either to produce an estimate of the mean and covariance of the sampling model as a function of θ or to construct genuine estimates of the likelihood function. As assumed by the authors, this calls for a low-dimensional θ. This approach actually allows for the inclusion of the synthetic approach as a lower bound on a non-parametric version.
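A minimal R sketch of Wood's synthetic likelihood, on a hypothetical simulator (50 draws from N(θ,1), summarised by their mean and log-variance) chosen only for illustration: at each θ, simulate pseudo-datasets, fit a Gaussian to the simulated summaries, and evaluate its log-density at the observed summary.

```r
set.seed(3)
# Hypothetical simulator and summaries: n_obs draws from N(theta, 1); mean and log-variance
summaries <- function(y) c(mean(y), log(var(y)))
synth_loglik <- function(theta, s_obs, n_sim = 200, n_obs = 50) {
  S <- t(replicate(n_sim, summaries(rnorm(n_obs, theta, 1))))  # n_sim x 2 summaries
  mu <- colMeans(S)
  R <- chol(cov(S))                                 # cov(S) = t(R) %*% R
  z <- backsolve(R, s_obs - mu, transpose = TRUE)   # solves t(R) z = s_obs - mu
  # Gaussian log-density of s_obs under the fitted mean and covariance
  -sum(log(diag(R))) - 0.5 * sum(z^2) - 0.5 * length(mu) * log(2 * pi)
}
y_obs <- rnorm(50, 2, 1)               # pseudo-observed data with theta = 2
s_obs <- summaries(y_obs)
grid <- seq(0, 4, by = 0.5)
ll <- sapply(grid, synth_loglik, s_obs = s_obs)
grid[which.max(ll)]                    # should sit at or next to 2
```

Each evaluation uses fresh simulations, which is exactly the source of the non-smoothness raised in the first bullet below.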

In the case of Wood’s synthetic likelihood, two questions came to me:

  • the estimation of the mean and covariance functions is usually not smooth because new simulations are required for each new value of θ. I wonder how frequently we can use the same basic random variates for all values of θ, because this would then give a smooth version of the above. In the other cases, provided the dimension is manageable, a Gaussian process could first be fitted before using the approximation. Or any other form of regularization.
  • no mention is made [in the current paper] of the impact of the parametrization of the summary statistics. Once again, a Box-Cox transform could be applied to each component of the summary for a better proximity to the normal distribution.
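The first bullet can be illustrated in R under a strong assumption, namely a location simulator y = θ + ε where the base variates ε can be drawn once and reused for every θ: the estimated summary mean is then exactly linear in θ, whereas fresh simulations at each θ add Monte Carlo noise.

```r
set.seed(4)
n_sim <- 200; n_obs <- 50
eps <- matrix(rnorm(n_sim * n_obs), nrow = n_sim)  # base variates, drawn once, reused for all theta
# Rows of theta + eps are pseudo-datasets; the summary is the sample mean
crn_mean <- function(theta) mean(rowMeans(theta + eps))
fresh_mean <- function(theta)
  mean(rowMeans(matrix(rnorm(n_sim * n_obs, mean = theta), nrow = n_sim)))
thetas <- seq(0, 1, by = 0.25)
diff(sapply(thetas, crn_mean))    # 0.25 each, up to rounding: smooth in theta
diff(sapply(thetas, fresh_mean))  # 0.25 plus fresh Monte Carlo noise at each theta
```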

When reading about a non-parametric approximation to the likelihood (based on the summaries), the questions I scribbled on the paper were:

  • estimating a complete density when using this estimate at the single point yobs could possibly be superseded by a more efficient approach.
  • the authors study a kernel that is a function of the difference or distance between the summaries and which is maximal at zero. This is indeed rather frequent in the ABC literature, but does it impact the convergence properties of the kernel estimator?
  • the estimation of the tolerance, which happens to be a bandwidth in that case, does not appear to be addressed in this paper, which could explain the very low probabilities of acceptance mentioned in the paper.
  • I am lost as to why lower bounds on likelihoods are relevant here. Unless this is intended for ABC maximum likelihood estimation.
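Regarding the bandwidth points, this R sketch (a hypothetical N(θ,1) simulator with the sample mean as summary, and an assumed Gaussian kernel) estimates the summary density at the observed value. The estimate targets the density convolved with the kernel, so the bandwidth h acts as the ABC tolerance and biases the result, here towards the N(0, 1/50 + h²) density at zero rather than the N(0, 1/50) one.

```r
set.seed(5)
# Hypothetical simulator: n_obs draws from N(theta, 1), summarised by their mean
kernel_lik <- function(theta, s_obs, h, n_sim = 2000, n_obs = 50) {
  s <- replicate(n_sim, mean(rnorm(n_obs, theta, 1)))
  mean(dnorm((s - s_obs) / h)) / h   # Gaussian kernel; h plays the role of the tolerance
}
est <- kernel_lik(theta = 0, s_obs = 0, h = 0.05)
exact <- dnorm(0, mean = 0, sd = 1 / sqrt(50))              # true summary density at s_obs
smoothed <- dnorm(0, mean = 0, sd = sqrt(1 / 50 + 0.05^2))  # what the estimator targets
c(estimate = est, smoothed = smoothed, exact = exact)
```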

Gutmann and Corander also comment on the first point, through the cost of producing a likelihood estimator. They therefore suggest resorting to regression and avoiding regions of low estimated likelihood. And rely on Bayesian optimisation. (Hopefully to be commented later.)

Filed under: Books, Statistics, University life Tagged: ABC, ABC validation, Bayesian optimisation, non-parametrics, synthetic likelihood
Categories: Bayesian Bloggers