Bayesian News Feeds
Filed under: Kids, Mountains, pictures, Running, Travel Tagged: hydroplane, lake, Maine, USA, waterlilies
As I was getting worried about the chances of survival of my current laptop (bought in emergency upon my return from Kyoto!), I decided to use some available grant money to buy a new laptop without stepping through the emergency square. Thanks to my local computer engineer, Thomas, I found a local dealer selling light laptops with an already installed Ubuntu 14.04… And qwerty (UK) keyboards. Even though the previous move to Kubuntu 12.04 had been seamless, a failed attempt to switch a Mac to Ubuntu a few months later left me wary about buying a computer first and testing later whether or not it was truly Linux compatible. I am therefore quite happy with the switch and grateful to Thomas for the suggestion. I managed to re-compile my current papers and to run my current R codes, plus connect by wireless and read photos from my camera, hence validating the basic operations I primarily require from a computer! And reinstalled KDE. (I am still having difficulties with the size of the fonts in Firefox though. Which do not seem coherent from a tab to the next.) Enough to sacrifice a new sticker to cover the brand on its cover….
Filed under: Linux, R, Statistics, University life Tagged: KDE, Kubuntu 12.04, MacBook Pro, qwerty, Ubuntu 14.04
Filed under: Kids, pictures, Travel Tagged: Boston, JSM 2014, Massachusset, skyscrapers, USA, vacations
On the last day of the IFCAM workshop in Bangalore, Marc Lavielle from INRIA presented a talk on mixed effects where he illustrated his original computer language Monolix. And mentioned that his CRC Press book on Mixed Effects Models for the Population Approach was out! (Appropriately listed as out on a 14th of July on amazon!) He actually demonstrated the abilities of Monolix live and on diabetes data provided by an earlier speaker from Kolkata, which was a perfect way to start a collaboration! Nice cover (which is all I saw from the book at this stage!) that maybe will induce candidates to write a review for CHANCE. Estimation of those mixed effect models relies on stochastic EM algorithms developed by Marc Lavielle and Éric Moulines in the 90’s, as well as MCMC methods.
Filed under: Books, pictures, R, Statistics, Travel, University life Tagged: Bangalore, book review, CHANCE, EM, IFCAM, Indian Institute of Science, INRIA, Kolkata, Marc Lavielle, MCMC, mixed effect models, Monolix, SAEM
Filed under: Kids, Mountains, pictures, Running, Travel Tagged: lake New England, Maine, sunset, USA
Just like after the Malaysian Airlines flight 370 disappearance, the current Ebola virus outbreak makes me feel we are sorely missing an emergency statistical force to react on urgent issues… It would indeed be quite valuable to have a team of statisticians at the ready to quantify risks and posterior probabilities and avoid media approximations. The situations calling for this reactive force abound. A few days ago I was reading about the unknown number of missing pro-West activists in Eastern Ukraine. Maybe statistical societies could join forces to set up such an emergency team?! Whose goals are somewhat different from the great Statistics without Borders…
As a side remark, the above phylogeny is taken from Dudas and Rambaut’s recent paper in PLOS reassessing the family tree of the current Ebola virus(es) acting in Guinea. The tree is found using MrBayes, which delivers a posterior probability of 1 to this filiation! And concluding “that the rooting of this clade using the very divergent other ebolavirus species is very problematic.”
Filed under: Statistics, Travel, University life Tagged: ASA, Ebola virus, JSM 2014, Malaysian Airlines, philogenic trees, Statistics without Borders, The New York Times, Ukraine
[Dennis Prangle sent me his comments on our ABC model choice by random forests paper. Here they are! And I appreciate very much contributors commenting on my paper or others, so please feel free to join.]
This paper proposes a new approach to likelihood-free model choice based on random forest classifiers. These are fit to simulated model/data pairs and then run on the observed data to produce a predicted model. A novel “posterior predictive error rate” is proposed to quantify the degree of uncertainty placed on this prediction. Another interesting use of this is to tune the threshold of the standard ABC rejection approach, which is outperformed by random forests.
The paper has lots of thought-provoking new ideas and was an enjoyable read, as well as giving me the encouragement I needed to read another chapter of the indispensable Elements of Statistical Learning. However, I’m not fully convinced by the approach yet, for a few reasons which are below along with other comments.
The paper shows that random forests outperform rejection based ABC. I’d like to see a comparison to more efficient ABC model choice algorithms such as that of Toni et al 2009. Also I’d like to see if the output of random forests could be used as summary statistics within ABC rather than as a separate inference method.
Posterior predictive error rate (PPER)
This is proposed to quantify the performance of a classifier given a particular data set. The PPER is the proportion of times the classifier’s most favoured model is incorrect for simulated model/data pairs drawn from an approximation to the posterior predictive. The approximation is produced by a standard ABC analysis.
Misclassification could be due to (a) a poor classifier or (b) uninformative data, so the PPER aggregates these two sources of uncertainty. I think it is still very desirable to have an estimate of the uncertainty due to (b) only, i.e. a posterior weight estimate. However the PPER is useful. Firstly end users may sometimes only care about the aggregated uncertainty. Secondly relative PPER values for a fixed dataset are a useful measure of uncertainty due to (a), for example in tuning the ABC threshold. Finally, one drawback of the PPER is the dependence on an ABC estimate of the posterior: how robust are the results to the details of how this is obtained?
This paper illustrates an important link between ABC and machine learning classification methods: model choice can be viewed as a classification problem. There are some other links: some classifiers make good model choice summary statistics (Prangle et al 2014) or good estimates of ABC-MCMC acceptance ratios for parameter inference problems (Pham et al 2014). So the good performance of random forests makes them seem a generally useful tool for ABC (indeed they are used in the Pham et al paper).
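To make the PPER definition above concrete, here is a minimal sketch of how such an error rate could be estimated. Everything in it is an assumption made for illustration and none of it comes from the paper: the two toy models (Gaussian versus Laplace with equal variance), the prior on the location, the three summary statistics, and above all the classifier, which is a plain k-nearest-neighbour vote standing in for a random forest so that the example runs with the standard library only. The summaries are not rescaled, which a serious analysis would probably want to do.

```python
import math
import random
import statistics


def simulate(model, theta, n, rng):
    """Toy models (assumed): 0 = Gaussian, 1 = Laplace, both with mean theta and variance 1."""
    if model == 0:
        return [rng.gauss(theta, 1.0) for _ in range(n)]
    rate = math.sqrt(2.0)  # difference of two Exp(rate) draws is Laplace with variance 2 / rate**2 = 1
    return [theta + rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]


def summaries(x):
    """Hypothetical summary statistics: mean, standard deviation, mean absolute deviation."""
    m = statistics.fmean(x)
    s = statistics.pstdev(x)
    mad = statistics.fmean([abs(v - m) for v in x])
    return (m, s, mad)


def reference_table(size, n, rng):
    """Simulate (model, theta, summaries) triples from the prior (uniform on models, N(0, 2^2) on theta)."""
    table = []
    for _ in range(size):
        model = rng.randrange(2)
        theta = rng.gauss(0.0, 2.0)
        table.append((model, theta, summaries(simulate(model, theta, n, rng))))
    return table


def nearest(table, s, k):
    """Return the k rows of the table whose summaries are closest to s (squared Euclidean distance)."""
    def dist2(row):
        return sum((u - v) ** 2 for u, v in zip(row[2], s))
    return sorted(table, key=dist2)[:k]


def classify(table, s, k=25):
    """Stand-in classifier: majority vote among the k nearest simulations (not a random forest)."""
    votes = sum(row[0] for row in nearest(table, s, k))
    return 1 if 2 * votes > k else 0


def pper(table, s_obs, n, rng, n_abc=100, k=25):
    """Estimate the posterior predictive error rate of the classifier at the observed summaries."""
    # ABC rejection step: the n_abc simulations closest to the observation
    # act as approximate posterior draws of (model, theta).
    accepted = nearest(table, s_obs, n_abc)
    errors = 0
    for model, theta, _ in accepted:
        # Posterior predictive step: simulate fresh data from each accepted pair
        # and check whether the classifier recovers the model that produced it.
        s_new = summaries(simulate(model, theta, n, rng))
        errors += classify(table, s_new, k) != model
    return errors / len(accepted)


if __name__ == "__main__":
    rng = random.Random(2014)
    table = reference_table(2000, 50, rng)
    s_obs = summaries(simulate(0, 0.5, 50, rng))
    print("predicted model:", classify(table, s_obs))
    print("estimated PPER:", pper(table, s_obs, 50, rng))
```

The accepted set in `pper` is exactly where the dependence on the ABC approximation enters: changing `n_abc` (the ABC threshold in disguise) changes which model/parameter pairs are replayed, and hence the estimated error rate.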
Filed under: pictures, R, Statistics, University life Tagged: ABC, ABC model choice, arXiv, classification, Dennis Prangle, Elements of Statistical Learning, machine learning, model posterior probabilities, posterior predictive, PPER, random forests
As I was waiting for my plane to Bangalore a week ago, I spotted a cheap English edition of Bram Stoker’s Dracula in De Gaulle airport. I had not re-read the book since my teenage years (quite a while ago, even by wampyr’s standards!), so I bought it for the trip ahead. I remembered very little of the style of the [French translation of the] book, even if the story itself was still rather fresh in my mind (as were the uneasy nights after reading the novel!).
“I can hazard no opinion. I do not know what to think and I have no data on which to found a conjecture.”
Dracula is definitely a Victorian gothic novel in the same spirit as Radcliffe’s Mysteries of Udolpho I read last year, if of a late and lighter style… Characters do not feel very realistic (!), maybe because the novel is written in the epistolary style, which makes those characters only express noble or proper sentiments and praise virtues in their companions. (The book could obviously be re-read with this filter, attempting to guess the true feelings of those poor characters forced into a mental straitjacket by the Victorian moral codes.) However, even without this deconstructive approach, the book is quite fascinating as a representation of the codes of the time. More than for a rather unconvincing plot which leaves the main protagonist mostly in the dark [of a coffin, obviously!]. The small band of wampyr-hunters pursuing Dracula seems bound to commit every mistake in the book and miss clues about his local victims and opportunities to put an earlier end to Dracula’s taste of England… And the progress of Dracula in his invasion is too slow to be frightening. Anyway, what I found highly interesting in Dracula is the position and treatment of women in this novel, from innocent vaporous victims to wanton seductresses once un-dead, from saintly and devoted wives to unusually bright women “more clever than men” but still prone to hysteria… Once again, many filters of (modern) societal and sociological constraints could be lifted from this presentation. I also noticed that no legal authority ever appears in the novel: the few policemen therein lift rescued children from cemeteries or nod at the heroes breaking into Dracula’s house in London. This absence may point out issues with Victorian society that may prove impossible to solve without radical changes. (Or I may be reading too much!)
Filed under: Books, Kids, pictures Tagged: boko review, Bram Stoker, Dracula, gothic novels, Transylvania, Victorian society
Filed under: pictures, Running, Travel, University life, Wines Tagged: Boston, City Landing, JSM 2014, Massachusset, Mike's cannoli, musée Maillol
Last and final day and post at and about JSM 2014! It is very rare that I stay till the last day and it is solely due to family constraints that I attended the very last sessions. It was a bit eerie, walking through the huge structure of the Boston Convention Centre that could easily house several A380s and meeting a few souls dragging a suitcase to the mostly empty rooms… Getting scheduled on the final day of the conference is not the nicest thing and I offer my condolences to all speakers ending up speaking today! Including my former Master student Anne Sabourin.
I first attended the Frontiers of Computer Experiments: Big Data, Calibration, and Validation session with a talk by David Higdon on the extrapolation limits of computer models, a talk that linked very nicely with Stephen Stigler’s Presidential Address and stressed the need for incorporating the often neglected fact that models are not reality. Jared Niemi also presented an approximate way of dealing with Gaussian process modelling for large datasets. It was only natural to link this talk with David’s and wonder about the extrapola-bility of the modelling and the risk of over-fitting and the potential for detecting sudden drops in the function.
The major reason why I made the one-hour trip back to the Boston Convention Centre was however the Human Rights Violations: How Do We Begin Counting the Dead? session. It was both of direct interest to me as I had wondered in the past days about statistically assessing the number of political kidnappings and murders in Eastern Ukraine. And of methodological relevance, as the techniques were connected with capture-recapture and random forests. And of close connections with two speakers who alas could not make it and were replaced by co-authors. The first talk by Samuel Ventura considered ways of accelerating the comparison of entries into multiple lists for identifying unique individuals, with the open methodological question of handling populations of probabilities. As the outcome of random forests. My virtual question related to this talk was why the causes for duplications and errors in the record were completely ignored. At least in the example of the Syrian deaths, some analysis could be conducted on the reasons for differences in the entries. And maybe a prior model constructed. The second talk by Daniel Manrique-Vallier was about using non-parametric capture-recapture to count the number of dead from several lists. Once again bypassing the use of potential covariates for explaining the differences. As I noticed a while ago when analysing the population of (police) captured drug addicts in Greater Paris, the prior modelling has a strong impact on the estimated population. Another point I would have liked to discuss was the repeated argument that Arabic (script?) made the identification of individuals more difficult: my naïve reaction was to wonder whether or not this was due to the absence of fluent Arabic speakers in the team. Who could have further helped to build a model on the potential alternative spellings and derivations of Arabic names. But I may have missed more subtle difficulties.
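For readers unfamiliar with capture-recapture, the basic idea of estimating a population size from overlapping lists can be sketched with the classical two-list Lincoln-Petersen estimator and its Chapman correction. This is emphatically not the non-parametric multi-list method of the talk: it is the textbook starting point, and it assumes that the two lists are independent, that every individual has the same chance of being recorded on a given list, and that record linkage is perfect (the very assumptions questioned above). The function names and the identifiers in the example are made up for illustration.

```python
def lincoln_petersen(list_a, list_b):
    """Two-list estimate N = n1 * n2 / m, where m is the number of individuals on both lists."""
    a, b = set(list_a), set(list_b)
    m = len(a & b)
    if m == 0:
        raise ValueError("no overlap between the lists: the estimate is undefined")
    return len(a) * len(b) / m


def chapman(list_a, list_b):
    """Chapman's correction, (n1 + 1)(n2 + 1) / (m + 1) - 1, which stays defined when m = 0."""
    a, b = set(list_a), set(list_b)
    m = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (m + 1) - 1


if __name__ == "__main__":
    # Hypothetical identifiers of unique individuals after record linkage
    ngo_list = ["p01", "p02", "p03", "p04"]
    press_list = ["p03", "p04", "p05"]
    print("Lincoln-Petersen:", lincoln_petersen(ngo_list, press_list))
    print("Chapman:", chapman(ngo_list, press_list))
```

With only two lists there is no way to check the independence assumption from the data, which is one reason why multi-list models, and the prior modelling that comes with them, matter so much for the final count.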
Filed under: Books, Statistics, Travel, University life Tagged: Boston, capture-recapture, Gaussian processes, JSM 2014, Massachusset, record linkage, Syria, Ukraine