Bayesian News Feeds
I spent [most of] the past week in Oxford in connection with our joint OxWaSP PhD program, which is supported by the EPSRC, and constitutes a joint Centre of Doctoral Training in statistical science focussing on data-intensive environments and large-scale models. The first cohort of a dozen PhD students started their training last Fall, with the first year spent in Oxford, before splitting between Oxford and Warwick to write their theses. Courses are taught over a two-week block, with a two-day introduction to the theme (Bayesian Statistics in my case), followed by reading, meetings, daily research talks, mini-projects, and a final day in Warwick including presentations of the mini-projects and a concluding seminar (involving Jonty Rougier and Robin Ryder, next Friday). This approach by bursts of training periods is quite ambitious in that it requires a lot from the students, both through the lectures and in personal investment, and reminds me somewhat of a similar approach at École Polytechnique where courses are given over fairly short periods. But it is also profitable for highly motivated and selected students in that total immersion into one topic and a large amount of collective work bring them up to speed with a reasonable basis and the option to write their thesis on that topic. Hopefully, I will see some of those students next year in Warwick working on some Bayesian analysis problem!
On a personal basis, I also enjoyed my time in Oxford very much, first for meeting with old friends, albeit too briefly, and second for cycling, as the owner of the great Airbnb place I rented kindly let me use her bike, which allowed me to go around quite freely! Even on a train trip to Reading. As it was a road racing bike, it took me a trip or two to get used to it, especially on the first day when the roads were somewhat icy, but I enjoyed the lightness of it, relative to my lost mountain bike, to the point of considering switching to a road bike for my next bike… I also had some apprehensions about riding at night, which I avoid while in Paris, but got over them until the very last night when I had a very close brush with a car entering from a side road, which either had not seen me or thought I would let it pass. Gave me the opportunity of shouting Oï!
Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: airbnb, Bayesian statistics, EPSRC, mountain bike, PhD course, PhD students, slides, slideshare, stolen bike, The Bayesian Choice, University of Oxford, University of Warwick
Here are two examples of animal “face” tee-shirts I saw advertised in The New York Times and that I would not consider wearing. At any time.
Filed under: Kids, pictures Tagged: animals, Asian lady beetle, fashion, tarsier, tee-shirt, The New York Times
Yesterday, I was all too briefly in Edinburgh for a few hours, to give a seminar in the School of Mathematics, on the random forests approach to ABC model choice (that was earlier rejected). (The slides are almost surely identical to those used at the NIPS workshop.) One interesting question at the end of the talk was on the potential bias in the posterior predictive expected loss, bias against some model from the collection of models being evaluated for selection. In the sense that the array of summaries used by the random forest could fail to capture features of a particular model and hence discriminate against it. While this is correct, there is no fundamental difference with implementing a posterior probability based on the same summaries. And the posterior predictive expected loss offers the advantage of testing, that is, for representative simulations from each model, of returning the corresponding model prediction error to highlight poor performances on some models. A further discussion over tea led me to ponder whether or not we could expand the use of random forests to Bayesian quantile regression. However, this would imply a monotonicity structure on a collection of random forests, which sounds daunting…
My stay in Edinburgh was quite brief as I drove to the Highlands after the seminar, heading to Fort William. Although the weather was rather ghastly, the traffic was fairly light and I managed to get there unscathed, without hitting any of the deer of Rannoch Moor (saw one dead by the side of the road though…) or the snow banks of the narrow roads along Loch Lubnaig. And, as usual, it still was a pleasant feeling to drive through those places associated with climbs and hikes, Crianlarich, Tyndrum, Bridge of Orchy, and Glencoe. And to get in town early enough to enjoy a quick dinner at The Grog & Gruel, reflecting I must have had half a dozen dinners there with friends (or not) over the years. And drinking a great heather ale to them!
Filed under: Mountains, pictures, Statistics, Travel, University life, Wines Tagged: ABC, ABC model choice, Edinburgh, Fort William, quantile regression, random forests, Scotland, The Grog & Gruel, University of Edinburgh
I Remember You: A Ghost Story is another Icelandic novel by Yrsa Sigurdardottir, that I bought more because it takes place in Iceland than because of its style, as I found the previous novel was somewhat missing in its plot. Still, I was expecting better, as the novel won the 2012 Icelandic Crime Fiction Award. Alas, I should have been paying more attention to the subtitle “A ghost story”, since this is indeed a ghost story of a most traditional nature (I mean, without the deep humour of Rivers of London!), where the plot itself is incomprehensible (or nonexistent) without taking into account the influence and even actions of ghosts! I know I should have been warned by the earlier volume, since there as well some characters were under the influence, but I had thought it was more of a psychological disorder than a genuine part of the story! As I do not enjoy ghost stories of that kind in the least, having grown out of the scary parts, it was a ghastly drag to finish this book, especially because the plot is very shroud-thin and (spoilers, spoilers!) the very trip and subsequent behaviour of the three characters in the deserted village is completely irrational (even prior to their visitation by a revengeful ghost!). The motives for all characters that end up in the haunted place are similarly flimsy… The connections between the characters are fairly shallow and the obvious affair between two of them takes hundreds of pages to be revealed. The very last pages of the book see the rise of a new ghost, maybe in anticipation of a new novel. No matter what, this certainly is my last book by Sigurdardottir and I would rather wait for the next Indriðason to increase my collection of Icelandic Noir…! Keeping away from the fringe that caters to the supposedly widespread Icelandic belief in ghosts and trolls!!!
Filed under: Books, Travel Tagged: Arnaldur Indriðason, ghosts, horror, Iceland noir, Rivers of London, Yrsa Sigurðardóttir
Our random forest paper was alas rejected last week. Alas because I think the approach is a significant advance in ABC methodology when implemented for model choice, avoiding the delicate selection of summary statistics and the report of shaky posterior probability approximations. Alas also because the referees somewhat missed the point, apparently perceiving random forests as a way to project a large collection of summary statistics on a limited dimensional vector as in the Read Paper of Paul Fearnhead and Dennis Prangle, while the central point in using random forests is the avoidance of a selection or projection of summary statistics. They also dismissed our approach based on the argument that the reduction in error rate brought by random forests over LDA or standard (k-nn) ABC is “marginal”, which indicates a degree of misunderstanding of what the classification error stands for in machine learning: the maximum possible gain in supervised learning with a large number of classes cannot be brought arbitrarily close to zero. Last but not least, the referees did not appreciate why we mostly cannot trust posterior probabilities produced by ABC model choice and hence why the posterior error loss is a valuable and almost inevitable machine learning alternative, dismissing the posterior expected loss as being not Bayesian enough (or at all), for “averaging over hypothetical datasets” (which is a replicate of Jeffreys’ famous criticism of p-values)! Certainly a first time for me to be rejected based on this argument!
Filed under: Books, Statistics, University life Tagged: ABC, ABC model choice, Bayesian Analysis, classification, Harold Jeffreys, random forests, Read paper, summary statistics
Thomas Schön, Uppsala University
Filed under: pictures, R, Statistics, Travel, University life, Wines Tagged: ENSAE, Monte Carlo Statistical Methods, Paris, sequential Monte Carlo, SMC 2015, workshop
This blog post was contributed by my friend Julien Cornebise, as a reprint of a column he wrote for the latest ISBA Bulletin.
This article is an occasion to pay forward ever so slightly, by encouraging current Ph.D. candidates on their path, the support ISBA gave me. Four years ago, I was honored and humbled to receive the ISBA 2010 Savage Award, category Theory and Methods, for my Ph.D. dissertation defended in 2009. Looking back, I can now testify how much this brought to me both inside and outside of Academia.
Inside Academia: confirming and mitigating the widely-shared post-graduate’s impostor syndrome
Upon hearing of the great news, a brilliant multi-awarded senior researcher in my lab very kindly wrote to me that such awards meant never having to prove one’s worth again. Although genuinely touched by her congratulations, being far less accomplished and more junior than her, I felt all the more responsible to prove myself worthy of this show of confidence from ISBA. It would be rather awkward to receive such an award only to fail miserably shortly after.
This resonated deeply with the shared secret of recent PhDs, discovered during my year at SAMSI, a vibrant institution where half a dozen new postdocs arrive each year: each and every one of us, fresh Ph.D.s from some of the best institutions (Cambridge, Duke, Waterloo, Paris…) secretly suffered the very same impostor syndrome. We were looking at each other’s CV/website and thinking “jeez! this guy/girl across the door is an expert of his/her field, look at all he/she has done, whereas I just barely scrape by on my own research!” – all the while putting up a convincing façade of self-assurance in front of audiences and whiteboards, to the point of apparent cockiness. Only after candid exchanges in SAMSI’s very open environment did we all discover being in the very same mindset.
In hindsight the explanation is simple: each young researcher in his/her own domain has the very expertise to measure how much he/she still does not know and has yet to learn, while he/she hears other young researchers, experts in their own other field, present results not as familiar to him/her, thus sounding so much more advanced. This take-away from SAMSI was perfectly confirmed by the Savage Award: yes, maybe indeed, I, just like my other colleagues, might actually know something relatively valuable, and my scraping by might just be not so bad – as is also the case of so many of my young colleagues.
Of course, impostor syndrome is a clingy beast and, healthily, I hope to never get entirely over it – merely overcoming it enough to say “Do not worry, thee young candidate, thy doubts pave a path well trodden”.
A similar message is also part of the little-known yet gem of a guide “How to do Research at MIT AI Lab – Emotional Factors”, relevant far beyond its original lab. I recommend it to any Ph.D. student; the feedback from readers is unanimous.
Outside Academia: incredibly increased readability
After two post-docs, and curious to see what was out there in atypical paths, I took a turn out of purely academic research, first as an independent consultant, then recruited out of the blue by a start-up’s recruiter, and eventually doing my small share to help convince investors. I discovered there another facet of ISBA’s Savage Award: tremendous readability.
In Academia, the dominating metric of quality is the length of the publication list – a debate for another day. Outside of Academia, however, not all interlocutors know how remarkable a JRSSB Read Paper is, or an oral presentation at NIPS, or a publication in Nature.
This is where international learned societies, like ISBA, come into play: the awards they bestow can serve as headline-grabbing material in a biography, easily spotted. The interlocutors do not need to be familiar with the subtleties of Bayesian Analysis. All they see is a stamp of approval from an official association of this researcher’s peers. That, in itself, is enough of a quality metric to pass the first round of contact, raise interest, and get the chance to further the conversation.
First concrete example: the recruiter who contacted me for the start-up I joined in 2011 was tasked to find profiles for an Applied position. The Savage Award on the CV grabbed his attention, even though he had no inkling what Adaptive Sequential Monte Carlo Methods were, nor if they were immediately relevant to the start-up. Once he passed the CV on to the start-up’s managers, they immediately changed focus and interviewed me for their Research track instead: a profile that was not what they were looking for originally, yet stood out enough to interest them for a position they had not thought of filling via a recruiter – and indeed a unique position that I would never have thought to find this way either!
Second concrete example, years later, hard at work in this start-up’s amazing team: investors were coming for a round of technical due diligence. Venture capital firms sent their best scientists-in-residence to dive deeply into the technical details of our research. Of course what matters in the end is, and forever will be, the work that is done and presented. Yet, the Savage Award was mentioned in the first line of the biography that was sent ahead of time, as a salient point to give a strong first impression of our research team.
Advice to Ph.D. Candidates: apply, you are the world’s best expert on your topic
That may sound trivial, but the first piece of advice is: apply. Discuss with your advisor the possibility of putting your dissertation up for consideration. This might sound obvious to North-American students, whose educational system is rife with awards for high-performing students. Not so much in France, where those would be at odds with the sometimes over-present culture of égalité in the younger-age public education system. As a cultural consequence, few French Ph.D. students, even the most brilliant, would consider putting up their dissertation for consideration. I have been very lucky in that regard to benefit from the advice of a long-term Bayesian, who offered to send it for me – thanks again Xi’an! Not all students, regardless of how brilliant their work is, are made aware of this possibility.
The second piece of advice, closely linked: do not underestimate the quality of your work. You are the foremost expert in the entire world on your Ph.D. topic. As discussed above, it is all too easy to see how advanced the maths wielded by your office-mate are, yet overlook the just-as-advanced maths you are juggling on a day-to-day basis, more familiar to you, and whose limitations you know better than anyone else. Actually, knowing these very limitations is what proves you are an expert.
A word of thanks and final advice
Finally, a word of thanks. I have been incredibly lucky, throughout my career so far, to meet great people. My dissertation already had four pages of acknowledgements: I doubt the Bulletin’s editor would appreciate me renewing (and extending!) them here. They are just as heartfelt today as they were then. I must, of course, add ISBA and the Savage Award committee for their support, as well as all those who, by their generous donations, allow the Savage Fund to stay alive throughout the years.
Of interest to Ph.D. candidates, though, one special mention of a dual tutelage system, that I have seen successfully at work many times. The most senior, a professor with the deep knowledge necessary to steer the project, brings his endless fonts of knowledge collected over decades, wrapped in hardened tough-love. The youngest, a postdoc or fresh assistant professor, brings virtuosity, emulation and day-to-day patience. In my case they were Pr. Éric Moulines and Dr. Jimmy Olsson. That might be the final advice to a student: if you ever stumble, as many do, as I most surely did, because Ph.D. studies can be a hell of a roller-coaster to go through, reach out to the people around you and the joint set of skills they want to offer you. In combination, they can be amazing, and help you open doors that, in retrospect, can be worth all the effort.
Julien Cornebise, Ph.D.
Filed under: Kids, Statistics, University life Tagged: adaptive Monte Carlo, Bayesian Analysis, ISBA, ISBA Bulletin, Julien Cornebise, Savage award
The next Nordic-Baltic Biometric conference will take place in Reykjavik, next June, a few days after the O-Bayes 15 meeting in València. I will attend the conference as the organisers were kind enough to invite me to give a talk, with high hopes to take a few days off to go hiking day and night! The registration is now open, as is the call for abstracts.
Filed under: Mountains, pictures, Statistics, Travel, University life Tagged: biometry, conference, hiking, Iceland, invited talk, Nordic-Baltic Biometric conference, O-Bayes 2015, Reykjavik
Today, I took a look at a recently arXived paper posted in physics, Lifting – A non reversible MCMC algorithm by Marija Vucelja, but I simply could not understand the concept of lifting. Presumably because of the physics perspective. And also because the paper is mostly a review, referring to the author’s earlier work. The notion of lifting is to create a duplicate of a regular Markov chain with given stationary distribution towards cancelling reversibility and hence speeding up the exploration of the state space. The central innovation in the paper seems to be in imposing a lifted reversibility, which means using the reverse dynamics on the lifted version of the chain, that is, the dual proposal
However, the paper does not make explicit how the resulting Markov transition matrix on the augmented space is derived from the original matrix. I now realise my description is most likely giving the impression of two coupled Markov chains, which is not the case: the new setting is made of a duplicated sample space, in the sense of Nummelin’s split chain (but without the specific meaning for the binary variable found in Nummelin!). In the case of the 1-d Ising model, the implementation of the method means for instance picking a site at random, proposing to change its spin value by a Metropolis acceptance step and then, if the proposal is rejected, possibly switching to the corresponding value in the dual part of the state. Given the elementary proposal in the first place, I fail to see where the improvement can occur… I’d be most interested in seeing a version of this lifting in a realistic statistical setting.
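To make the duplicated-state-space idea concrete, here is a minimal sketch of a textbook lifted walk on a finite one-dimensional state space, in the spirit of guided walks and not necessarily the construction used in the paper. Each state x is paired with a direction eps in {+1, -1}; the chain proposes x + eps, accepts with the usual Metropolis ratio, and flips the direction upon rejection. The function names and the toy target weights are assumptions made for the illustration.

```python
def lifted_transition_matrix(weights):
    """Transition matrix of a lifted (direction-augmented) Metropolis walk.

    States are pairs (x, eps) with x in {0, ..., k-1} and eps in {+1, -1},
    stored at index 2 * x + (0 if eps == +1 else 1).
    """
    k = len(weights)

    def index(x, eps):
        return 2 * x + (0 if eps == 1 else 1)

    P = [[0.0] * (2 * k) for _ in range(2 * k)]
    for x in range(k):
        for eps in (1, -1):
            y = x + eps
            # Metropolis acceptance; moves outside the state space are rejected.
            if 0 <= y < k:
                accept = min(1.0, weights[y] / weights[x])
            else:
                accept = 0.0
            if accept > 0.0:
                P[index(x, eps)][index(y, eps)] += accept
            # Upon rejection, stay at x but switch to the dual copy (flip eps).
            P[index(x, eps)][index(x, -eps)] += 1.0 - accept
    return P


def lifted_target(weights):
    """Target on the duplicated space: each copy of x gets half its mass."""
    total = float(sum(weights))
    mu = []
    for w in weights:
        mu.extend([w / (2 * total), w / (2 * total)])
    return mu


def invariance_gap(P, mu):
    """Largest entry of |mu P - mu|, zero when mu is stationary for P."""
    n = len(mu)
    return max(abs(sum(mu[i] * P[i][j] for i in range(n)) - mu[j]) for j in range(n))


weights = [1.0, 2.0, 3.0, 2.0, 1.0]  # hypothetical unnormalised target
P = lifted_transition_matrix(weights)
mu = lifted_target(weights)
print("max |mu P - mu| =", invariance_gap(P, mu))
```

The resulting matrix keeps the duplicated target invariant while breaking detailed balance: the move from (0, +1) to (1, +1) has positive probability whereas the reverse move has none.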
Filed under: Books, Statistics, University life Tagged: arXiv, MCMC algorithms, reversible Markov chain
Last week, Likelihood-free Bayesian inference on the minimum clinically important difference was arXived by Nick Syring and Ryan Martin and I read it over the weekend, slowly coming to the realisation that their [meaning of] “likelihood free” was not my [meaning of] “likelihood free”, namely that it has nothing to do with ABC! The idea therein is to create a likelihood out of a loss function, in the spirit of Bissiri, Holmes and Walker, the loss being inspired here by a clinical trial concept, the minimum clinically important difference, defined as
which defines a loss function per se when considering the empirical version. In clinical trials, Y is a binary outcome and X a vector of explanatory variables. This model-free concept avoids setting a joint distribution on the pair (X,Y), since creating a distribution on a large vector of covariates is always an issue. As a marginalia, the authors actually mention our MCMC book in connection with a logistic regression (Example 7.11) and for a while I thought we had mentioned MCID therein, realising later it was a standard description of MCMC for logistic models.
The central and interesting part of the paper is obviously defining the likelihood-free posterior as
The authors manage to obtain the rate necessary for the estimation to be asymptotically consistent, which seems [to me] to mean that a better representation of the likelihood-free posterior should be
(even though this rescaling does not appear verbatim in the paper). This is quite an interesting application of the concept developed by Bissiri, Holmes and Walker, even though it also illustrates the difficulty of defining a specific prior, given that the minimised target above can be transformed by an arbitrary increasing function. And the mathematical difficulty in finding a rate.
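As a rough illustration of turning a loss function into a pseudo-posterior, the sketch below evaluates weights proportional to exp(-rate × n × R_n(θ)) on a grid, where R_n is an empirical misclassification risk. Everything specific here is an assumption made for the example and is not taken from the paper: a scalar covariate X, an outcome Y coded as ±1, the loss 1{Y ≠ sign(X − θ)}, a flat prior on the grid, and a `rate` argument standing in for whatever rescaling of the exponent is deemed appropriate.

```python
import math


def empirical_risk(theta, xs, ys):
    """Fraction of pairs (x, y) with y != sign(x - theta), for y in {-1, +1}."""
    errors = 0
    for x, y in zip(xs, ys):
        predicted = 1 if x > theta else -1
        if predicted != y:
            errors += 1
    return errors / len(xs)


def gibbs_posterior(grid, xs, ys, rate=1.0):
    """Normalised weights proportional to exp(-rate * n * risk) under a flat prior."""
    n = len(xs)
    unnormalised = [math.exp(-rate * n * empirical_risk(t, xs, ys)) for t in grid]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical noise-free data with a cut-off at 0.3.
xs = [i / 50 for i in range(51)]
ys = [1 if x > 0.3 else -1 for x in xs]
grid = [j / 100 for j in range(101)]
posterior = gibbs_posterior(grid, xs, ys)
mode = grid[posterior.index(max(posterior))]
print("posterior mode:", mode)
```

Since the empirical risk is piecewise constant in θ, the pseudo-posterior is flat between consecutive observations, which gives a feel for why the choice of prior and of rate matters in this construction.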
Filed under: Uncategorized Tagged: Amnesty International, blogging, freedom of speech, Je suis Charlie, Raif Badawi, religions, Saudi Arabia
Now that my grading is over, I can reflect on the unexpected difficulties in the mathematical statistics exam. I knew that the first question in the multiple choice exercise, borrowed from Cross Validation, was going to be quasi-impossible and indeed only one student out of 118 managed to find the right solution. More surprisingly, most students did not manage to solve the (absence of) MLE when observing that n unobserved exponential Exp(λ) variates were larger than a fixed bound δ. I was also amazed that they did poorly on a N(0,σ²) setup, failing to see that
and determine an unbiased estimator that can be improved by Rao-Blackwellisation. No student reached the conditioning part. And a rather frequent mistake more understandable due to the limited exposure they had to Bayesian statistics: many confused parameter λ with observation x in the prior, writing
hence could not derive a proper posterior.
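On the censored exponential question mentioned above, the absence of an MLE can be checked numerically. When all that is known is that n independent Exp(λ) variates exceed δ, the likelihood is exp(−nλδ), which is strictly decreasing in λ on (0,∞): its supremum, 1, is only approached as λ goes to zero and is never attained. The values of n and δ below are arbitrary choices for the illustration.

```python
import math


def censored_likelihood(lam, n, delta):
    """Likelihood of lam when all n Exp(lam) variates are only known to exceed delta."""
    return math.exp(-n * lam * delta)


n, delta = 10, 2.0  # hypothetical sample size and bound
lams = [10.0 ** (-k) for k in range(6)]  # 1, 0.1, ..., 1e-5
values = [censored_likelihood(lam, n, delta) for lam in lams]
for lam, value in zip(lams, values):
    print(f"lambda = {lam:g}: likelihood = {value:.6f}")
```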
Filed under: Kids, pictures, Statistics, University life Tagged: Bayesian statistics, copies, final exam, grading, mathematical statistics, MLE, Université Paris Dauphine
Le Monde illustrated an article about discrimination against women with this graph, which gives the number of men for 100 women per continent. This is a fairly poor graph, fit for one of Tufte’s counterexamples, as the bars are truncated at 85, make little sense as they do not convey the time dimension, are dwarfed by the legend on the left that is not of the same colors, and also miss the population dimension, which makes the title inappropriate since the graph does not show why there are more men than women on the planet, even if the large percentage of the population of Asia in the World’s population hints at the result.
Filed under: Books, pictures, Statistics Tagged: awful graphs, barplot, Le Monde, World's population
As mentioned in my recent review of Redshirts, I was planning to read John Scalzi’s most recent novel, Lock In, if only to check whether or not Redshirts was an isolated accident! This was the third book from “the pile” that I read through the Yule break and, indeed, it was a worthwhile attempt as the book stands miles above Redshirts…
The story is set in a very convincing near-future America where a significant part of the population is locked by a super-flu into a full paralysis that forces them to rely on robot-like interfaces to interact with unlocked humans. While the book is not all that specific on how the robotic control operates, except for using an inserted “artificial neural network” inside the “locked-in” brains, Scalzi manages to make it sound quite realistic, with societal and corporation issues at the forefront. To the point of selling really well the (usually lame) notion of instantaneous relocation at the other end of the US. And with the bare minimum of changes to the current society, which makes it easier to buy. I have not been that enthralled by a science-fiction universe for quite a while. I also enjoyed how the economics of this development of a new class of citizens was rendered, the book rotating around the consequences of the ending of heavy governmental intervention in lock-in research.
Now, the story itself is of a more classical nature in that the danger threatening the locked-in population is uncovered single-handedly by the rookie detective who conveniently happens to be the son of a very influential ex-basketball-player and hence to meet all the characters involved in the plot. This is pleasant but somewhat thin, with a limited number of players considering the issues at stake and a rather artificial ending.
Look here for a more profound review by Cory Doctorow.
Filed under: Books, Kids, Travel Tagged: Cory Doctorow, flu, John Scalzi, Lock In, redshirts, robots, science fiction
“One of Jeffreys’ goals was to create default Bayes factors by using prior distributions that obeyed a series of general desiderata.”
The paper Harold Jeffreys’s default Bayes factor hypothesis tests: explanation, extension, and application in Psychology by Alexander Ly, Josine Verhagen, and Eric-Jan Wagenmakers is both a survey and a reinterpretation cum explanation of Harold Jeffreys’ views on testing. At about the same time, I received a copy from Alexander and a copy from the journal it had been submitted to! This work starts with a short historical entry on Jeffreys’ work and career, which includes four of his principles, quoted verbatim from the paper:
- “scientific progress depends primarily on induction”;
- “in order to formalize induction one requires a logic of partial belief” [enters the Bayesian paradigm];
- “scientific hypotheses can be assigned prior plausibility in accordance with their complexity” [a.k.a., Occam’s razor];
- “classical “Fisherian” p-values are inadequate for the purpose of hypothesis testing”.
“The choice of π(σ) therefore irrelevant for the Bayes factor as long as we use the same weighting function in both models”
A very relevant point made by the authors is that Jeffreys only considered embedded or nested hypotheses, a fact that allows for having common parameters between models and hence some form of reference prior. Even though (a) I dislike the notion of “common” parameters and (b) I do not think it is entirely legit (I was going to write proper!) from a mathematical viewpoint to use the same (improper) prior on both sides, as discussed in our Statistical Science paper. And in our most recent alternative proposal. The most delicate issue however is to derive a reference prior on the parameter of interest, which is fixed under the null and unknown under the alternative. Hence preventing the use of improper priors. Jeffreys tried to calibrate the corresponding prior by imposing asymptotic consistency under the alternative. And exact indeterminacy under “completely uninformative” data. Unfortunately, this is not a well-defined notion. In the normal example, the authors recall and follow the proposal of Jeffreys to use an improper prior π(σ)∝1/σ on the nuisance parameter and offer the quote above in his defence. I find this argument quite weak because suddenly the prior on σ becomes a weighting function... A notion foreign to the Bayesian cosmology. If we use an improper prior for π(σ), the marginal likelihood on the data is no longer a probability density and I do not buy the argument that one should use the same measure with the same constant both on σ alone [for the nested hypothesis] and on the σ part of (μ,σ) [for the nesting hypothesis]. We are considering two spaces with different dimensions and hence orthogonal measures. This quote thus sounds more like wishful thinking than like a justification. Similarly, the assumption of independence between δ=μ/σ and σ does not make sense for σ-finite measures.
Note that the authors later point out that (a) the posterior on σ varies between models despite using the same data [which shows that the parameter σ is far from common to both models!] and (b) the [testing] Cauchy prior on δ is only useful for the testing part and should be replaced with another [estimation] prior when the model has been selected. Which may end up as a backfiring argument about this default choice.
“Each updated weighting function should be interpreted as a posterior in estimating σ within their own context, the model.”
The re-derivation of Jeffreys’ conclusion that a Cauchy prior should be used on δ=μ/σ makes it clear that this choice only proceeds from an imperative of fat tails in the prior, without solving the calibration of the Cauchy scale. (Given the now-available modern computing tools, it would be nice to see the impact of this scale γ on the numerical value of the Bayes factor.) And maybe it also proceeds from a “hidden agenda” to achieve a Bayes factor that solely depends on the t statistic. Although this does not sound like a compelling reason to me, since the t statistic is not sufficient in this setting.
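The impact of the Cauchy scale on the Bayes factor can be explored numerically. The sketch below is not the authors' code: it assumes a one-sample, two-sided t-test with n observations, a Cauchy(0, scale) prior on δ written as a scale mixture of normals (δ given g is N(0, g), with g = scale²/Z² for a standard normal Z), and a plain midpoint rule for the one-dimensional integral. The function name `jzs_bf01` and the numerical settings are choices made for the illustration.

```python
import math


def jzs_bf01(t, n, scale=1.0, steps=20000, z_max=10.0):
    """Bayes factor of the null (delta = 0) against a Cauchy(0, scale) alternative.

    Uses the mixture representation g = (scale / Z)**2 with Z standard normal,
    integrating over z in (0, z_max) by the midpoint rule.
    """
    nu = n - 1
    null_term = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)
    h = z_max / steps
    alt_term = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        g = (scale / z) ** 2
        shrink = 1.0 + n * g
        integrand = shrink ** -0.5 * (1.0 + t * t / (shrink * nu)) ** (-(nu + 1) / 2.0)
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        # Factor 2 accounts for the symmetric half z < 0.
        alt_term += 2.0 * integrand * phi * h
    return null_term / alt_term


for scale in (0.5, 1.0, 2.0):
    print(f"scale = {scale}: BF01 at t = 2, n = 20 is {jzs_bf01(2.0, 20, scale):.3f}")
```

For a fixed t statistic, widening the Cauchy prior spreads the alternative over more implausible effect sizes, so the support for the null changes with the scale, which is exactly the calibration issue left open by the fat-tail argument.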
In a differently interesting way, the authors mention the Savage-Dickey ratio (p.16) as a way to represent the Bayes factor for nested models, without necessarily perceiving the mathematical difficulty with this ratio that we pointed out a few years ago. For instance, in the psychology example processed in the paper, the test is between δ=0 and δ≥0; however, if I set π(δ=0)=0 under the alternative prior, which should not matter [from a measure-theoretic perspective where the density is uniquely defined almost everywhere], the Savage-Dickey representation of the Bayes factor returns zero, instead of 9.18!
“In general, the fact that different priors result in different Bayes factors should not come as a surprise.”
The second example detailed in the paper is the test for a zero Gaussian correlation. This is a sort of “ideal case” in that the parameter of interest is between -1 and 1, hence makes the choice of a uniform U(-1,1) easy or easier to argue. Furthermore, the setting is also “ideal” in that the Bayes factor simplifies down into a marginal over the sample correlation only, under the usual Jeffreys priors on means and variances. So we have a second case where the frequentist statistic behind the frequentist test[ing procedure] is also the single (and insufficient) part of the data used in the Bayesian test[ing procedure]. Once again, we are in a setting where Bayesian and frequentist answers are in one-to-one correspondence (at least for a fixed sample size). And where the Bayes factor allows for a closed form through hypergeometric functions. Even in the one-sided case. (This is a result obtained by the authors, not by Jeffreys who, as the proper physicist he was, obtained approximations that are remarkably accurate!)
“The fact that the Bayes factor is independent of the intention with which the data have been collected is of considerable practical importance.”
The authors have a side argument in this section in favour of the Bayes factor against the p-value, namely that the “Bayes factor does not depend on the sampling plan” (p.29), but I find this fairly weak (or tongue in cheek) as the Bayes factor does depend on the sampling distribution imposed on top of the data. It appears that the argument is mostly used to defend sequential testing.
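The claim about sampling plans is usually illustrated with the classical binomial versus negative binomial example, sketched below with made-up numbers: 9 successes and 3 failures, obtained either from a fixed number of 12 trials or by sampling until the third failure. The two likelihoods are proportional, so the Bayes factor of θ = 1/2 against a uniform prior on θ is the same under both plans, while the one-sided p-values differ. The uniform prior and the data are assumptions for the illustration; the Bayes factor still depends on this family of sampling distributions being the right one.

```python
from math import comb, factorial


def beta_fn(a, b):
    """Beta function for positive integer arguments."""
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)


successes, failures = 9, 3
n = successes + failures


def bf01(constant):
    """Bayes factor of theta = 1/2 against a uniform prior, for a likelihood
    equal to constant * theta**successes * (1 - theta)**failures."""
    null = constant * 0.5 ** n
    alternative = constant * beta_fn(successes + 1, failures + 1)
    return null / alternative


bf_binomial = bf01(comb(n, successes))      # fixed number of trials
bf_negbin = bf01(comb(n - 1, successes))    # stop at the third failure

# One-sided p-values for theta = 1/2 against theta > 1/2.
p_binomial = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
p_negbin = sum(comb(n - 1, j) for j in range(failures)) / 2 ** (n - 1)

print("Bayes factors:", bf_binomial, bf_negbin)
print("p-values:", p_binomial, p_negbin)
```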
“The Bayes factor (…) balances the tension between parsimony and goodness of fit, (…) against overfitting the data.”
In fine, I liked this re-reading of Jeffreys’ approach to testing very much, maybe all the more because I now think we should get away from it! I am not certain it will help in convincing psychologists to adopt Bayes factors for assessing their experiments as it may instead frighten them away. And it does not bring an answer to the vexing issue of the relevance of point null hypotheses. But it constitutes a lucid and innovative treatment of the major advance represented by Jeffreys’ formalisation of Bayesian testing.
Filed under: Books, Statistics, University life Tagged: Bayesian hypothesis testing, Dickey-Savage ratio, Harold Jeffreys, overfitting, Statistical Science, testing, Theory of Probability
I found another paper on the Jeffreys-Lindley paradox. Entitled “A Misleading Intuition and the Bayesian Blind Spot: Revisiting the Jeffreys-Lindley’s Paradox”. Written by Guillaume Rochefort-Maranda, from Université Laval, Québec.
This paper starts by assuming an unbiased estimator of the parameter of interest θ and under test for the null θ=θ0. (Which makes me wonder at the reason for imposing unbiasedness.) Another highly innovative (or puzzling) aspect is that the Lindley-Jeffreys paradox presented therein is described without any Bayesian input. The paper stands “within a frequentist (classical) framework”: it actually starts with a confidence-interval-on-θ-vs.-test argument to argue that, with a fixed coverage interval that excludes the null value θ0, the estimate of θ may converge to θ0 without ever accepting the null θ=θ0. That is, without the confidence interval ever containing θ0. (Although this is an event whose probability converges to zero.) Bayesian aspects come later in the paper, even though the application to a point null versus a point alternative test is of little interest since a Bayes factor is then a likelihood ratio.
As I explained several times, including in my Philosophy of Science paper, I see the Lindley-Jeffreys paradox as being primarily a Bayesiano-Bayesian issue. So just the opposite of the perspective taken by the paper. That frequentist solutions differ does not strike me as paradoxical. Now, the construction of a sequence of samples such that all partial samples in the sequence exclude the null θ=θ0 is not a likely event, so I do not see this as a paradox even or especially when putting on my frequentist glasses: if the null θ=θ0 is true, this cannot happen in a consistent manner, even though a single occurrence of a p-value less than .05 is highly likely within such a sequence.
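The contrast between "at least one significant look" and "every look significant" can be checked by simulation. The following Python sketch is only an illustration under assumed settings (standard normal data generated under the null θ0=0, a two-sided z-test at level .05 applied after every new observation, 500 observations per sequence); none of these choices come from the paper.

```python
import math
import random

def sequential_looks(rng, n_max, z_crit=1.96):
    """Simulate one sequence of N(0,1) data under the null and test at every n.

    Returns (any_reject, all_reject) for the two-sided z-test at level .05.
    """
    total = 0.0
    any_reject, all_reject = False, True
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        reject = abs(total) / math.sqrt(n) > z_crit
        any_reject = any_reject or reject
        all_reject = all_reject and reject
    return any_reject, all_reject

rng = random.Random(1)  # fixed seed for reproducibility
n_seq, n_max = 2000, 500
results = [sequential_looks(rng, n_max) for _ in range(n_seq)]
share_any = sum(a for a, _ in results) / n_seq
share_all = sum(b for _, b in results) / n_seq

print(f"sequences with at least one p < .05: {share_any:.3f}")
print(f"sequences with p < .05 at every n:   {share_all:.3f}")
```

Under these assumptions, a large share of the null sequences shows at least one rejection somewhere along the way, while sequences rejecting at every single sample size essentially never occur.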
Unsurprisingly, the paper relates to the three most recent papers published by Philosophy of Science, discussing first and foremost Spanos’ view. When the current author introduces Mayo and Spanos’ severity, i.e. the probability of exceeding the observed test statistic under the alternative, he does not define this test statistic d(X), which makes the whole notion incomprehensible to a reader not already familiar with it. (And even for one familiar with it…)
“Hence, the solution I propose (…) avoids one of [Freeman’s] major disadvantages. I suggest that we should decrease the size of tests to the extent where it makes practically no difference to the power of the test in order to improve the likelihood ratio of a significant result.” (p.11)
One interesting if again unsurprising point in the paper is that one reason for the paradox stands in keeping the significance level constant as the sample size increases, while it is possible to decrease the significance level and to increase the power simultaneously. However, the solution proposed above does not sound rigorous, hence I fail to understand how low the significance level has to be for the method to stop or to work. I cannot fathom a corresponding algorithmic derivation of the author’s proposal.
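That both quantities can move in the right direction at once is easy to check numerically. The Python sketch below uses a one-sided z-test with unit variance, a hypothetical effect size of 0.2 and an arbitrary schedule α_n = 0.05·√(10/n); these are assumptions made for illustration only and are not the procedure proposed in the paper, which does not specify such a schedule.

```python
from statistics import NormalDist

std_normal = NormalDist()
effect = 0.2  # hypothetical standardised effect under the alternative

rows = []
for n in (10, 100, 1000, 10000):
    alpha = 0.05 * (10 / n) ** 0.5           # assumed schedule: level shrinks with n
    z_crit = std_normal.inv_cdf(1 - alpha)   # one-sided critical value
    power = 1 - std_normal.cdf(z_crit - effect * n ** 0.5)
    rows.append((n, alpha, power, power / alpha))

for n, alpha, power, ratio in rows:
    print(f"n={n:>6}  alpha={alpha:.5f}  power={power:.5f}  power/alpha={ratio:.1f}")
```

The last column is the ratio of power to level, i.e. the likelihood ratio of the event "significant result" under this particular alternative. The sketch leaves open exactly the question raised above, namely how fast the level should decrease.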
“I argue against the intuitive idea that a significant result given by a very powerful test is less convincing than a significant result given by a less powerful test.”
The criticism on the “blind spot” of the Bayesian approach is supported by an example where the data is issued from a distribution other than either of the two tested distributions. It seems reasonable that the Bayesian answer fails to provide a proper answer in this case. Even though it illustrates the difficulty with the long-term impact of the prior(s) in the Bayes factor and (in my opinion) the need to move away from this solution within the Bayesian paradigm.
Filed under: Books, Statistics, University life Tagged: Bayes factor, frequentist inference, Jeffreys-Lindley paradox, Philosophy of Science, Québec, Université Laval
In conjunction with the recent PNAS paper on massive model choice, Rob Johnson†, Paul Kirk and Michael Stumpf published in Bioinformatics an implementation of nested sampling that is designed for biological applications, called SYSBIONS. Hence the NS for nested sampling! The C software is available on-line. (I had planned to post this news next to my earlier comments but it went under the radar…)
Filed under: Books, Statistics, University life Tagged: Bayesian model choice, bioinformatics, nested sampling, PNAS, SYSBIONS
Another Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is the Box-Muller algorithm better or worse than using the inverse cdf transform? My first reaction was to state that Box-Muller was exact while the inverse cdf relied on the coding of the inverse cdf, like qnorm() in R. Upon reflection and commenting by other members of the forum, like William Huber, I came to moderate this perspective since Box-Muller also relies on transcendental functions like sin and log, hence writing
$$x = \sqrt{-2\log(u_1)}\,\sin(2\pi u_2)$$
also involves approximating in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it ends up with the issue of which of the two functions is better coded, both in terms of speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments:

> system.time(qnorm(runif(10^8)))
utilisateur     système      écoulé
     10.137       0.120      10.251
> system.time(rnorm(10^8))
utilisateur     système      écoulé
     13.417       0.060      13.472
However, rerunning it today, I get opposite results (pardon my French, I failed to turn the messages to English):

> system.time(qnorm(runif(10^8)))
utilisateur     système      écoulé
     10.137       0.144      10.274
> system.time(rnorm(10^8))
utilisateur     système      écoulé
      7.894       0.060       7.948
(There is coherence in the system time, which shows rnorm as twice as fast as the call to qnorm.) In terms of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails.
“Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted”. Luc Devroye, Non-uniform random variate generation, 1985
Update: As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function rnorm() is by default based on inversion, that is, on qnorm()! As indicated by Alexander Blocker in another comment, using another generator requires calling RNGkind, as in RNGkind(normal.kind = "Box-Muller")
(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)
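For readers who want to put the two generators side by side outside R, here is a minimal Python sketch using only the standard library. The function names, the seed, the sample size and the crude summary statistics are assumptions made for illustration; the inverse cdf is whatever `statistics.NormalDist.inv_cdf` implements, not R's qnorm(), so neither timings nor precision carry over to the R comparison above.

```python
import math
import random
from statistics import NormalDist

def box_muller(rng):
    """One N(0,1) draw from two uniforms via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)

def inverse_cdf(rng, std_normal=NormalDist()):
    """One N(0,1) draw by applying the normal quantile function to a uniform."""
    u = rng.random()
    while u == 0.0:  # inv_cdf requires 0 < u < 1
        u = rng.random()
    return std_normal.inv_cdf(u)

def summary(draws):
    """Sample mean, sample variance and share of draws beyond +/- 1.96."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((x - mean) ** 2 for x in draws) / (n - 1)
    tail = sum(abs(x) > 1.96 for x in draws) / n
    return mean, var, tail

rng = random.Random(42)  # fixed seed for reproducibility
bm_draws = [box_muller(rng) for _ in range(100_000)]
ic_draws = [inverse_cdf(rng) for _ in range(100_000)]

for name, draws in (("Box-Muller", bm_draws), ("inverse cdf", ic_draws)):
    mean, var, tail = summary(draws)
    print(f"{name:>11}: mean={mean:+.4f}  var={var:.4f}  P(|x|>1.96)={tail:.4f}")
```

Both samples should show a mean near 0, a variance near 1 and about 5% of draws beyond ±1.96. A proper comparison of precision would look much further into the tails, as in the checks mentioned above.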
Filed under: Books, Kids, R, Statistics, University life Tagged: Box-Muller algorithm, cross validated, inverse cdf, logarithm, normal distribution, qnorm()