Xi'an's Og

an attempt at bloggin, nothing more...

Le Monde puzzle [#875]

Fri, 2014-07-11 18:14

I learned something in R today thanks to Le Monde mathematical puzzle:

A two-player game consists of A picking a number n between 1 and 10, then B and A successively choosing and applying one of three transforms to the current value of n

  • n=n+1,
  • n=3n,
  • n=4n,

starting with B, until n is larger than 999. Which value(s) of n should A pick if both players act optimally?

Indeed, I first tested the following R code

sole=function(n){
  if (n>999){
    return(TRUE)
  }else{
    return((!sole(3*n))&(!sole(4*n))&(!sole(n+1)))
  }}

which did not work because of too many calls to sole:

> sole(1)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

So I included a memory in the calls to sole so that good and bad entries of n were saved for later calls:

visit=rep(-1,1000) #not yet visited
sole=function(n){
  if (n>999){
    return(TRUE)
  }else{
    if (visit[n]>-1){
      return(visit[n]==1)
    }else{
      visit[n]<<-((!sole(3*n))&(!sole(4*n))&(!sole(n+1)))
      return(visit[n]==1)
    }}}

Trying a frontal attack

> sole(1)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

did not work either, but a single intermediary call was sufficient:

> sole(500)
[1] FALSE
> for (i in 1:10)
+   print(sole(i))
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE

which means that the only winning starters for A are n=4,6. If one wants the winning moves on top, a second counter can be introduced:

visit=best=rep(-1,1000)
sole=function(n){
  if (n>999){
    return(TRUE)
  }else{
    if (visit[n]>-1){
      return(visit[n]==1)
    }else{
      visit[n]<<-((!sole(3*n))&(!sole(4*n))&(!sole(n+1)))
      if (visit[n]==0)
        best[n]<<-max(3*n*(sole(3*n)),
                      4*n*(sole(4*n)),
                      (n+1)*(sole(n+1)))
      return(visit[n]==1)
    }}}

From which we can deduce the next values chosen by A or B as

> best[1:10]
 [1]  4  6  4 -1  6 -1 28 32 36 40

(where -1 means no winning choice is possible).

Now, what is the R trick I learned from this toy problem? Simply the use of the double assignment operator <<-, which allows a function to modify global variables, such as visit and best in the latest function above. (The function assign would have worked too.)
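As a minimal illustration of the difference between the local assignment <-, the global assignment <<-, and assign() (the function and variable names below are mine, not from the puzzle code):

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1  # plain <- creates a local copy; the global is untouched
  counter
}

bump_global <- function() {
  counter <<- counter + 1  # <<- reassigns in the enclosing (here global) environment
  counter
}

bump_local()   # returns 1...
counter        # ...but the global value is still 0

bump_global()
counter        # now 1

assign("counter", 10, envir = globalenv())  # assign() does the same, explicitly
counter        # 10
```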

Filed under: Kids, R, Statistics, University life Tagged: assign(), global variable, Le Monde, local variable, mathematical puzzle
Categories: Bayesian Bloggers

ABC in Cancún

Thu, 2014-07-10 18:14

Here are our slides for the ABC [very] short course Jean-Michel and I give at ISBA 2014 in Cancún next Monday (if your browser can manage Slideshare…). Although I may switch the pictures from Iceland to Mexico on Sunday, there will not be much change to these slides, which we have both used in earlier short courses. (With a few extra slides borrowed from Richard Wilkinson’s tutorial at NIPS 2013!) Jean-Michel will focus his share of the course on software implementations, from R packages like abc and abctools to our population genetics software DIYABC, with an illustration on SNP data from pygmy populations.


Filed under: Books, Kids, R, Statistics, Travel, University life Tagged: ABC, abc package, abctools package, Bayesian methodology, Cancún, DIYABC, ISBA 2014, Mexico, pygmies
Categories: Bayesian Bloggers

Bayes’ Rule [book review]

Wed, 2014-07-09 18:14

This introduction to Bayesian analysis, Bayes’ Rule, was written by James Stone from the University of Sheffield, who contacted CHANCE suggesting a review of his book. I thus bought it from Amazon to check the contents. And write a review.

First, the format of the book: it is a short text of 127 pages, plus 40 pages of glossary, appendices, references and index. I eventually found the name of the publisher, Sebtel Press, but for a while thought the book was self-produced. While the LaTeX output is fine and the (Matlab) graphs are readable, the pictures are not of the best quality and the display editing is minimal, in that there are several huge white spaces between pages. Nothing major there, obviously; it simply makes the book look like course notes, but this is in no way detrimental to its potential appeal. (I will not comment on the numerous appearances of Bayes’ alleged portrait in the book.)

“… (on average) the adjusted value θMAP is more accurate than θMLE.” (p.82)

Bayes’ Rule has the interesting feature that, in the very first chapter, after spending a rather long time on Bayes’ formula, it introduces Bayes factors (p.15).  With the somewhat confusing choice of calling the prior probabilities of hypotheses marginal probabilities. Even though they are indeed marginal given the joint, marginal is usually reserved for the sample, as in marginal likelihood. Before returning to more (binary) applications of Bayes’ formula for the rest of the chapter. The second chapter is about probability theory, which means here introducing the three axioms of probability and discussing geometric interpretations of those axioms and Bayes’ rule. Chapter 3 moves to the case of discrete random variables with more than two values, i.e. contingency tables, on which the range of probability distributions is (re-)defined and produces a new entry to Bayes’ rule. And to the MAP. Given this pattern, it is not surprising that Chapter 4 does the same for continuous parameters. The parameter of a coin flip.  This allows for discussion of uniform and reference priors. Including maximum entropy priors à la Jaynes. And bootstrap samples presented as approximating the posterior distribution under the “fairest prior”. And even two pages on standard loss functions. This chapter is followed by a short chapter dedicated to estimating a normal mean, then another short one on exploring the notion of a continuous joint (Gaussian) density.

“To some people the word Bayesian is like a red rag to a bull.” (p.119)

Bayes’ Rule concludes with a chapter entitled Bayesian wars. A rather surprising choice, given the intended audience. Which is rather bound to confuse this audience… The first part is about probabilistic ways of representing information, leading to subjective probability. The discussion goes on for a few pages to justify the use of priors but I find completely unfair the argument that because Bayes’ rule is a mathematical theorem, it “has been proven to be true”. It is indeed a maths theorem, however that does not imply that any inference based on this theorem is correct!  (A surprising parallel is Kadane’s Principles of Uncertainty with its anti-objective final chapter.)

All in all, I remain puzzled after reading Bayes’ Rule. Puzzled by the intended audience, as contrary to other books I recently reviewed, the author does not shy away from mathematical notations and concepts, even though he proceeds quite gently through the basics of probability. Therefore, potential readers need some modicum of mathematical background that some students may miss (although it actually corresponds to what my kids would have learned in high school). It could thus constitute a soft entry to Bayesian concepts, before taking a formal course on Bayesian analysis. Hence doing no harm to the perception of the field.

Filed under: Books, Statistics, University life Tagged: Amazon, Bayes formula, Bayes rule, Bayes theorem, Bayesian Analysis, England, introductory textbooks, publishing, short course, Thomas Bayes' portrait, tutorial
Categories: Bayesian Bloggers

impressions, soleil couchant

Wed, 2014-07-09 13:14
Categories: Bayesian Bloggers

thick disc formation scenario of the Milky Way evaluated by ABC

Tue, 2014-07-08 18:14

“The facts that the thick-disc episode lasted for several billion years, that a contraction is observed during the collapse phase, and that the main thick disc has a constant scale height with no flare argue against the formation of the thick disc through radial migration. The most probable scenario for the thick disc is that it formed while the Galaxy was gravitationally collapsing from well-mixed gas-rich giant clumps that were sustained by high turbulence, which prevented a thin disc from forming for a time, as proposed previously.”

Following discussions with astronomers from Besançon on the use of ABC methods to approximate posteriors, I became associated with their paper on assessing a formation scenario of the Milky Way, which was accepted a few weeks ago in Astronomy & Astrophysics. The central problem (was there a thin-then-thick disk?) somewhat escapes me, but this collaboration started when some of the astronomers leading the study contacted me about convergence issues with their MCMC algorithms and I realised they were using ABC-MCMC without any idea that it was in fact called ABC-MCMC and had been studied previously in another corner of the literature… The scale in the kernel was chosen to achieve an average acceptance rate of 5%-10%. Models are then compared through the combination of a log-likelihood approximation resulting from the ABC modelling and a BIC ranking of the models. (Incidentally, I was impressed by the number of papers published in Astronomy & Astrophysics: the monthly issue contains dozens of papers!)

Filed under: Statistics, University life Tagged: ABC, ABC-MCMC, astronomy, astrostatistics, BIC, Milky Way, thick disk, thin disk
Categories: Bayesian Bloggers

posterior predictive checks for admixture models

Mon, 2014-07-07 18:14

In a posting coincidence, just a few days after we arXived our paper on ABC model choice with random forests, where we use posterior predictive errors for assessing the variability of the random forest procedure, David Mimno, David Blei, and Barbara Engelhardt arXived a paper on posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure, which deals with similar data and models, while also using the posterior predictive as a central tool. (Marginalia: the paper is a wee bit difficult to read [esp. with France-Germany playing in the airport bar!] as the modelling is only clearly described at the very end. I suspect this arXived version was put together out of a submission to a journal like Nature or PNAS, with mentions of a Methods section that does not appear here and of Supplementary Material that turned into subsections of the Discussion section.)

The datasets are genomic, made of SNPs (single nucleotide polymorphisms). For instance, the first (HapMap) dataset corresponds to 1,043 individuals and 468,167 SNPs. The model is simpler than Kingman’s coalescent, hence its likelihood does not require ABC steps to run inference. The admixture model in the paper is essentially a mixture model over ancestry indices with individual-dependent weights and Bernoulli observations, hence resulting in a completed likelihood of the form

(which looks more formidable than it truly is!). Regular Bayesian inference is thus possible in this setting, implementing e.g. Gibbs sampling. The authors chose instead to rely on EM and thus derived the maximum likelihood estimators of the (many) parameters of the admixture. And of the latent variables z. Their posterior predictive check is based on the simulation of pseudo-observations (as in ABC!) from the above likelihood, with parameters and latent variables replaced with their EM estimates (unlike ABC). There is obviously some computational reason for doing this instead of simulating from the posterior, albeit implicit in the paper. I am however slightly puzzled by the conditioning on the latent variable estimate, as its simulation is straightforward and as a latent variable is more a missing observation than a parameter. Given those 30 to 100 replications of the data, an empirical distribution of a discrepancy function is used to assess whether or not the equivalent discrepancy for the observation is an outlier. If so, the model is not appropriate for the data. (Interestingly, the discrepancy is measured via the Bayes factor of z-scores.)
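For the record, such a plug-in posterior predictive check is straightforward to mimic in R; the toy Bernoulli model and the discrepancy below are my own illustration, not the paper's admixture model:

```r
set.seed(1)
n <- 200
obs <- rbinom(n, 1, 0.3)   # toy "observed" data: independent Bernoulli draws
phat <- mean(obs)          # plug-in estimate, standing in for the EM fit

disc <- function(x) var(x) # hypothetical discrepancy measure

# simulate 100 pseudo-datasets from the fitted model (the paper uses 30-100)
reps <- replicate(100, disc(rbinom(n, 1, phat)))

# empirical tail probability: is the observed discrepancy an outlier?
ppp <- mean(reps >= disc(obs))
ppp
```

A small value of ppp (or one very close to 1) would flag the observed discrepancy as an outlier among the replicates, hence a lack of fit.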

The connection with our own work is that the construction of discrepancy measures proposed in this paper could be added to our already large collection of summary statistics, to check their potential impact in model comparison, i.e. whether they make a significant contribution to the random forest nodes. Conversely, the most significant summary statistics could then be tested as discrepancy measures. Or, more in tune with our Series B paper on the proper selection of summary variables, the distribution of those discrepancy measures could be compared across potential models. Assuming this does not take too much computing power…

Filed under: pictures, Statistics, Travel, University life Tagged: ABC model choice, arXiv, Bernoulli, goodness of fit, Human Genomics, Kingman's coalescent, mixture estimation, SNPs
Categories: Bayesian Bloggers

ABC [almost] in the front news

Sun, 2014-07-06 18:14

My friend and Warwick colleague Gareth Roberts just published a paper in Nature with Ellen Brooks-Pollock and Matt Keeling from the University of Warwick on the modelling of bovine tuberculosis dynamics in Britain and on the impact of control measures. The data comes from the Cattle Tracing System and the VetNet national testing database. The mathematical model is based on a stochastic process and its six parameters are estimated by sequential ABC (SMC-ABC). The summary statistics chosen in the model are the number of infected farms per county per year and the number of reactors (cattle failing a test) per county per year.

“Therefore, we predict that control of local badger populations and hence control of environmental transmission will have a relatively limited effect on all measures of bovine TB incidence.”

This advanced modelling of a comprehensive dataset on TB in Britain quickly gained a high profile, as it addresses the highly controversial (not to say plain stupid) culling of badgers (which also carry TB) advocated by the government. The study concludes that “only generic measures such as more national testing, whole herd culling or vaccination that affect all routes of transmission are effective at controlling the spread of bovine TB”, while the elimination of badgers from the English countryside would have a limited effect. Good news for badgers! And the Badger Trust. Unsurprisingly, the study was immediately rejected by the UK farming minister! Not only does he object to the herd-culling solution for economic reasons, but he “cannot accept the paper’s findings”. Maybe he does not like ABC… More seriously, the media oversimplified the findings of the study, “as usual”, with e.g. The Guardian headline of “tuberculosis threat requires mass cull of cattle”.
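Although the paper relies on sequential ABC (SMC-ABC), the underlying logic can be sketched with a plain ABC rejection sampler on a toy model; everything below (Poisson counts, a uniform prior, the 1% tolerance) is my own illustration, not the Nature paper's model:

```r
set.seed(2)
yobs <- rpois(50, 4)                # stand-in data (not the TB dataset!)
sobs <- c(mean(yobs), var(yobs))    # chosen summary statistics

N <- 1e4
prior <- runif(N, 0, 10)            # prior draws for the single parameter
dist <- sapply(prior, function(th) {
  y <- rpois(50, th)                # pseudo-data simulated from the model
  sqrt(sum((c(mean(y), var(y)) - sobs)^2))
})
post <- prior[dist <= quantile(dist, 0.01)]  # keep the 1% closest draws
mean(post)                          # ABC posterior mean
```

SMC-ABC refines this scheme by propagating the accepted particles through a sequence of decreasing tolerances instead of a single hard cut-off.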

Filed under: pictures, Statistics, University life Tagged: ABC, Badger Trust, badgers, Britain, cattle, cows, England, epidemiology, media, SMC-ABC, summary statistics, TB, The Guardian, tuberculosis, University of Warwick
Categories: Bayesian Bloggers

how to translate evidence into French?

Sat, 2014-07-05 18:14

I got this email a few minutes ago from Gauvain, who is writing a PhD in philosophy of science:

The author of the text I have to translate refers to Bayes factors as a “Bayesian measure of evidence”, and to p-value tests as a “frequentist measure of evidence”. I was wondering whether there exists a recognised and established French translation for this expression “measure of evidence”. I have sometimes come across “mesure d’évidence”, which sounds very much like an anglicism, and sometimes “estimateur de preuve”, which seems to me liable to create confusion with other uses of the term “estimateur”.

which (pardon my French!) wonders how to translate the term evidence into French. It would sound natural for the French évidence to be the answer, but this is not the case. Despite sharing the same Latin root (evidentia), and although the English word comes from medieval French, the two words have different meanings: in English, it denotes a collection of facts supporting an assumption or a theory, while in French it means something obvious, whose truth is immediately perceived. Surprisingly, English kept the adjective evident with the same [obvious] meaning as the French évident, but the noun moved towards a much less definitive meaning, both in Law and in Science. I had never thought of the huge gap between the two meanings, although I must have been surprised at its use the first time I heard it in English. I no longer think about it, not even when I reviewed Sober’s Evidence and Evolution.

One may wonder about the best possible translation of evidence into French, even though marginal likelihood (vraisemblance marginale) is just fine for statistical purposes. I would suggest faisceau de présomptions, degré de soutien, or yet again intensité de soupçon as (lengthy) solutions. Soupçon could work as such, but has a fairly negative ring…

Filed under: Books, Statistics, University life Tagged: Bayes factors, evidence, Evidence and Evolution, French vs. English, obvious, translation
Categories: Bayesian Bloggers

voices – strange shores [book reviews]

Fri, 2014-07-04 18:14

Following my recent trip to Iceland, I read two more books by Arnaldur Indriðason, Voices (Röddin, 2003) and Strange Shores (Furðustrandir, 2010).

As usual, Indriðason’s books are more about the past (of the characters as well as of the whole country) than about current times. Voices does not depart from this pattern, all the more because it is one of the earliest Inspector Erlendur books. Behind the murder of a hotel employee on the fringe of homelessness lies the almost constant questioning in Indriðason’s books of the difficult or even impossible relations between parents and children and/or between siblings, and of the long-lasting consequences of this generation gap. The murder itself is but a pretext for investigations on that theme, and the murder resolution is far from the central point of the book. The story itself is thus less compelling than others I have read, maybe because the main character spends so much time closeted in his hotel room. But it nonetheless fits well within the Erlendur series. And although it is unrelated to the story, the cover reminded me very much of the Gullfoss waterfalls.

The second book, Strange Shores, is the farthest from a detective story in the whole series. Indeed, Erlendur is back at his childhood cottage in Eastern Iceland, looking for a resolution of his childhood trauma, the loss of his younger brother during a snowstorm. He also investigates another snowstorm disappearance, interrogating the few survivors and reluctant witnesses from that time. Outside any legal mandate. Sometimes very much outside! While the story is not completely plausible, both in the present and in the past, it remains a striking novel, even on its own. (Although it would read better after the earlier novels in the series.) Not only does the resolution of the additional disappearance bring additional pain and no comfort to those involved, but the ending of Erlendur’s own quest is quite ambiguous. As the book reached its final pages, I could not decide whether he had attained redemption, deliverance, and the potential to save his own children, or was beyond redemption, entering another circle of Hell. As explained by the author in an interview, this is intentional and not the consequence of my poor understanding: “Readers of Strange Shores are not quite certain what to make of the ending regarding Erlendur, and I’m quite happy to leave them in the dark!”. With the main character of this series focusing more on missing persons than on detective work, what’s next?!


Filed under: Books, Mountains, Travel Tagged: Arnaldur Indriðason, Furðustrandir, Iceland, Iceland noir, mountain storm, Röddin, Reykjavik
Categories: Bayesian Bloggers

vector quantile regression

Thu, 2014-07-03 18:14

My Paris-Dauphine colleague Guillaume Carlier recently arXived a statistics paper entitled Vector quantile regression, co-written with Chernozhukov and Galichon. I was most curious to read the paper, as Guillaume is primarily a mathematical analyst working on optimisation problems like optimal transport. And also because I find quantile regression difficult to fathom as a statistical problem. (As it happens, both his co-authors are from econometrics.) The results in the paper are (i) to show that a d-dimensional (Lebesgue) absolutely continuous random variable Y can always be represented as the deterministic transform Y=Q(U), where U is a d-dimensional [0,1] uniform (the paper expresses this transform as conditional on a set of regressors Z, but those essentially play no role) and Q is monotone in the sense of being the gradient of a convex function,


(ii) to deduce from this representation a unique notion of a multivariate quantile function; and (iii) to consider the special case when the quantile function Q can be written in the linear form

where β(U) is a matrix. Hence leading to an estimation problem.

While unsurprising from a measure-theoretic viewpoint, the representation theorem (i) is most interesting for both statistical and simulation reasons, provided the function Q can be easily estimated and derived, respectively. The paper however does not provide a constructive tool for this derivation, besides indicating several characterisations as solutions of optimisation problems. From a statistical perspective, a non-parametric estimation of β(.) would have useful implications in multivariate regression, although the paper only considers the specific linear case above. The solution to which is obtained by discretising all variables and resorting to linear programming.
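In dimension one, this representation is just the familiar inverse-cdf (quantile) transform, as a quick R check illustrates (the standard normal target is chosen purely for illustration):

```r
set.seed(3)
U <- runif(1e5)   # uniform "reference" variable
Y <- qnorm(U)     # Q is the N(0,1) quantile function, gradient of a convex function
c(mean(Y), sd(Y)) # close to the N(0,1) values 0 and 1
```

The multivariate extension of the paper replaces the one-dimensional quantile function with the gradient of a convex potential, à la optimal transport.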

Filed under: pictures, Statistics, University life Tagged: CEREMADE, Monte Carlo methods, multivariate quantile, nonparametrics, Paris, quantile regression, simulation, Université Paris Dauphine
Categories: Bayesian Bloggers

straightforward statistics [book review]

Wed, 2014-07-02 18:14

“I took two different statistics courses as an undergraduate psychology major [and] four different advanced statistics classes as a PhD student.” G. Geher

Straightforward Statistics: Understanding the Tools of Research by Glenn Geher and Sara Hall is an introductory textbook for psychology and other social science students. (That Oxford University Press sent me for review. Nice cover, by the way!) I can spot the purpose behind the title, a purpose heavily stressed anew in the preface and the first chapter, but it nonetheless irks me as conveying the message that one semester of reasonable diligence in class will suffice for any college student to “not only understanding research findings from psychology, but also to uncovering new truths about the world and our place in it” (p.9). Nothing less. While, in essence, it covers the basics found in all introductory textbooks, from descriptive statistics to ANOVA models. The inclusion of “real research examples” in the chapters of the book rather demonstrates how far from real research a reader of the book would stand…

“However, as often happen in life, we can be wrong…” (p.66)

The book aims at teaching basic statistics to “undergraduate students who are afraid of math” (p.xiii). By using “an accessible, accurate, coherent, and engaging presentation of statistics” (p.xiv). And reducing the maths expressions to a bare minimum. Unfortunately the very first formula (p.19) is meaningless (skipping the individual indices in sums is the rule throughout the book)

and the second one (Table 2.7, p.22 and again Tables 2.19 and 2.20, p.43)

is (a) missing both the indices and the summation symbol and (b) dividing the sum of “squared deviation scores” by N rather than the customary N-1.  I also fail to see the point of providing histograms for categorical variables with only two modalities, like “Hungry” and “Not Hungry” (Fig. 2.11, p.47)…
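For the record, the divisor issue is easy to exhibit in R, where var() uses the customary N-1:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # small illustrative dataset
N <- length(x)

sum((x - mean(x))^2) / N        # the book's version: 4
sum((x - mean(x))^2) / (N - 1)  # unbiased sample variance: 32/7
var(x)                          # R agrees with the N-1 divisor
```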

“Statisticians never prove anything-thereby making prove something of a dirty word.” (p.116)

As I only teach math students, I cannot judge how adequate the textbook is for psychology or other social science students. It however sounds highly verbose to me, in its attempts to bypass maths formulas. For instance, the 15 pages of the chapter on standardised scores are about moving back and forth between the raw data and its standardised version


Or the two pages (pp.71-72) of motivations on the r coefficient before the (again meaningless) formula

which even skips the indices of the z-scores to avoid frightening the students. (The book also asserts that a correlation of zero “corresponds to no mathematical relationship between [the] two variables whatsoever”, p.70.) Or yet the formula for (raw-score) regression (p.97) given as

where and

without defining B. Which is apparently a typo, as the standardised regression uses β… I could keep going with such examples, but the point I want to make is that, if the authors want to reach students who have fundamental problems with a formula like

which does not appear in the book, they could expose them to the analysis and understanding of the output of statistical software, rather than spending a large part of the book on computing elementary quantities like the coefficients of a simple linear regression by hand. Instead, fundamental notions like multivariate regression are relegated to an appendix (E) as “Advanced statistics to be aware of”. Plus a two-page discussion (pp.104-105) of a study conducted by the first author on predicting preference for vaginal sex. (To put things into context, the first author also wrote Mating Intelligence Unleashed. Which explains some of the unusual “real research examples”.)
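Incidentally, the r coefficient the book tiptoes around for two pages is nothing but the average product of z-scores, as a short R check shows (on simulated data, obviously):

```r
set.seed(4)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)
zx <- (x - mean(x)) / sd(x)   # z-scores; sd() uses the N-1 divisor
zy <- (y - mean(y)) / sd(y)
r <- sum(zx * zy) / (length(x) - 1)
all.equal(r, cor(x, y))       # r is exactly the built-in correlation
```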

Another illustration of what I consider the wrong focus of the book is provided by the introduction to (probability and) the normal distribution in Chapter 6, which dedicates most of its pages to reading the area under the normal density off a normal table, without even providing the formula of this density. (And with an interesting typo in Figure 6.4.) Indeed, as in last-century textbooks, the book does include probability tables for standard tests. Rather than relying on software and pocket calculators to move on to the probabilistic interpretation of p-values. And to the multi-layered caution that is necessary when handling hypotheses labelled as significant. (A caution pushed to its paroxysm in The Cult of Statistical Significance, which I reviewed a while ago.) The book includes a chapter on power but, besides handling coordinate axes in a weird manner (check from Fig. 9.5 onwards) and repeating everything twice for left- and right-one-sided hypotheses!, it makes the computation of power appear like the main difficulty, when it is its interpretation that is most delicate and fraught with danger. Were I to teach (classical) testing to math-averse undergrads, and I may actually have to next year!, I would skip the technicalities and pile up cases and counter-cases explaining why p-values and power are not the end of the analysis. (Using Andrew’s blog as a good reservoir for such cases, as illustrated by his talk in Chamonix last January!) But I did not see any warning in the book on the dangers of manipulating data, formulating hypotheses to test out of the data, running multiple tests with no correction, and so on.
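Indeed, every normal-table lookup in Chapter 6 reduces to a single call to pnorm() or qnorm() in R:

```r
pnorm(1.96)            # area under N(0,1) to the left of 1.96: about 0.975
pnorm(1) - pnorm(-1)   # mass within one standard deviation: about 0.683
qnorm(0.975)           # the inverse lookup: back to 1.96
2 * pnorm(-2.1)        # two-sided p-value for an observed z of 2.1
```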

To conclude this extensive review: as an outsider, I fail to see redeeming features that would single out Straightforward Statistics: Understanding the Tools of Research as a particularly enticing textbook. The authors have clearly put a lot of effort into their book, adopted what they think is the most appropriate tone to reach the students, and added very detailed homework with solutions. Still, this view makes statistics sound too straightforward and leads to the far too common apprehension of p-values as the ultimate assessment of statistical significance, without opening toward alternatives such as outliers and model misspecification.

Filed under: Books, Kids, Statistics, University life Tagged: hypothesis testing, introductory textbooks, multiple tests, Oxford University Press, p-values, power, psychology, tests
Categories: Bayesian Bloggers

recycling accept-reject rejections (#2)

Tue, 2014-07-01 18:14

Following yesterday’s post on Rao’s, Lin’s, and Dunson’s paper on a new approach to intractable normalising constants, and taking advantage of being in Warwick, I tested the method on a toy model, namely the posterior associated with n Student’s t observations with unknown location parameter μ and a flat prior,

which is “naturally” bounded by a Cauchy density with scale √ν. The constant M=π√ν is then easily derived, and running the new algorithm amounts to a normal random walk proposal targeting the augmented likelihood (R code below).

As shown by the above graph, the completion-by-rejection scheme produces a similar outcome (tomato) to the one based on the sole observations (steelblue), with a similar acceptance rate. However, the computing time is considerably degraded:

> system.time(g8())
   user  system elapsed
 53.751   0.056  54.103
> system.time(g9())
   user  system elapsed
  1.156   0.000   1.161

when compared with the no-completion version. Here is the entire R code that produced both MCMC samples:

#Student t observations and flat prior
library(compiler) #for cmpfun()
nu=4
n=25
M=pi*sqrt(nu)
sqrtnu=sqrt(nu)
obs=rt(n,df=nu)
sdobs=sd(obs)

#unnormalised t density (scaled to 1 at the mode)
mydt=function(x,mu){
  return(dt(x-mu,df=nu)/dt(0,df=nu))}
mydtc=cmpfun(mydt)

#Cauchy density with scale sqrt(nu)
mydcauchy=function(x,mu){
  y=(x-mu)/sqrtnu
  return(dcauchy(y)/sqrtnu)}
mydcaucchy=cmpfun(mydcauchy)

#augmented data: one accept-reject run per observation,
#returning the rejected proposals
augmen=function(mu){
  y=NULL
  for (i in 1:n){
    prop=mu+rcauchy(1)*sqrtnu
    accept=(runif(1)<mydtc(prop,mu)/(M*mydcaucchy(prop,mu)))
    while (!accept){
      y=c(y,prop)
      prop=mu+rcauchy(1)*sqrtnu
      accept=(runif(1)<mydtc(prop,mu)/(M*mydcaucchy(prop,mu)))}
    }
  return(y)}

#random walk MCMC on the likelihood completed by the rejections
gibbsda=function(T=10^4){
  theta=rep(0,T)
  for (t in 2:T){
    rej=augmen(theta[t-1])
    theta[t]=prop=theta[t-1]+rnorm(1,sd=.1*sdobs)
    propdens=sum(dt(obs-prop,df=nu,log=TRUE))+
      sum(log(mydcaucchy(rej,prop)-mydtc(rej,mu=prop)/M))
    refdens=sum(dt(obs-theta[t-1],df=nu,log=TRUE))+
      sum(log(mydcaucchy(rej,theta[t-1])-mydtc(rej,mu=theta[t-1])/M))
    if (log(runif(1))>propdens-refdens) theta[t]=theta[t-1]
    }
  return(theta)}
g8=cmpfun(gibbsda)

#no-completion version, based on the sole observations
gibbs2=function(T=10^4){
  eta=rep(0,T)
  for (t in 2:T){
    eta[t]=prop=eta[t-1]+rnorm(1,sd=sdobs)
    propdens=sum(dt(obs-prop,df=nu,log=TRUE))
    refdens=sum(dt(obs-eta[t-1],df=nu,log=TRUE))
    if (log(runif(1))>propdens-refdens) eta[t]=eta[t-1]
    }
  return(eta)}
g9=cmpfun(gibbs2) #and not cmpfun(gibbsda), so that g9 is the fast version
Filed under: R, Statistics, University life Tagged: accept-reject algorithm, compiler, Data augmentation, Gibbs sampling, MCMC, Monte Carlo Statistical Methods, Student's t distribution
Categories: Bayesian Bloggers

recycling accept-reject rejections

Mon, 2014-06-30 18:14

Vinayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes a new technique to handle intractable normalising constants. Its exact title is Data augmentation for models based on rejection sampling. (A paper that I read in the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (aka likelihood) satisfies

where all terms but p are known in closed form, then completion by the rejected values of a hypothetical accept-reject algorithm (hypothetical in the sense that the data does not have to be produced by an accept-reject scheme, only the above domination condition needs to hold) allows for a data augmentation scheme. Without requiring the missing normalising constant. Since the completed likelihood is

A closed-form, if not necessarily congenial, function.

Now this is quite a different use of the “rejected values” from the accept reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

It is a neat idea in that it completely bypasses the approximation of the normalising constant. And avoids the somewhat delicate tuning of the auxiliary solution of Møller et al. (2006). The difficulty with this algorithm is however in finding an upper bound M on the unnormalised density f that is

  1. in closed form;
  2. with a manageable and tight enough “constant” M;
  3. compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent of the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validity of the formal algorithm. Such a constraint would pull M higher, hence reduce the efficiency of the method. Actually, the matrix Langevin distribution considered in the first example involves a bound that does depend on the parameter κ.
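To illustrate point 2 above with an example of my own (not from the paper): in accept-reject the expected number of proposals per accepted draw is M/Z, Z being the missing normalising constant, so any slack in M translates directly into wasted simulations. A quick sketch on the Student's t versus scaled Cauchy pair of the previous post:

```r
# Hedged illustration (not from the paper): the expected number of
# accept-reject trials per acceptance is M/Z, so a loose M wastes proposals.
nu <- 4
mydt <- function(x) dt(x, df = nu)/dt(0, df = nu)   # unnormalised t density
qprop <- function(x) dcauchy(x/sqrt(nu))/sqrt(nu)   # scaled Cauchy proposal
Z <- integrate(mydt, -Inf, Inf)$value               # here Z = 1/dt(0, nu)
for (M in pi*sqrt(nu)*c(1, 10)) {                   # tight bound vs 10x slack
  ntry <- replicate(10^4, {
    k <- 0
    repeat {
      k <- k + 1
      x <- rcauchy(1)*sqrt(nu)
      if (runif(1) < mydt(x)/(M*qprop(x))) break
    }
    k })
  cat("M =", round(M, 2), " average trials =", mean(ntry),
      " theoretical M/Z =", round(M/Z, 2), "\n")
}
```

With the tight bound the average is about 2.4 trials per acceptance, against about 24 for the tenfold-loose one, matching the M/Z prediction.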

The paper includes a result (Theorem 2) on uniform ergodicity that relies on heavy assumptions on the proposal distribution, including a rather surprising one, namely that the probability of rejection is bounded from below, i.e., calling for a less efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of acceptance is bounded from below since, then, the event of no rejection at all constitutes an atom from the augmented Markov chain viewpoint: a renewal thus occurs each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, Statistical Science).

Note also that, despite the opposition raised by the authors, the method per se does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.

Maybe some further experimental evidence tomorrow…

Filed under: Statistics, University life Tagged: accept-reject algorithm, arXiv, auxiliary variable, Data augmentation, George Casella, intractable likelihood, Monte Carlo Statistical Methods, Rao-Blackwellisation, recycling, untractable normalizing constant
Categories: Bayesian Bloggers

R/Rmetrics in Paris [alas!]

Sun, 2014-06-29 18:14

Today I gave a talk on Bayesian model choice in a fabulous 13th-century former monastery in the Latin Quarter of Paris… the Collège des Bernardins, close to Jussieu and the Collège de France, so unbelievably well-hidden that I was not aware of its existence despite having studied and worked in Jussieu since 1982… I mixed my earlier San Antonio survey on importance-sampling approximations to Bayes factors with an entry on our most recent work on ABC with random forests. This was the first talk of the 8th R/Rmetrics workshop taking place in Paris this year. (Rmetrics aims at aggregating R packages with econometrics and finance applications.) And I had a full hour and a half to deliver my lecture to the workshop audience. Nice place, nice people, new faces and topics (and even andouille de Vire for lunch!): why should I complain with an alas in the title?!

What happened is that the R/Rmetrics meetings had until this year been organised in Meielisalp, Switzerland, which stands on top of Thuner See and… right next to the most famous peaks of the Bernese Alps! And I had been invited last year but could not make it… meaning I lost a genuine opportunity to climb one of my five dream routes, the Mittelegi ridge of the Eiger, as future R/Rmetrics meetings will not take place there.

A lunch discussion at the workshop led me to experiment with the compiler package in R, a package I was unaware of. The impact on the running time is obvious: recycling the fowler function from the last Le Monde puzzle,

> bowler=cmpfun(fowler)
> N=20;n=10;system.time(fowler(pred=N))
   user  system elapsed
 52.647   0.076  56.332
> N=20;n=10;system.time(bowler(pred=N))
   user  system elapsed
 51.631   0.004  51.768
> N=20;n=15;system.time(bowler(pred=N))
   user  system elapsed
 51.924   0.024  52.429
> N=20;n=15;system.time(fowler(pred=N))
   user  system elapsed
 52.919   0.200  61.960

shows a ten- to twenty-fold gain in system time, if not in elapsed time (re-alas!).
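For readers as unaware of the compiler package as I was, a minimal self-contained sketch (slowcount is a toy stand-in of mine, not the fowler function of the puzzle post):

```r
# Minimal sketch of byte-compilation with the compiler package; slowcount
# is a hypothetical stand-in for the fowler function of the puzzle post.
library(compiler)
slowcount <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + i %% 7  # interpreted-loop heavy on purpose
  s
}
fastcount <- cmpfun(slowcount)    # byte-compiled version, identical output
stopifnot(fastcount(100) == slowcount(100))
system.time(slowcount(10^6))
system.time(fastcount(10^6))      # typically faster than the uncompiled call
```

(Note that R versions from 3.4 onward byte-compile functions just-in-time by default, so the gap is much smaller nowadays than in the 2014 timings above.)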

Filed under: Mountains, pictures, R, Statistics, Travel, University life Tagged: ABC, andouille de Vire, Bayesian econometrics, Bayesian model choice, Bernese Alps, cmpfun(), Collège des Bernardins, compiler, Eiger, importance sampling, Interlaken, Meielisalp, Mittelegi ridge, Paris, R, Rmetrics, San Antonio, Switzerland, system.time, Thun Lake
Categories: Bayesian Bloggers

…and otherwise Mr Cardoso succeeds where the have failed! [sic]

Sun, 2014-06-29 08:18

A similar flier, a few days later. With very precise (if incoherent) guarantees! And a fantastic use of capitals. Too bad Monsieur Cardoso could not predict the (occurrence of a) missing noun in the last sentence…

Filed under: Kids, pictures Tagged: advertising, charlatanism, mailbox, Pantin, Paris
Categories: Bayesian Bloggers

Prof. Ntayiya solves all your problems!

Sat, 2014-06-28 18:14

One of the numerous fliers for “occult expertise” that end up in my mailbox… It sheds an interesting light on the major woes of my neighbours, woes that can induce some of them to consult with such charlatans. And leaves me with one wonder: how can Professeur Ntayiya hope to get paid if the effects first need to be proven?!

Filed under: pictures Tagged: black block, charlatanism, fliers, Fontenay-aux-Roses, mailbox
Categories: Bayesian Bloggers