Bayesian News Feeds

corrected MCMC samplers for multivariate probit models

Xian's Og - Tue, 2015-05-05 18:15

“Moreover, IvD point out an error in Nobile’s derivation which can alter its stationary distribution. Ironically, as we shall see, the algorithms of IvD also contain an error.”

Xiyun Jiao and David A. van Dyk arXived a paper correcting the MCMC sampler and R package MNP for the multivariate probit model proposed by Imai and van Dyk in 2005. [Hence the abbreviation IvD in the above quote.] Earlier versions of the Gibbs sampler for the multivariate probit model were proposed by Rob McCulloch and Peter Rossi in 1994, with a Metropolis update later added by Agostino Nobile, and an improved version was finally developed by Imai and van Dyk in 2005. As noted in the above quote, Jiao and van Dyk have discovered two mistakes in this latest version, jeopardizing the validity of the output.

The multivariate probit model considered here is a multinomial model where the occurrence of the k-th category is represented by the k-th component of a (correlated) multivariate normal vector being the largest of all components. Since the latent normal model is non-identifiable, being invariant to both translation and scaling, identifying constraints are used in the literature. This means using a covariance matrix of the form Σ/trace(Σ), where Σ is an inverse Wishart random matrix. In their 2005 implementation, relying on marginal data augmentation (which essentially means simulating the non-identifiable part repeatedly at various steps of the data augmentation algorithm), Imai and van Dyk missed a translation term and a constraint on the simulated matrices, leading to simulations outside the rightful support, as illustrated by the above graph [snapshot from the arXived paper].
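As a toy illustration of the hard identifying constraint (a minimal R sketch, not the MNP or IvD samplers; the dimension, degrees of freedom, and identity scale matrix below are arbitrary choices), one can draw an inverse Wishart matrix and rescale it to satisfy the trace constraint:

p = 3; nu = p + 2                                 # hypothetical dimension and degrees of freedom
W = rWishart(1, df = nu, Sigma = diag(p))[, , 1]  # Wishart draw with identity scale
Sigma = solve(W)                                  # hence an inverse Wishart draw
Sigma_id = Sigma / sum(diag(Sigma))               # identified covariance: trace fixed to one
sum(diag(Sigma_id))                               # check: equals 1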

Since the IvD method is used in many subsequent papers, it is quite important that these mistakes are signalled and corrected. [Another snapshot above shows how much both algorithms differ!] Without much thinking about this, I [thus idly] wonder why an identifying prior does not take the place of a hard identifying constraint, as it should solve the issue more nicely, in that it would create fewer constraints and more entropy (!) in exploring the augmented space, while theoretically providing a convergent approximation of the identifiable parts. I may (must!) however be missing an obvious constraint preventing this implementation.


Filed under: Books, pictures, R, Statistics, University life Tagged: Bayesian modelling, Data augmentation, identifiability, Journal of Econometrics, MNP package, multivariate probit model, probit model, R, Wishart distribution
Categories: Bayesian Bloggers

take those hats off [from R]!

Xian's Og - Mon, 2015-05-04 18:15

This is presumably obvious to most if not all R programmers, but I became aware today of a hugely (?) delaying tactic in my R codes. I was working with Jean-Michel and Natesh [who are visiting at the moment] and when coding an MCMC run I was telling them that I usually preferred to code Nsim=1000 as Nsim=10^3 for readability reasons. Suddenly, I became worried that this representation involved a computation, as opposed to Nsim=1e3, and ran a little experiment:

> system.time(for (t in 1:10^8) x=10^3)
utilisateur     système      écoulé
     30.704       0.032      30.717
> system.time(for (t in 1:1e8) x=10^3)
utilisateur     système      écoulé
     30.338       0.040      30.359
> system.time(for (t in 1:10^8) x=1000)
utilisateur     système      écoulé
      6.548       0.084       6.631
> system.time(for (t in 1:1e8) x=1000)
utilisateur     système      écoulé
      6.088       0.032       6.115
> system.time(for (t in 1:10^8) x=1e3)
utilisateur     système      écoulé
      6.134       0.029       6.157
> system.time(for (t in 1:1e8) x=1e3)
utilisateur     système      écoulé
      6.627       0.032       6.654
> system.time(for (t in 1:10^8) x=exp(3*log(10)))
utilisateur     système      écoulé
     60.571       0.000      57.103

So using the usual scientific notation with powers is taking its toll, while the calculator notation with e is cost-free… Weird!

I understand that the R notation 10^6 is an abbreviation for a power function that can be equally applied to pi^pi, say, but I still feel aggrieved that a nice scientific notation like 10⁶ ends up as a computing trap! I thus asked the question on the Stack Overflow forum, getting the (predictable) answer that the R code 10^6 means calling the R power function, while 1e6 is a constant. Since 10^6 does not differ from pi^pi as far as the parser is concerned, there is no reason 10^6 should be recognised by R as a million. Except that it makes my coding more coherent.

> system.time( for (t in 1:10^8) x=pi^pi)
utilisateur     système      écoulé
     44.518       0.000      43.179
> system.time( for (t in 1:10^8) x=10^6)
utilisateur     système      écoulé
     38.336       0.000      37.860

Another thing I discovered from this answer to my question is that negative integers also require a call to a function:

> system.time( for (t in 1:10^8) x=1)
utilisateur     système      écoulé
     10.561       0.801      11.062
> system.time( for (t in 1:10^8) x=-1)
utilisateur     système      écoulé
     22.711       0.860      23.098

This sounds even weirder.
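One way to see what the parser is doing in both cases (a quick check in a plain R session, not part of the timings above):

class(quote(10^6))   # "call": 10^6 is parsed as the power function `^` applied to 10 and 6
class(quote(1e6))    # "numeric": 1e6 is parsed directly as a constant
class(quote(-1))     # "call": the unary minus is itself a function call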


Filed under: Books, Kids, R, Statistics, University life Tagged: exponent notation, exponentiation, functions in R, mantissa, power, R, scientific notation, system.time
Categories: Bayesian Bloggers

moonlight

Xian's Og - Mon, 2015-05-04 13:18


Filed under: pictures, Running Tagged: full moon, Sceaux
Categories: Bayesian Bloggers

ABC and cosmology

Xian's Og - Sun, 2015-05-03 18:15

Two papers appeared on arXiv in the past two days with the similar theme of applying ABC-PMC [one version of which we developed with Mark Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin in 2009] to cosmological problems. (As a further coincidence, I had just started refereeing yet another paper on ABC-PMC in another astronomy problem!) The first paper, cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation by Ishida et al. [“et al.” including Ewan Cameron], proposes a Python ABC-PMC sampler with applications to galaxy cluster catalogues. The paper is primarily a description of the cosmoabc package, including code snapshots. Earlier occurrences of ABC in cosmology are found for instance in this earlier workshop, as well as in Cameron and Pettitt’s earlier paper. The package offers a way to evaluate the impact of a specific distance, with a 2D graph demonstrating that the minimum [if not the range] of the simulated distances increases as the parameters move away from the best parameter values.

“We emphasis [sic] that the choice of the distance function is a crucial step in the design of the ABC algorithm and the reader must check its properties carefully before any ABC implementation is attempted.” E.E.O. Ishida et al.

The second [by one day] paper Approximate Bayesian computation for forward modelling in cosmology by Akeret et al. also proposes a Python ABC-PMC sampler, abcpmc. With fairly similar explanations: maybe both samplers should be compared on a reference dataset. While I first thought the description of the algorithm was rather close to our version, including the choice of the empirical covariance matrix with the factor 2, it appears it is adapted from a tutorial in the Journal of Mathematical Psychology by Turner and van Zandt. One out of many tutorials and surveys on the ABC method, of which I was unaware, but which summarises the pre-2012 developments rather nicely. Except for missing Paul Fearnhead’s and Dennis Prangle’s semi-automatic Read Paper. In the abcpmc paper, the update of the covariance matrix is the one proposed by Sarah Filippi and co-authors, which includes an extra bias term for faraway particles.
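For readers new to the scheme, here is a minimal one-parameter sketch of a generic ABC-PMC iteration (a toy Gaussian-mean model with a hypothetical uniform prior, not the cosmoabc or abcpmc code), with the kernel variance set to twice the weighted empirical variance as in our 2009 version:

abc_pmc_step = function(theta, w, eps, y_obs, n_sim = 100) {
  # theta, w: current particles and (normalised) weights; eps: current tolerance
  tau2 = 2 * sum(w * (theta - sum(w * theta))^2)       # twice the weighted empirical variance
  new_theta = new_w = numeric(length(theta))
  for (i in seq_along(theta)) {
    repeat {
      cand = sample(theta, 1, prob = w) + rnorm(1, sd = sqrt(tau2))  # resample then move
      x = rnorm(n_sim, mean = cand)                    # pseudo-data from the toy model
      if (abs(mean(x) - mean(y_obs)) < eps) break      # keep the candidate if close enough
    }
    new_theta[i] = cand
    new_w[i] = dunif(cand, -10, 10) /                  # importance weight: prior over
      sum(w * dnorm(cand, mean = theta, sd = sqrt(tau2)))   # the mixture of moves
  }
  list(theta = new_theta, w = new_w / sum(new_w))
}

The tolerance eps is then typically decreased from one iteration to the next, e.g., as a quantile of the previous simulated distances.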

“For complex data, it can be difficult or computationally expensive to calculate the distance ρ(x; y) using all the information available in x and y.” Akeret et al.

In both papers, the role of the distance is stressed as being quite important. However, the cosmoabc paper uses an L1 distance [see (2) therein] in a toy example without normalising between mean and variance, while the abcpmc paper suggests using a Mahalanobis distance that turns the d-dimensional problem into a comparison of one-dimensional projections.
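In sketch form, and with s_obs, s_sim, and C being hypothetical names for the observed summaries, the simulated summaries, and an estimate of their covariance, the two kinds of distances at stake are:

d_L1  = function(s_sim, s_obs) sum(abs(s_sim - s_obs))                    # unnormalised L1 distance
d_mah = function(s_sim, s_obs, C)                                          # Mahalanobis distance
  sqrt(drop(t(s_sim - s_obs) %*% solve(C) %*% (s_sim - s_obs)))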


Filed under: Books, pictures, Statistics, University life Tagged: ABC, ABC-PMC, abcpmc, astronomy, astrostatistics, cosmoabc, cosmology, likelihood-free methods, Mahalanobis distance, Python, semi-automatic ABC
Categories: Bayesian Bloggers

the 39 steps

Xian's Og - Sat, 2015-05-02 18:15

I had never read this classic that inspired Hitchcock’s The 39 Steps (which I had not watched before either). The setting of the book is slightly different from the film: it takes place in England and Scotland a few weeks before the First World War. German spies are trying to kill a prominent Greek politician [no connection with the current Euro-crisis intended!] and to learn about cooperative plans between France and Britain. The book involves no woman character (contrary to the film, where it adds a comical if artificial level). As in Rogue Male, most of the story is about an unlikely if athletic hero getting in the way of those spies and being pursued through the countryside by them. Even though the hunt has some intense moments, it lacks the psychological depth of Rogue Male, while the central notion that those spies are so good that they can play other persons’ roles without being recognised is implausible in the extreme, a feature reminding me of the Blake & Mortimer cartoons, which may have been inspired by this type of book. Especially The Francis Blake Affair. (Trivia: John Buchan ended up Governor General of Canada.)


Filed under: Books, Kids, Mountains, Running, Travel Tagged: Alfred Hitchcock, Blake and Mortimer, England, first World War, Glencoe, Rogue Male, Scotland, The 39 steps
Categories: Bayesian Bloggers

the raven, the cormorant, and the heron

Xian's Og - Fri, 2015-05-01 18:15

This morning, on my first lap of the Grand Bassin in Parc de Sceaux, I spotted “the” heron standing at its usual place, on the artificial wetland island created at one end of the canal. When coming back to this spot during the second lap, I could hear the heron calling loudly and saw a raven repeatedly diving near it and a nearby cormorant, who also seemed unhappy with this attitude, judging from the flapping of its wings… After a few dozen of those dives, the raven landed at the other end of the island and that was the end of the canal drama! Unless there was a dead fish stranded there, I wonder why the raven was having a go at those two larger birds.


Filed under: Kids, pictures, Running Tagged: cormorant, crow, heron, morning light, morning run, Parc de Sceaux, raven
Categories: Bayesian Bloggers

Le Monde puzzle [#909]

Xian's Og - Thu, 2015-04-30 18:15

Another of those “drop-a-digit” Le Monde mathematical puzzle:

Find all integers n with 3 or 4 digits, no exterior zero digit, and a single interior zero digit, such that removing that zero digit produces a divisor of n.

As in puzzle #904, I made use of the digin R function:

digin=function(n){
  # vector of the digits of n, most significant first
  as.numeric(strsplit(as.character(n),"")[[1]])}

and simply checked all integers up to 10⁶:

plura=divid=NULL
for (i in 101:10^6){
  dive=rev(digin(i))                            # digits of i, least significant first
  if ((min(dive[1],rev(dive)[1])>0)&            # no exterior zero
      (sum((dive[-c(1,length(dive))]==0))==1)){ # exactly one interior zero
    dive=dive[dive>0]                           # remove the zero digit
    dive=sum(dive*10^(0:(length(dive)-1)))      # rebuild the shortened integer
    if (i==((i%/%dive)*dive)){                  # check divisibility
      plura=c(plura,i)
      divid=c(divid,dive)}}}

which leads to the output

> plura
[1]   105   108   405  2025  6075 10125 30375 50625 70875
> plura/divid
[1] 7 6 9 9 9 9 9 9 9

leading to the conclusion that there is no solution beyond 70875. (Allowing for more than a single zero within the inner digits produces many more solutions, as sketched below.)
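For the record, one reading of the relaxed version (dropping every interior zero at once) only requires changing the condition on the number of zeros:

plura2=divid2=NULL
for (i in 101:10^6){
  dive=rev(digin(i))
  if ((min(dive[1],rev(dive)[1])>0)&          # still no exterior zero
      (sum(dive[-c(1,length(dive))]==0)>0)){  # at least one interior zero
    dive=dive[dive>0]                         # drop all the zero digits
    dive=sum(dive*10^(0:(length(dive)-1)))
    if (i==((i%/%dive)*dive)){
      plura2=c(plura2,i)
      divid2=c(divid2,dive)}}}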


Filed under: Books, Kids, R Tagged: integers, Le Monde, mathematical puzzle
Categories: Bayesian Bloggers

the most patronizing start to an answer I have ever received

Xian's Og - Wed, 2015-04-29 18:15

Another occurrence [out of many!] of a question on X validated where the originator (primitivus petitor) was trying to get an explanation without the proper background, on either Bayesian statistics or simulation. The introductory sentence to the question was about “trying to understand how the choice of priors affects a Bayesian model estimated using MCMC”, but the bulk of the question was in fact failing to understand an R code for a random-walk Metropolis-Hastings algorithm for a simple regression model provided in an introductory blog by Florian Hartig. And even more precisely about confusing the R code dnorm(b, sd = 5, log = T) in the prior with rnorm(1,mean=b, sd = 5, log = T) in the proposal…
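To spell out the difference (a minimal random-walk Metropolis-Hastings sketch for a single Gaussian mean, not Florian Hartig’s regression code): dnorm(…, log=TRUE) evaluates log-densities that enter the acceptance ratio, while rnorm() simulates the proposal move.

y = rnorm(50, mean = 2)                  # toy data
b = 0; chain = numeric(1e4)
for (t in 1:1e4) {
  prop = rnorm(1, mean = b, sd = 0.5)    # proposal: a simulation, hence rnorm
  logr = sum(dnorm(y, prop, log = TRUE)) + dnorm(prop, sd = 5, log = TRUE) -
         sum(dnorm(y, b, log = TRUE)) - dnorm(b, sd = 5, log = TRUE)
  if (log(runif(1)) < logr) b = prop     # Metropolis acceptance step
  chain[t] = b                           # dnorm only ever evaluates densities here
}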

“You should definitely invest some time in learning the bases of Bayesian statistics and MCMC methods from textbooks or on-line courses.” X

So I started my answer with the above warning. Which sums up my feelings about many of those X validated questions, namely that primitivi petitores lack the most basic background to consider such questions. Obviously, I should not have bothered with an answer, but it was late at night after a long day, a good meal at the pub in Kenilworth, and a broken toe still bothering me. So I got this reply from the primitivus petitor that it was a patronizing piece of advice and that he preferred to learn from R code rather than from textbooks and on-line courses, having “looked through a number of textbooks”. Good luck with this endeavour then!


Filed under: Books, Kids, R, Statistics, University life Tagged: Bayesian statistics, cross validated, dnorm, MCMC, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, R
Categories: Bayesian Bloggers

a war[like] week

Xian's Og - Tue, 2015-04-28 18:15

This week in Warwick was one of the busiest ever, as I had to juggle two workshops, including one in Oxford, a departmental meeting, two paper revisions, two pre-vivas, and a seminar in Leeds. Not to mention a broken toe (!), a flat tire (!!), and a dinner at the X. Hardly any time for writing blog entries..! Fortunately, I managed to squeeze in time for working with Kerrie Mengersen, who was visiting Warwick this fortnight, finding new directions for the (A)BCel approach we developed a few years ago with Pierre Pudlo. The workshop in Oxford was quite informal, with talks from PhD students [which I fear I cannot discuss here as the papers are not online yet]. And one talk by François Caron about estimating sparse networks with not exactly exchangeable priors and completely random measures. And one talk by Kerrie Mengersen on a new and in-progress approach to handling Big Data that I found quite convincing (but again cannot discuss here). The probabilistic numerics workshop was discussed in yesterday’s post and I managed to discuss it a wee bit further with the organisers at The X restaurant in Kenilworth. (As a superfluous aside, and after a second sampling this year, I concluded that the Michelin star is somewhat undeserved, in that the dishes at The X are not particularly imaginative or tasty, the excellent sourdough bread being the best part of the meal!) I was expecting the train ride to Leeds to be highly bucolic, as it went through the sunny countryside of South Yorkshire, with newly born lambs running in the bright green fields surrounded by old stone walls…, but it instead went through endless villages with their rows of brick houses. Not that I have anything against brick houses, mind! Only, I had not realised how dense this part of England was, this presumably going back all the way to the Industrial Revolution and the Manchester-Leeds-Birmingham triangle.

My seminar in Leeds was as exciting as the one in Amsterdam last week, and with a large audience, so I got many (and only interesting) questions, from the issue of turning the output (i.e., the posterior on α) into a decision rule, to making a decision in the event of a non-conclusive posterior, to links with earlier frequentist resolutions, to whether or not we were able to solve the Lindley-Jeffreys paradox (we are not!, which makes a lot of sense), to the possibility of running a subjective or a sequential version. After the seminar I enjoyed a perfect Indian dinner at Aagrah, apparently a Yorkshire institution, with the right balance between too hot and too mild, i.e., enough spices to break a good sweat but not so many as to lose all sense of taste!


Filed under: Books, Kids, pictures, Running, Statistics, Travel, University life, Wines Tagged: Aagrah restaurants, ABCel, empirical likelihood, England, Kenilworth, Leeds, Oxford (Mississipi), The Cross, University of Oxford, University of Warwick, Yorkshire
Categories: Bayesian Bloggers

extending ABC to high dimensions via Gaussian copula

Xian's Og - Mon, 2015-04-27 18:15

Li, Nott, Fan, and Sisson arXived last week a new paper on ABC methodology that I read on my way to Warwick this morning. The central idea in the paper is (i) to estimate the marginal posterior densities of the components of the model parameter by non-parametric means; and (ii) to consider all pairs of components to deduce the correlation matrix R of the Gaussian transform of the pairwise rank statistics. From those two low-dimensional estimates, the authors derive a joint Gaussian-copula distribution by using inverse cdf transforms and the correlation matrix R, to end up with a meta-Gaussian representation of the form

g(θ) = |R|^{-1/2} exp{ -η'(R^{-1}-I)η/2 } ∏_i g_i(θ_i),

where the η’s are the inverse Gaussian cdf transforms of the marginal cdf transforms of the θ’s, that is,

η_i = Φ^{-1}(G_i(θ_i)),

or rather

η_i = Φ^{-1}(Ĝ_i(θ_i)),

given that the g’s (and hence the G’s) are estimated.

This is obviously an approximation of the joint in that, even in the most favourable case when the g’s are perfectly estimated, and thus the components perfectly Gaussian, the joint is not necessarily Gaussian… But it sounds quite interesting, provided the cost of running all those transforms is not overwhelming. For instance, if the g’s are kernel density estimators, they involve sums of possibly a large number of terms.
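As a rough sketch of the reconstruction step (not the authors’ implementation: it assumes a hypothetical matrix theta of posterior simulations, one row per draw and one column per component, uses a single common sample for all margins rather than the per-margin and pairwise estimates of the paper, and relies on plain normal scores and kernel density estimates):

n = nrow(theta); p = ncol(theta)
eta = qnorm(apply(theta, 2, rank) / (n + 1))      # Gaussian scores of each margin
R = cor(eta)                                      # estimated copula correlation matrix
meta_logdens = function(x) {                      # meta-Gaussian log-density at a point x
  u = sapply(1:p, function(i)
        (sum(theta[, i] <= x[i]) + 0.5) / (n + 1))     # estimated marginal cdf values
  e = qnorm(u)
  g = sapply(1:p, function(i) {
        d = density(theta[, i])
        approx(d$x, d$y, xout = x[i])$y })             # kernel estimates of the marginals
  drop(-0.5 * log(det(R)) - 0.5 * t(e) %*% (solve(R) - diag(p)) %*% e + sum(log(g)))
}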

One thing that bothers me in the approach, albeit mostly at a conceptual level for I realise the practical appeal, is the use of different summary statistics for approximating different uni- and bi-dimensional marginals. This makes for an incoherent joint distribution, again at a conceptual level as I do not see immediate practical consequences… Those local summaries also have to be identified, component by component, which adds another level of computational cost to the approach, even when using a semi-automatic approach as in Fearnhead and Prangle (2012). Although the whole algorithm relies on a single reference table.

The examples in the paper are (i) the banana shaped “Gaussian” distribution of Haario et al. (1999) that we used in our PMC papers, with a twist; and (ii) a g-and-k quantile distribution. The twist in the banana (!) is that the banana distribution is the prior associated with the mean of a Gaussian observation. In that case, the meta-Gaussian representation seems to hold almost perfectly, even in p=50 dimensions. (If I remember correctly, the hard part in analysing the banana distribution was reaching the tails, which are extremely elongated in at least one direction.) For the g-and-k quantile distribution, the same holds, even for a regular ABC. What seems to be of further interest would be to exhibit examples where the meta-Gaussian is clearly an approximation. If such cases exist.


Filed under: Books, pictures, Statistics, Travel, Uncategorized, University life Tagged: ABC, copulas, population Monte Carlo, quantile distribution
Categories: Bayesian Bloggers

probabilistic numerics

Xian's Og - Sun, 2015-04-26 18:15

I attended a highly unusual workshop while in Warwick last week. Unusual for me, obviously. It was about probabilistic numerics, i.e., the use of probabilistic or stochastic arguments in the numerical resolution of (possibly) deterministic problems. The notion in this approach is fairly Bayesian in that it makes use of prior information or beliefs about the quantity of interest, e.g., a function, to construct a (usually Gaussian process) prior and derive both an estimator that is identical to a numerical method (e.g., Runge-Kutta or trapezoidal integration) and uncertainty or variability around this estimator. While I did not grasp much more than the classy introduction talk by Philipp Hennig, this concept sounds fairly interesting, if only because of the Bayesian connection, and I wonder if we will soon see a probabilistic numerics section at ISBA! More seriously, placing priors on functions or functionals is a highly formal perspective (as in Bayesian non-parametrics) and it makes me wonder how much of the data (evaluation of a function at a given set of points) and how much of the prior is reflected in the output [variability]. (Obviously, one could also ask a similar question for statistical analyses!) For instance, issues of singularity arise among those stochastic process priors.
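To make the idea concrete, here is a tiny Bayesian quadrature sketch (my own toy version, not from the workshop talks; the squared-exponential kernel, design points, and length-scale are arbitrary choices): a GP prior on f yields a posterior on the integral of f over [0,1] whose mean is a weighted sum of the evaluations and whose variance quantifies the remaining numerical uncertainty.

k = function(x, y, l = 0.2) exp(-(x - y)^2 / (2 * l^2))   # squared-exponential kernel
f = function(x) sin(3 * x) + x^2                          # test integrand
x = seq(0, 1, by = 0.1)                                   # evaluation points
K = outer(x, x, k) + 1e-10 * diag(length(x))              # kernel matrix, jittered
z = sapply(x, function(xi) integrate(function(u) k(u, xi), 0, 1)$value)   # kernel means
Z = integrate(Vectorize(function(u)
      integrate(function(v) k(u, v), 0, 1)$value), 0, 1)$value            # double integral
w = solve(K, z)                                           # Bayesian quadrature weights
c(estimate = sum(w * f(x)),                               # posterior mean of the integral
  sd = sqrt(Z - sum(z * w)),                              # posterior standard deviation
  exact = (1 - cos(3)) / 3 + 1 / 3)                       # exact value, for comparison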

Another question that stemmed from this talk is whether or not more efficient numerical methods can be derived that way, in addition to recovering the most classical ones. Somewhat, somehow, given the idealised nature of the prior, it feels like priors could be more easily compared or ranked than in classical statistical problems, since the aim is to figure out the value of an integral or the solution to an ODE. (Or maybe not, since again almost the same could be said about estimating a normal mean.)


Filed under: pictures, Running, Statistics, Travel, University life Tagged: Bayesian statistics, Brownian motion, Coventry, CRiSM, Gaussian processes, numerical analysis, numerical integration, Persi Diaconis, probability theory, Runge-Kutta, stochastic processes, sunrise, trapezoidal approximation, University of Warwick, Warwickshire, workshop
Categories: Bayesian Bloggers

the forever war [book review]

Xian's Og - Sat, 2015-04-25 18:15

Another book I bought somewhat on a whim, although I cannot remember which one… The latest edition has a preface by John Scalzi, author of Old Man’s War and its sequels, where he acknowledged he would not have written this series, had he previously read The Forever War. Which strikes me as ironical as I found Scalzi’s novels way better. Deeper. And obviously not getting obsolete so immediately! (As an aside, Scalzi is returning to the Old Man’s War universe with a new novel, The End of All Things.)

“…it’s easy to compute your chances of being able to fight it out for ten years. It comes to about two one-thousandths of one percent. Or, to put it another way, get an old-fashioned six-shooter and play Russian Roulette with four of the six chambers loaded. If you can do it ten times in a row without decorating the opposite wall, congratulations! You’re a civilian.”

This may be the main issue with The Forever War: the fact that it sounds so antiquated, and hence makes reading the novel like an exercise in Creative Writing 101, in order to spot how the author was so rooted in the 1970’s that he could not project far enough into the future to make his novel sustainable. The main issue in the suspension of disbelief required to proceed through the book is the low-tech configuration of Haldeman’s future. Even though intergalactic travel is possible via the traditional portals found in almost every sci-fi book, computers are blatantly missing from the picture. And so is artificial intelligence. (2001: A Space Odyssey was made in 1968, right?!) The economics of a forever-warring Earth are quite vague and unconvincing. There are no clever tactics in the war against the Taurans. Even the battle scenes are far from exciting. Esp. the parts where they fight with swords and arrows. And the treatment of sexuality has not aged well. So all that remains in favour of the story (and presumably made the success of the book) is the description of the ground soldier’s life, which could almost be transcribed verbatim to another war and another era. End of the story. (Unsurprisingly, while being the first book picked for the SF Masterworks series, The Forever War did not make it into the 2011 series…)


Filed under: Books, Kids Tagged: Joe Haldeman, John Scalzi, science fiction, space opera, The End of All Things, Vietnam
Categories: Bayesian Bloggers

bruggen in Amsterdam

Xian's Og - Sat, 2015-04-25 08:18
Categories: Bayesian Bloggers

ontological argument

Xian's Og - Fri, 2015-04-24 18:15


Filed under: Books, Kids, pictures Tagged: atheism, ontological argument, xkcd
Categories: Bayesian Bloggers

[non] Markov chains

Xian's Og - Fri, 2015-04-24 08:18
Categories: Bayesian Bloggers

scale acceleration

Xian's Og - Thu, 2015-04-23 18:15

Kate Lee pointed me to a rather surprising inefficiency in Matlab, exploited in Sylvia Frühwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer and sometimes much longer than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same difficulty, so I ran an experiment [while stuck in a Thalys train at Brussels, between Amsterdam and Paris…], using different values for a [click on the graph] and a range of values of b, with no visible difference between the two implementations, at least when using system.time for checking.

a=seq(.1,4,le=25)
for (t in 1:25)
  a[t]=system.time(rgamma(10^7,.3,a[t]))[3]
a=a/system.time(rgamma(10^7,.3,1))[3]

Once home, I wondered about the relevance of the above comparison, since rgamma(10^7,.3,1) forces R to use 1 as a scale, which may differ from using rgamma(10^7,.3), where 1 is known to be the scale [does this sentence make sense?!]. So I reran an even bigger experiment as

a=seq(.1,4,le=25)
for (t in 1:25)
  a[t]=system.time(rgamma(10^8,.3,a[t]))[3]
a=a/system.time(rgamma(10^8,.3))[3]

and got the graph below, which is much more interesting because it shows that some values of a lead to a loss of efficiency of 50%. Indeed. (The most extreme cases correspond to a=0.3, 1.1, 5.8. No clear pattern emerging.)

Update: As pointed out by Martyn Plummer in his comment, the C function behind the R rgamma function and Gamma generator does take into account the scale nature of the second parameter, so the above time differences are not due to this function but rather to whatever my computer was running at the same time…! Apologies to anyone I scared with this void warning!
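As an aside, the validity of the scale trick itself only relies on the gamma being a scale family, which is easily checked (a quick sketch, not a timing experiment; the seed and parameter values are arbitrary):

set.seed(1)
a = 0.3; b = 2.5
x1 = rgamma(1e5, a, rate = b)        # direct simulation with rate b
x2 = rgamma(1e5, a, rate = 1) / b    # unit-rate simulation rescaled by b
qqplot(x1, x2); abline(0, 1)         # the two samples line up on the diagonal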


Filed under: pictures, R, Statistics, Travel, University life Tagged: bayesf, Brussels, Matlab, R, rgamma, rgamma.c, scale, scale parameter, system.time
Categories: Bayesian Bloggers