Xi'an's Og

an attempt at bloggin, nothing more...

approximate maximum likelihood estimation using data-cloning ABC

Mon, 2015-06-01 18:15

“By accepting of having obtained a poor approximation to the posterior, except for the location of its main mode, we switch to maximum likelihood estimation.”

Presumably the first paper ever quoting from the ‘Og! Indeed, Umberto Picchini arXived a paper about a technique merging ABC with prior feedback (rechristened data cloning by S. Lele), where a maximum likelihood estimate is produced by an ABC-MCMC algorithm. For state-space models. This relates to an earlier paper by Fabio Rubio and Adam Johansen (Warwick), who also suggested using ABC to approximate the maximum likelihood estimate. Here, the idea is to use an increasing number of replicates of the latent variables, as in our SAME algorithm, to spike the posterior around the maximum of the (observed) likelihood. An ABC version of this posterior returns a mean value as an approximate maximum likelihood estimate.

“This is a so-called “likelihood-free” approach [Sisson and Fan, 2011], meaning that knowledge of the complete expression for the likelihood function is not required.”

The above remark is sort of inappropriate in that it applies to a non-ABC setting where the latent variables are simulated from the exact marginal distributions, that is, unconditional on the data, and hence their density cancels in the Metropolis-Hastings ratio. This pre-dates ABC by a few years, since it was an early version of the particle filter.

“In this work we are explicitly avoiding the most typical usage of ABC, where the posterior is conditional on summary statistics of data S(y), rather than y.”

This is another point I find rather negative, in that, for state-space models, using the entire time series as a “summary statistic” is unlikely to produce a good approximation.

The discussion on the respective choices of the ABC tolerance δ and on the prior feedback number of copies K is quite interesting, in that Umberto Picchini suggests setting δ first before increasing the number of copies. However, since the posterior gets more and more peaked as K increases, the consequences on the acceptance rate of the related ABC algorithm are unclear. Another interesting feature is that the underlying MCMC proposal on the parameter θ is an independent proposal, tuned during the warm-up stage of the algorithm. Since the tuning is repeated at each temperature, there are some loose ends as to whether or not it is a genuine Markov chain method. The same question arises when considering that additional past replicas need to be simulated when K increases. (Although they can be considered as virtual components of a vector made of an infinite number of replicas, to be used when needed.)
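
To fix ideas on this peaking effect, here is a minimal and purely illustrative R sketch of the data-cloning idea, not Picchini's algorithm or code: iid N(θ,1) observations, the sample mean as summary, a flat prior, a plain random-walk proposal instead of the tuned independent proposal of the paper, and acceptance requiring all K cloned pseudo-datasets to fall within δ of the observed summary. As K grows, the output should concentrate around the MLE, while the acceptance rate mechanically drops for a fixed δ.

# toy ABC with data cloning: iid N(theta,1) model, sample mean as summary
# (an illustrative sketch only, not the algorithm or the tuning of the paper)
set.seed(1)
n=50
y=rnorm(n,2,1)
mley=mean(y) # exact MLE, available here for comparison
abcdc=function(K,delta=.3,T=1e4){
  theta=rep(mean(y),T) # chain started at a crude estimate for simplicity
  for (t in 2:T){
    prop=theta[t-1]+rnorm(1,0,.2) # random-walk proposal, flat prior
    # accept only when all K cloned pseudo-datasets fall within delta of the data
    ok=all(abs(replicate(K,mean(rnorm(n,prop,1)))-mley)<delta)
    theta[t]=ifelse(ok,prop,theta[t-1])}
  theta}
for (K in c(1,5,20)){
  out=abcdc(K)
  cat("K =",K,"mean =",round(mean(out),3),
      "sd =",round(sd(out),3),"MLE =",round(mley,3),"\n")}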

The simulation study involves a regular regression with 101 observations, a stochastic Gompertz model studied by Sophie Donnet, Jean-Louis Foulley, and Adeline Samson in 2010. With 12 points. And a simple Markov model. Again with 12 points. While the ABC-DC solutions are close enough to the true MLEs whenever available, a comparison with the cheaper ABC Bayes estimates would have been of interest as well.


Filed under: Books, Statistics, University life Tagged: ABC, auxiliary particle filter, data cloning, likelihood-free, Metropolis-Hastings algorithm, prior feedback, SAME algorithm, summary statistics, University of Warwick
Categories: Bayesian Bloggers

O’Bayes15: my tutorial

Sun, 2015-05-31 18:15

Here are the slides I made for a short tutorial I will deliver this afternoon for the opening day of the International Workshop on Objective Bayes Methodology, O-Bayes15, held in the city of València, so intricately linked with Bayesians and Bayesianism. The more so as we are this time celebrating the career and life of our dear friend Susie. Celebrating with talks and stories, morning runs and afternoon drinks, laughs, tears, and more laughs, even though they cannot match Susie’s unique and vibrant communicative laugh. I will remember how, at O’Bayes 13, Susie was the one who delivered this tutorial. And how, despite physical frailty and fatigue, she did so with her usual energy and mental strength. And obviously again with laughter. I will also remember that the last time I visited València, it was for Anabel Forte’s thesis defence, upon invitation from Susie, and that we had a terrific time, from discussing objective Bayes ideas to eating and drinking local goodies, to walking around the grandiose monuments just built (which presumably contributed to ruining the City of València for quite a while!)


Filed under: Books, Kids, pictures, Running Tagged: Bayesian Analyis, Bayesian tests of hypotheses, non-informative priors, Spain, subjective versus objective Bayes, Susie Bayarri, València
Categories: Bayesian Bloggers

art brut

Sat, 2015-05-30 18:15
Categories: Bayesian Bloggers

Signori degli Anelli

Fri, 2015-05-29 18:15

Some rings and their stand from the walls and doors of the city of Siena.


Filed under: Kids, pictures, Travel, Wines Tagged: Italia, medieval architecture, Siena, Tuscany, wall ring
Categories: Bayesian Bloggers

Chianti

Fri, 2015-05-29 08:18
Categories: Bayesian Bloggers

discussions on Gerber and Chopin

Thu, 2015-05-28 18:15

As a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me [I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]:

  1. Mike Pitt (Warwick), Murray Pollock et al. (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudomarginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Blackwellisation (Finke et al.);
  2. Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
  3. Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
  4. Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
  5. Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in a hybrid manner;
  6. Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) [see the sketch after this list] and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
  7. Pierre L’Ecuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
  8. Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
  9. Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
  10. Daniel Rudolf suggests linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Chervonenkis (1977).
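
Since systematic resampling comes up in Hans Künsch’s discussion (item 6 above), here is a generic textbook version in R, included only to make the object of that discussion concrete; it is not code from the paper or from the discussions.

# systematic resampling à la Carpenter, Clifford and Fearnhead (1999):
# one uniform stratified into N points, inverted through the cdf of the weights
sysresamp=function(w){
  N=length(w)
  u=(runif(1)+0:(N-1))/N # a single uniform shared by all N strata
  findInterval(u,cumsum(w/sum(w)))+1 # returns the N ancestor indices
}
# example: table(sysresamp(rexp(10)))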

The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Torino) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.


Filed under: Books, Kids, Statistics, University life Tagged: ABC, discussion paper, doubly intractable problems, Hilbert, Igor Prünster, Julyan Arbel, Mathieu Gerber, Nicolas Chopin, quasi-Monte Carlo methods, Read paper, Royal Statistical Society, Series B, systematic resampling, Turino, University of Warwick, Vapnik-Chervonenkis
Categories: Bayesian Bloggers

Toscana [#3]

Thu, 2015-05-28 13:18
Categories: Bayesian Bloggers

simulating correlated random variables [cont’ed]

Wed, 2015-05-27 18:15

Following a recent post on the topic, and comments ‘Og’s readers kindly provided on that post, the picture is not as clear as I wished it was… Indeed, on the one hand, non-parametric measures of correlation based on ranks are, as pointed out by Clara Grazian and others, invariant under monotonic transforms and hence producing a Gaussian pair or a Uniform pair with the intended rank correlation is sufficient to return a correlated sample for any pair of marginal distributions by the (monotonic) inverse cdf transform.  On the other hand, if correlation is understood as Pearson linear correlation, (a) it is not always defined and (b) there does not seem to be a generic approach to simulate from an arbitrary triplet (F,G,ρ) [assuming the three entries are compatible]. When Kees pointed out Pascal van Kooten‘s solution by permutation, I thought this was a terrific resolution, but after thinking about it a wee bit more, I am afraid it is only an approximation, i.e., a way to return a bivariate sample with a given empirical correlation. Not the theoretical correlation. Obviously, when the sample is very large, this comes as a good approximation. But when facing a request to simulate a single pair (X,Y), this gets inefficient [and still approximate].

Now, if we aim at exact simulation from a bivariate distribution with the arbitrary triplet (F,G,ρ), why can’t we find a generic method?! I think one fundamental if obvious reason is that the question is just ill-posed. Indeed, there are many ways of defining a joint distribution with marginals F and G and with (linear) correlation ρ. One for each copula. The joint could thus be associated with a Gaussian copula, i.e., (X,Y)=(F⁻¹(Φ(A)),G⁻¹(Φ(B))) when (A,B) is a standardised bivariate normal with the proper correlation ρ’. Or it can be associated with the Archimedian copula

C(u,v) = (u^(-θ) + v^(-θ) − 1)^(-1/θ),

with θ>0 defined by a (linear) correlation of ρ. Or yet with any other copula… Were the joint distribution perfectly well-defined, it would then mean that ρ’ or θ (or whatever natural parameter is used for that copula) do perfectly parametrise this distribution instead of the correlation coefficient ρ. All that remains then is to simulate directly from the copula, maybe a theme for a future post…
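
As a quick illustration of the Gaussian-copula option, here is a small R sketch of my own, with an exponential and a Student’s t marginal picked arbitrarily and the copula parameter ρ’ set directly rather than calibrated to a target linear correlation ρ:

# simulate (X,Y) with marginals F=Exp(1) and G=t(5) through a Gaussian copula
rhop=0.7
n=1e4
A=rnorm(n)
B=rhop*A+sqrt(1-rhop^2)*rnorm(n) # standardised bivariate normal with correlation rho'
X=qexp(pnorm(A),rate=1) # X = F^{-1}(Phi(A))
Y=qt(pnorm(B),df=5)     # Y = G^{-1}(Phi(B))
cor(X,Y) # linear correlation: not rho', and calibrating it to a target rho is the hard part
cor(X,Y,method="spearman") # rank correlation, inherited from the copula alone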


Filed under: Books, Kids, Statistics Tagged: copulas, correlation, Monte Carlo methods, Monte Carlo Statistical Methods, simulating copulas
Categories: Bayesian Bloggers

the Flatland paradox [#2]

Tue, 2015-05-26 18:15

Another trip in the métro today (to work with Pierre Jacob and Lawrence Murray in a Paris Anticafé!, as the University was closed) led me to infer—warning!, this is not the exact distribution!—the distribution of x, namely

since a path x of length l(x) corresponds to N draws if N-l(x) is an even integer 2p and p indistinguishable annihilations in 4 possible directions have to be distributed over l(x)+1 possible locations, with Feller’s number of distinguishable distributions as a result. With a prior π(N)=1/N on N, hence on p, the posterior on p is given by

Now, given N and  x, the probability of no annihilation on the last round is 1 when l(x)=N and in general

which can be integrated against the posterior. The numerical expectation is represented for a range of values of l(x) in the above graph. Interestingly, the posterior probability is constant for l(x) large  and equal to 0.8125 under a flat prior over N.

Getting back to Pierre Druilhet’s approach, he sets a flat prior on the length of the path θ and from there derives that the probability of annihilation is about 3/4. However, “the uniform prior on the paths of lengths lower or equal to M” used for this derivation, which gives a probability of length l proportional to 3^l, is quite different from the distribution of l(θ) given a number of draws N. Which as shown above looks much more like a Binomial B(N,1/2).

However, being not quite certain about the reasoning involving Feller’s trick, I ran an ABC experiment under a flat prior restricted to (l(x),4l(x)) and got the above, where the histogram is for a posterior sample associated with l(x)=195 and the gold curve is the potential posterior. Since ABC is exact in this case (i.e., I only picked N’s for which l(x)=195), ABC is not to blame for the discrepancy! I asked about the distribution on the Stack Exchange maths forum (and a few colleagues here as well) but got no reply so far… Here is the R code that goes with the ABC implementation:

#observation:
elo=195
#ABC version
T=1e6
el=rep(NA,T)
N=sample(elo:(4*elo),T,rep=TRUE)
for (t in 1:T){
  #generate a path
  paz=sample(c(-(1:2),1:2),N[t],rep=TRUE)
  #eliminate U-turns
  uturn=paz[-N[t]]==-paz[-1]
  while (sum(uturn)>0){
    uturn[-1]=uturn[-1]*(1-uturn[-(length(paz)-1)])
    uturn=c((1:(length(paz)-1))[uturn==1],
            (2:length(paz))[uturn==1])
    paz=paz[-uturn]
    uturn=paz[-length(paz)]==-paz[-1]
  }
  el[t]=length(paz)}
#subsample to get exact posterior
poster=N[abs(el-elo)==0]
Filed under: Books, Kids, R, Statistics, University life Tagged: ABC, combinatorics, exact ABC, Flatland, improper priors, Larry Wasserman, marginalisation paradoxes, paradox, Pierre Druilhet, random walk, subjective versus objective Bayes, William Feller
Categories: Bayesian Bloggers

pigeons

Tue, 2015-05-26 08:18


Filed under: pictures, Travel Tagged: Italia, pigeon, Siena, Tuscany
Categories: Bayesian Bloggers

non-reversible MCMC [comments]

Mon, 2015-05-25 18:15

[Here are comments made by Matt Graham that I thought would be more readable in a post format. The beautiful picture of the Alps above is his as well. I do not truly understand what Matt’s point is, as I did not cover continuous time processes in my discussion…]

In terms of interpretation of the diffusion with non-reversible drift component, I think this can be generalised from the Gaussian invariant density case by

dx = [ – (∂E/∂x) dt + √2 dw ] + S’ (∂E/∂x) dt

where ∂E/∂x is the usual gradient of the negative log (unnormalised) density / energy and S=-S’ is a skew symmetric matrix. In this form it seems the dynamic can be interpreted as the composition of an energy and volume conserving (non-canonical) Hamiltonian dynamic

dx/dt = S’ ∂E/∂x

and a (non-preconditioned) Langevin diffusion

dx = – (∂E/∂x) dt + √2 dw

As an alternative to discretising the combined dynamic, it might be interesting to compare to sequential alternation between ‘Hamiltonian’ steps either using a simple Euler discretisation

x’ = x + h S’ ∂E/∂x

or a symplectic method like implicit midpoint to maintain reversibility / volume preservation and then a standard MALA step

x’ = x – h (∂E/∂x) + √(2h) w, w ~ N(0, I)

plus MH accept. If only one final MH accept step is done, this overall dynamic will be reversible; however, if an intermediary MH accept is done after each Hamiltonian step (flipping the sign / transposing S on a rejection to maintain reversibility), the composed dynamic would in general no longer be reversible, and it would be interesting to compare performance in this case to that using a non-reversible MH acceptance on the combined dynamic (this alternative sidestepping the issues with finding an appropriate scale ε to maintain the non-negativity condition on the sum of the vorticity density and joint density on a proposed and current state).

With regards to your point on the positivity of g(x,y)+π(y)q(y,x), I’m not sure if I have fully understood your notation correctly, but I think you may have misread the definition of g(x,y) for the discretised Ornstein-Uhlenbeck case (apologies if instead the misinterpretation is mine!). The vorticity density is defined as the skew-symmetric component of the density f of F(dx, dy) = µ(dx) Q(x, dy) with respect to the Lebesgue measure, where µ(dx) is the true invariant distribution of the Euler-Maruyama discretised diffusion based proposal density Q(x, dy), rather than g(x, y) being defined in terms of the skew-symmetric component of π(dx) Q(x, dy), which in general would lead to a vorticity density that does not meet the zero integral requirement, as the target density π is not in general invariant with respect to Q.


Filed under: Books, Mountains, pictures, Statistics, University life Tagged: Euler discretisation, Hamiltonian, Langevin diffusion, non-reversible diffusion, Ornstein-Uhlenbeck process, reversibility
Categories: Bayesian Bloggers

another viral math puzzle

Sun, 2015-05-24 18:15

After the Singapore Maths Olympiad birthday problem that went viral, here is a Vietnamese primary school puzzle that made the frontline in The Guardian. The question is: Fill the empty slots with all integers from 1 to 9 for the equality to hold. In other words, find a,b,c,d,e,f,g,h,i such that

a+13xb:c+d+12xe-f-11+gxh:i-10=66.

With presumably the operation ordering corresponding to

a+(13xb:c)+d+(12xe)-f-11+(gxh:i)-10=66

although this is not specified in the question. Which amounts to

a+(13xb:c)+d+(12xe)-f+(gxh:i)=87

and implies that c divides b and i divides gxh. Rather than pursuing this analytical quest further, I resorted to R coding, checking by brute force whether or not a given sequence was working.

baoloc=function(ord=sample(1:9)){
  if (ord[1]+(13*ord[2]/ord[3])+ord[4]+
      12*ord[5]-ord[6]-11+(ord[7]*ord[8]/
      ord[9])-10==66) return(ord)}

I then applied this function to all permutations of {1,…,9} [with the help of the perm(combinat) R function] and found 128 distinct solutions. Including some for which b:c is not an integer. (None of this obviously gives a hint as to how an 8-year-old could solve the puzzle.)

As pointed out in a comment below, using the test == on scalars is a bad idea—once realising some fractions may be other than integers—and I should thus replace the equality with an alternative that bypasses divisions,

baoloc=function(ord=sample(1:9)){
  return(((ord[1]+ord[4]+12*ord[5]-ord[6]-87)*
           ord[3]*ord[9]+13*ord[2]*ord[9]+
           ord[3]*ord[7]*ord[8]==0)*ord)}

leading to the overall R code

sol=NULL
perms=as.matrix(data.frame(permutations(9)),ncol=9,byrow=TRUE)
for (t in 1:factorial(9)){
  a=baoloc(perms[t,])
  if (a[1]>0) sol=rbind(sol,a)}
sol=sol[do.call(order, as.data.frame(sol)),]

and returning the 136 different solutions…


Filed under: Books, Kids, R, University life Tagged: data.frame, mathematical puzzle, permutation, primary school, Singapore, The Guardian, The New York Times, Vietnam
Categories: Bayesian Bloggers

an even more senseless taxi-ride

Sun, 2015-05-24 13:32

I was (exceptionally) working in (and for) my garden when my daughter shouted down from her window that John Nash had just died. I thus completed my tree trimming and went to check this sad item of news. What I read made the news even sadder, as he and his wife had died in a taxi crash in New Jersey, apparently for not wearing seat-belts, a strategy you would think far from minimax… Since Nash was in Norway a few days earlier to receive the 2015 Abel Prize, it may even be that the couple was on its way back home from the airport. A senseless death for a Beautiful Mind.


Filed under: Books, Kids, pictures, Travel, University life Tagged: A Beautiful Mind, Abel Prize, car accident, John Nash, New Jersey, Nobel Prize, Norway, Princeton University, seat-belt, taxi
Categories: Bayesian Bloggers

39% anglo-irish!

Sat, 2015-05-23 18:23

As I have always been curious about my ancestry, I took a DNA test with 23andMe. While the company no longer provides statistics about potential medical conditions because of a lawsuit, it does return an ancestry analysis of sorts. In my case, my major ancestry composition is Anglo-Irish! (with 39% of my DNA) and northern European (with 32%), while only 19% is Franco-German… In retrospect, not so much of a surprise—not because of my well-known Anglophilia but—given that my (known, i.e., at least for the direct ancestral branches) family roots are in Normandy—whose duke invaded Britain in 1066—and Brittany—which was invaded by British Celts fleeing Anglo-Saxons in the 400’s. What’s maybe more surprising to me is that the database contained 23 people identified as 4th degree cousins and a total of 652 relatives… While the number of my potential 4th degree cousins stands in the 10,000’s, and hence a few may indeed end up as 23andMe—mostly American—customers, I am surprised that a .37% coincidence in our genes qualifies for being 4th degree cousins! But given that I only share 3.1% with my great⁴-grandfather, it actually makes sense that I share about .1% to .4% with such remote cousins. However I wonder at the precision of such an allocation: could those cousins be even more remotely related? Not related at all? [Warning: All the links to 23andMe in this post are part of their referral program.]
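
For the record, here is the crude halving arithmetic behind that last back-of-the-envelope claim, an illustration of my own using the standard approximation that k-th degree cousins descend from a common ancestral couple k+1 generations back:

# expected autosomal sharing with a k-th degree cousin, halving approximation
k=4
2*(1/2)^(2*(k+1)) # about 0.002, i.e. roughly 0.2%, within the .1%-.4% range quoted above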

 


Filed under: Kids, Statistics, Travel Tagged: 23andMe, Britons, Brittany, Celts, common ancestor, DNA, genealogy, Normandy, sequencing
Categories: Bayesian Bloggers

a senseless taxi-ride

Fri, 2015-05-22 18:15

This morning, on my way to the airport (and to Montpellier for a seminar), Rock, my favourite taxi-driver, told me of a strange ride he endured the night before, so strange that he had not yet fully got over it! As it happened, he had picked up an elderly lady with two large bags in the vicinity after a radio call and driven her to a sort of Catholic hostel in down-town Paris, near La Santé jail, a pastoral place housing visiting nuns and priests. However, when they arrived there, she asked the taxi to wait before leaving, quite appropriately as she had apparently failed to book the place. She then asked my friend to take her to another specific address, a hotel located nearby at Denfert-Rochereau. While Rock was waiting with the meter running, the passenger literally checked in by visiting the hotel room and deciding she did not like it, so she gave my friend yet another hotel address near Saint-Honoré, where she repeated the same process, namely visited the hotel room with the same outcome that she did not like the place. My friend was then getting worried about the meaning of this processionary trip all over Paris, all the more because the lady did not have a particularly coherent discourse. And could not stop talking. The passenger then made him stop for food and drink, and, while getting back in the taxi, ordered him to drive her back to her starting place. After two and a half hours, they thus came back to the place, with a total bill of 113 euros. The lady then handed a 100 euro bill to the taxi-driver, declaring she did not have any further money and that he should have brought her home directly from the first place they had stopped… In my friend’s experience, this was the weirdest passenger he ever carried and he thought the true point of the ride was to escape solitude and loneliness for one evening, even if it meant chatting about nonsense the whole time.


Filed under: pictures, Travel Tagged: Montpellier, Paris, story, taxi, taxi-driver
Categories: Bayesian Bloggers

postdoc in the Alps

Fri, 2015-05-22 08:18

Post-doctoral Position in Spatial/Computational Statistics (Grenoble, France)

A post-doctoral position is available in Grenoble, France, to work on computational methods for spatial point process models. The candidate will work with Simon Barthelmé (GIPSA-lab, CNRS) and Jean-François Coeurjolly (Univ. Grenoble Alpes, Laboratory Jean Kuntzmann) on extending point process methodology to deal with large datasets involving multiple sources of variation. We will focus on eye movement data, a new and exciting application area for spatial statistics. The work will take place in the context of an interdisciplinary project on eye movement modelling involving psychologists, statisticians and applied mathematicians from three different institutes in Grenoble.

The ideal candidate has a background in spatial or computational statistics or machine learning. Knowledge of R (and in particular the package spatstat) and previous experience with point process models is a definite plus.

The duration of the contract is 12+6 months, starting 01.10.2015 at the earliest. Salary is according to standard CNRS scale (roughly EUR 2k/month).

Grenoble is the largest city in the French Alps, with a very strong science and technology cluster. It is a pleasant place to live, in an exceptional mountain environment.


Filed under: Kids, Mountains, Statistics, Travel, University life Tagged: Alps, CNRS, computational statistics, Grenoble, IMAG, Mount Lady Macdonald, mountains, point processes, postdoctoral position, spatial statistics
Categories: Bayesian Bloggers

Bruce Lindsay (March 7, 1947 — May 5, 2015)

Thu, 2015-05-21 18:15

When early registering for Seattle (JSM 2015) today, I discovered on the ASA webpage the very sad news that Bruce Lindsay had passed away on May 5.  While Bruce was not a very close friend, we had met and interacted enough times for me to feel quite strongly about his most untimely death. Bruce was indeed “Mister mixtures” in many ways and I have always admired the unusual and innovative ways he had found for analysing mixtures. Including algebraic ones through the rank of associated matrices. Which is why I first met him—besides a few words at the 1989 Gertrude Cox (first) scholarship race in Washington DC—at the workshop I organised with Gilles Celeux and Mike West in Aussois, French Alps, in 1995. After this meeting, we met twice in Edinburgh at ICMS workshops on mixtures, organised with Mike Titterington. I remember sitting next to Bruce at one workshop dinner (at Blonde) and him talking about his childhood in Oregon and his father being a journalist and how this induced him to become an academic. He also contributed a chapter on estimating the number of components [of a mixture] to the Wiley book we edited out of this workshop. Obviously, his work extended beyond mixtures to a general neo-Fisherian theory of likelihood inference. (Bruce was certainly not a Bayesian!) Last time, I met him, it was in Italia, at a likelihood workshop in Venezia, October 2012, mixing Bayesian nonparametrics, intractable likelihoods, and pseudo-likelihoods. He gave a survey talk about composite likelihood, telling me about his extended stay in Italy (Padua?) around that time… So, Bruce, I hope you are now running great marathons in a place so full of mixtures that you can always keep ahead of the pack! Fare well!

 


Filed under: Books, Running, Statistics, Travel, University life Tagged: American statisticians, Bruce Lindsay, composite likelihood, Edinburgh, ICMS, Italia, marathon, mixture estimation, mixtures of distributions, Penn State University, unknown number of components, Venezia
Categories: Bayesian Bloggers

da San Geminiano

Thu, 2015-05-21 13:18
Categories: Bayesian Bloggers

non-reversible MCMC

Wed, 2015-05-20 18:15

While visiting Dauphine, Natesh Pillai and Aaron Smith pointed out this interesting paper of Joris Bierkens (Warwick) that had escaped my arXiv watch/monitoring. The paper is about turning Metropolis-Hastings algorithms into non-reversible versions, towards improving mixing.

In a discrete setting, a way to produce a non-reversible move is to mix the proposal kernel Q with its time-reversed version Q’ and use an acceptance probability of the form

where ε is any weight. This construction is generalised in the paper to any vorticity (skew-symmetric with zero sum rows) matrix Γ, with the acceptance probability

min{1, [π(y)Q(y,x) + εΓ(x,y)] / [π(x)Q(x,y)]},

where ε is small enough to ensure all numerator values are non-negative. This is a rather annoying assumption in that, except for the special case derived from the time-reversed kernel, it has to be checked over all pairs (x,y). (I first thought it also implied the normalising constant of π but everything can be set in terms of the unnormalised version of π, Γ or ε included.) The paper establishes that the new acceptance probability preserves π as its stationary distribution. An alternative construction is to make the proposal change from Q to H such that H(x,y)=Q(x,y)+εΓ(x,y)/π(x). Which seems more pertinent as not changing the proposal cannot improve the mixing behaviour of the chain that much. Still, the move to the non-reversible versions has the noticeable plus of decreasing the asymptotic variance of the Monte Carlo estimate for any integrable function. Any. (Those results are found in the physics literature of the 2000’s.)
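
To see the discrete construction in action, here is a small self-contained R sketch on a toy m-state cycle; the target, the proposal, the vorticity matrix and the choice of ε are all illustrative picks of mine, not taken from the paper. The empirical frequencies should match the target even though the chain circulates clockwise on average, which is the intended non-reversible behaviour.

# non-reversible MH on a cycle: accept y with probability
# min(1,(tgt[y]Q[y,x]+eps*G[x,y])/(tgt[x]Q[x,y])), G skew-symmetric with zero row sums
m=6
tgt=(1:m)/sum(1:m) # arbitrary target (an unnormalised version works as well)
nxt=c(2:m,1)
prv=c(m,1:(m-1))
Q=G=matrix(0,m,m)
for (x in 1:m){
  Q[x,nxt[x]]=Q[x,prv[x]]=1/2 # symmetric nearest-neighbour proposal
  G[x,nxt[x]]=1
  G[x,prv[x]]=-1} # vorticity pushing the chain clockwise
# largest eps keeping all numerators non-negative on proposed pairs
eps=min(t(tgt*Q)[G<0&Q>0])
T=1e5
x=1
chain=rep(NA,T)
for (t in 1:T){
  y=ifelse(runif(1)<.5,nxt[x],prv[x])
  if (runif(1)<(tgt[y]*Q[y,x]+eps*G[x,y])/(tgt[x]*Q[x,y])) x=y
  chain[t]=x}
round(rbind(empirical=table(chain)/T,target=tgt),3) # tgt is preserved despite the drift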

The extension to the continuous case is a wee bit more delicate. One needs to find an anti-symmetric vortex function g with zero integral [equivalent to the row sums being zero] such that g(x,y)+π(y)q(y,x)>0 and with the same support as π(x)q(x,y), so that the acceptance probability [g(x,y)+π(y)q(y,x)]/[π(x)q(x,y)] leads to π being the stationary distribution. Once again g(x,y)=ε(π(y)q(y,x)-π(x)q(x,y)) is a natural candidate but it is unclear to me why it should work. As the paper only contains one illustration for the discretised Ornstein-Uhlenbeck model, with the above choice of g for a small enough ε (a point I fail to understand since any ε<1 should provide a positive g(x,y)+π(y)q(y,x)), it is also unclear to me that this modification (i) is widely applicable and (ii) is relevant for genuine MCMC settings.


Filed under: Books, Statistics, University life Tagged: arXiv, MCMC algorithms, Monte Carlo Statistical Methods, Ornstein-Uhlenbeck model, reversibility, Université Paris Dauphine, University of Warwick
Categories: Bayesian Bloggers