Bayesian News Feeds

efficient implementation of MCMC when using an unbiased likelihood estimator

Xian's Og - Tue, 2014-05-06 18:14

I read this paper by Arnaud Doucet, Mike Pitt, George Deligiannidis and Robert Kohn, re-arXived last month, while travelling to Warwick this morning. In very pleasant weather, on both sides of the Channel. (Little was I aware then that it was a public (“bank”) holiday in the UK and hence that the department here would be empty of people.) Actually, Mike had already talked with me about it during my previous visit to Warwick, as the proof in the paper makes use of our vanilla Rao-Blackwellisation paper, by considering the jump kernels associated with the original kernels.

The purpose of the paper is to determine the precision of (i.e., the number of terms N in) an unbiased estimation of the likelihood function in order to minimise the asymptotic variance of the corresponding Metropolis-Hastings estimate, for a given total number of simulations. While this is a very pertinent issue with pseudo-marginal and particle MCMC algorithms, I would overall deem the paper to be more theoretical than methodological in that it relies on special assumptions like a known parametric family for the distribution of the noise in the approximation of the log-likelihood and independence (of this distribution) from the parameter value. The central result of the paper is that the number of terms N should be such that the variance of the log-likelihood estimator is around 1. Definitely a manageable target. (The above assumptions are used to break the Metropolis-Hastings acceptance probability into two independent parts and to run two separate acceptance checks. Ending up with an upper bound on the asymptotic variance.)
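
To make the "variance around 1" target concrete, here is a minimal R sketch on a hypothetical toy model that is not taken from the paper: one observation y with y|z ~ N(z,1) and z ~ N(theta,1), the likelihood being estimated by averaging over N simulated latent values. The function names, the model and the values y=3, theta=0 are all assumptions made for illustration; the sketch only shows how one might monitor the variance of the log-likelihood estimator as N grows.

```r
set.seed(1)

# Hypothetical toy model (an assumption, not the paper's example):
# y | z ~ N(z, 1) and z ~ N(theta, 1); the likelihood of y given theta
# is estimated without bias by averaging dnorm(y, z_i, 1) over N draws z_i.
loglik_hat <- function(y, theta, N) {
  z <- rnorm(N, mean = theta, sd = 1)
  log(mean(dnorm(y, mean = z, sd = 1)))
}

# Monte Carlo evaluation of the variance of the log-likelihood estimator
var_loglik_hat <- function(y, theta, N, reps = 2000) {
  var(replicate(reps, loglik_hat(y, theta, N)))
}

# Variance of the log-likelihood estimator over a grid of N
Ns <- c(1, 2, 5, 10, 20, 50)
v <- sapply(Ns, function(N) var_loglik_hat(y = 3, theta = 0, N = N))

# Value of N on the grid whose variance is closest to 1
N_star <- Ns[which.min(abs(v - 1))]
print(rbind(N = Ns, variance = round(v, 2)))
```

In practice the variance depends on the parameter value, which is precisely what the independence assumption of the paper rules out, so a calibration at a single theta as above is only a rough guide.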

Filed under: pictures, Statistics, Travel, University life Tagged: air pictures, bank holiday, Birmingham, England, University of Warwick
Categories: Bayesian Bloggers

AISTATS 2014 (tee-shirt)

Xian's Og - Tue, 2014-05-06 08:18

It took me a fairly long while to realise there was a map of Iceland as a tag-cloud at the back of the AISTATS 2014 tee-shirt! As it was far too large for me, I thought about leaving it at the conference desk last week. I did bring it back for someone the proper size though and discovered the above when unfolding the tee… Nice but still not my size!

Filed under: Kids, pictures, Statistics, Travel, University life Tagged: AISTATS 2014, ash cloud, Iceland, Reykjavik, tag cloud, tee-shirt
Categories: Bayesian Bloggers

Le Monde puzzle [#865]

Xian's Og - Mon, 2014-05-05 18:14

A Le Monde mathematical puzzle in combinatorics:

Given a permutation σ of {1,…,5}, if σ(1)=n, the n first values of σ are inverted. If the process is iterated until σ(1)=1, does this always happen and if so what is the maximal  number of iterations? Solve the same question for the set {1,…,2014}.

I ran the following basic R code:

N=5
tm=0
for (i in 1:10^6){
  sig=sample(1:N) #random permutation
  t=0
  while (sig[1]>1){
    n=sig[1]
    sig[1:n]=sig[n:1]
    t=t+1
  }
  tm=max(t,tm)
}

obtaining 7 as the outcome. Here is the evolution of the maximum as a function of the number of terms in the set. If we push the regression to N=2014, the predicted value is around 600,000… Running a million simulations of the above only gets me to 23,871! A few minutes of reflection led me to conjecture that the maximum number of steps w_N should satisfy w_N = w_{N-1} + N - 2. However, the values resulting from the simulations do not grow as fast. (And, as Jean-Louis Fouley commented, it does not even work for N=6.) Monte Carlo effect or true discrepancy?
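
For small N the Monte Carlo question can be settled by brute force, since all N! permutations can be enumerated. The sketch below is an illustration only: the helper names flips, perms and max_flips are made up here, and the recursive enumeration is only practical for N up to 8 or so.

```r
# Number of reversals applied before 1 reaches the first position
flips <- function(sig) {
  t <- 0
  while (sig[1] > 1) {
    n <- sig[1]
    sig[1:n] <- sig[n:1]   # invert the first n values
    t <- t + 1
  }
  t
}

# All permutations of a vector, built recursively (small N only)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Exact maximum number of iterations over all permutations of 1:N
max_flips <- function(N) max(sapply(perms(1:N), flips))

w <- sapply(2:6, max_flips)
names(w) <- 2:6
print(w)
```

Here max_flips(5) returns 7, in agreement with the simulation, and the exact values for N=2,…,6 can be set against the conjectured recursion directly, with no Monte Carlo error involved.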

Filed under: Books, Kids, R Tagged: Le Monde, mathematical puzzle, R
Categories: Bayesian Bloggers

a pseudo-marginal perspective on the ABC algorithm

Xian's Og - Sun, 2014-05-04 18:14

My friends Luke Bornn, Natesh Pillai and Dawn Woodard just arXived along with Aaron Smith a short note on the convergence properties of ABC. When compared with acceptance-rejection or regular MCMC. Unsurprisingly, ABC does worse in both cases. What is central to this note is that ABC can be (re)interpreted as a pseudo-marginal method where the data comparison step acts like an unbiased estimator of the true ABC target (not of the original ABC target, mind!). From there, it is mostly an application of Christophe Andrieu’s and Matti Vihola’s results in this setup. The authors also argue that using a single pseudo-data simulation per parameter value is the optimal strategy (as compared with using several), when considering asymptotic variance. This makes sense in terms of simulating in a larger dimensional space but what of the cost of producing those pseudo-datasets against the cost of producing a new parameter? There are a few (rare) cases where the datasets are much cheaper to produce.
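
The pseudo-marginal reading can be sketched in a few lines of R. The toy setting below is an assumption made for illustration and does not come from the note: a single observation y ~ N(theta,1), a N(0,10²) prior, a random-walk proposal and a tolerance eps. The proportion of M pseudo-observations falling within eps of y is an unbiased estimator of the ABC likelihood P(|z-y|<eps | theta), and it is plugged into a Metropolis-Hastings ratio; M=1 gives the usual ABC-MCMC accept/reject step.

```r
# Pseudo-marginal ABC-MCMC on a hypothetical toy model (illustration only):
# y ~ N(theta, 1), prior theta ~ N(0, 10^2), symmetric random-walk proposal.
abc_pm <- function(y, eps, n_iter, M = 1, theta0 = y, scale = 1) {
  # Unbiased estimator of P(|z - y| < eps | theta) based on M pseudo-data
  lik_hat <- function(theta) mean(abs(rnorm(M, mean = theta, sd = 1) - y) < eps)

  cur <- theta0
  cur_hat <- 0
  while (cur_hat == 0) cur_hat <- lik_hat(cur)  # start from a positive estimate

  theta <- numeric(n_iter)
  for (t in 1:n_iter) {
    prop <- cur + scale * rnorm(1)
    prop_hat <- lik_hat(prop)                   # fresh pseudo-data for the proposal
    ratio <- prop_hat * dnorm(prop, 0, 10) / (cur_hat * dnorm(cur, 0, 10))
    if (runif(1) < ratio) {                     # the current estimate is recycled, not refreshed
      cur <- prop
      cur_hat <- prop_hat
    }
    theta[t] <- cur
  }
  theta
}

set.seed(42)
chain1 <- abc_pm(y = 2, eps = 0.5, n_iter = 5000, M = 1)
chain5 <- abc_pm(y = 2, eps = 0.5, n_iter = 5000, M = 5)
```

Comparing chain1 and chain5 at equal numbers of pseudo-data simulations (rather than equal numbers of iterations) is one way to look at the single-simulation claim empirically on this toy example.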

Filed under: Mountains, pictures, Statistics, University life Tagged: ABC, ABC-MCMC, acceptance rate, Alps, asymptotics, Chamonix, MCMSki IV, pseudo-data, ranking
Categories: Bayesian Bloggers

The Republic of Thieves [book review]

Xian's Og - Sat, 2014-05-03 18:14

At last! The third volume in Scott Lynch’s Gentlemen Bastards series has appeared! After several years of despairing ever seeing the sequel to The Lies of Locke Lamora and of Red Seas under Red Skies, The Republic of Thieves eventually appeared. The author thus managed to get over his chronic depression to produce a book on par with the previous two volumes… Judging from the many reviews found on the Web, reception ranges from disappointed to ecstatic. I do think this volume is very good, if below the initial The Lies of Locke Lamora in terms of freshness and plot. There is consistency in terms of the series, some explanations are provided wrt earlier obscure points, new obscure points are created in preparation for the next volumes, and the main characters broaden and grow in depth and complexity. Mostly.

The book The Republic of Thieves is much more innovative than its predecessor from a purely literary viewpoint, with story told within story, with on top of this a constant feedback to the origins of the Gentlemen Bastards upper-scale thieves band. The inclusion of a real play whose title is the same as the title of the book is a great idea, albeit not exactly new (from Cyrano de Bergerac to The Wheel of Time to The Name of the Wind), as it gives more coherence to the overall plot. The Gentlemen Bastards as depicted along those books are indeed primarily fabulous actors and they manage their heists mostly by clever acting, rather than force and violence. (Covers hence miss the point completely by using weapons and blood.) It thus makes sense that they had had training with an acting troupe… Now, the weakest point in the book is the relationship between the two central characters, Locke Lamora and Sabetha Belacoros. This is rather unfortunate as there are a lot of moments and a lot of pages and a lot of dialogues centred on this relationship! Lynch seems unable to strike the right balance and Locke remains an awkward pre-teen whose apologies infuriate Sabetha at every corner… After the third occurrence of this repeated duo, it quickly gets annoying. The couple only seems to grow up at the very end of the book. At last! Apart from this weakness, the plot is predictable at one level, which sounds like the primary level… (spoiler?!) until a much deeper one is revealed, once again in the final pages of the book which, even more than in the previous ones, turn all perspectives upside-down and desperately beg for the next book to appear. Hopefully in less than six years…

Filed under: Books, Kids Tagged: depression, gentlemen bastards, Scott Lynch, the lies of Locke Lamora, the republic of thieves
Categories: Bayesian Bloggers

Reykjavik street art

Xian's Og - Fri, 2014-05-02 18:14
Categories: Bayesian Bloggers

RSS statistical analytics challenge 2014

Xian's Og - Thu, 2014-05-01 18:14

Great news! The RSS is setting a data analysis challenge this year, sponsored by the Young Statisticians Section and Research Section of the Royal Statistical Society. Details are available on the wordpress website of the Challenge. Registration is open and the Challenge goes live on Tuesday 6 May 2014 for an exciting six-week competition. (A wee bit of an unfortunate timing for those of us considering submitting a paper to NIPS!) Truly terrific, I have been looking for this kind of event to happen for many years (without finding the momentum to set it rolling…) and hope it will generate a lot of exciting activity and replicas in other societies.

Filed under: Kids, R, Statistics, University life, Wines Tagged: data challenge, NIPS, Royal Statistical Society, RSS, Wordpress
Categories: Bayesian Bloggers

art brut

Xian's Og - Wed, 2014-04-30 18:14

Filed under: Kids, pictures Tagged: design, Iceland, small peas, tin can
Categories: Bayesian Bloggers

controlled thermodynamic integral for Bayesian model comparison [reply]

Xian's Og - Tue, 2014-04-29 18:14

Chris Oates wrote the following reply to my Icelandic comments on his paper with Theodore Papamarkou and Mark Girolami, a reply that is detailed enough to deserve a post of its own:

Thank you Christian for your discussion of our work on the Og, and also for your helpful thoughts in the early days of this project! It might be interesting to speculate on some aspects of this procedure:

(i) Quadrature error is present in all estimates of evidence that are based on thermodynamic integration. It remains unknown how to exactly compute the optimal (variance minimising) temperature ladder “on-the-fly”; indeed this may be impossible, since the optimum is defined via a boundary value problem rather than an initial value problem. Other proposals for approximating this optimum are compatible with control variates (e.g. Grosse et al, NIPS 2013, Friel and Wyse, 2014). In empirical experiments we have found that the second order quadrature rule proposed by Friel and Wyse 2014 leads to substantially reduced bias, regardless of the specific choice of ladder.

(ii) Our experiments considered first and second degree polynomials as ZV control variates. In fact, intuition specifically motivates the use of second degree polynomials: Let us presume a linear expansion of the log-likelihood in θ. Then the implied score function is constant, not depending on θ. The quadratic ZV control variates are, in effect, obtained by multiplying the score function by θ. Thus control variates can be chosen to perfectly correlate with the log-likelihood, leading to zero-variance estimators. Of course, there is an empirical question of whether higher-order polynomials are useful when this Taylor approximation is inappropriate, but they would require the estimation of many more coefficients and in practice may be less stable.

(iii) We require that the control variates are stored along the chain and that their sample covariance is computed after the MCMC has terminated. For the specific examples in the paper such additional computation is a negligible fraction of the total computational cost, so that we did not provide specific timings. When non-differential-geometric MCMC is used to obtain samples, or when the score is unavailable in closed-form and must be estimated, the computational cost of the procedure would necessarily increase.

For the wide class of statistical models with tractable likelihoods, employed in almost all areas of statistical application, the CTI we propose should provide state-of-the-art estimation performance with negligible increase in computational costs.
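
The zero-variance mechanism mentioned in (ii) can be seen in its simplest form on a generic example. The sketch below is not the CTI procedure of the paper and makes several assumptions for illustration: a Gaussian target N(mu, sigma²), independent draws standing in for MCMC output, the integrand f(theta)=theta, and a first-degree ZV control variate, taken as minus half the score of the target. The coefficient is estimated from the stored sample after the run, as in (iii).

```r
set.seed(2014)

# Hypothetical target N(mu, sigma^2); iid draws stand in for MCMC output
mu <- 1.5
sigma <- 2
theta <- rnorm(1000, mean = mu, sd = sigma)

# First-degree ZV control variate: minus half the score of the target,
# d/dtheta log pi(theta) = -(theta - mu) / sigma^2, which has zero expectation
z <- (theta - mu) / (2 * sigma^2)

# Coefficient minimising the variance of theta + a * z, estimated post hoc
a <- -cov(theta, z) / var(z)
zv <- theta + a * z

# Plain versus controlled estimates of E[theta], and their sample variances
est <- c(plain = mean(theta), zv = mean(zv))
vars <- c(plain = var(theta), zv = var(zv))
print(est)
print(vars)
```

In this linear-score case the controlled values all coincide with mu up to rounding error, which is the perfect correlation alluded to in the reply; with a non-Gaussian target the reduction would only be partial.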

Filed under: Books, pictures, Running, Statistics, University life Tagged: advanced Monte Carlo methods, arXiv, control variate, Iceland, MCMC algorithms, Monte Carlo Statistical Methods, path sampling, Pima Indians, pMCMC, quadrature rule, Reykjavik, Riemann manifold, thermodynamic integration
Categories: Bayesian Bloggers

the ABC-SubSim algorithm

Xian's Og - Mon, 2014-04-28 18:14

In a nice coincidence with my ABC tutorial at AISTATS 2014 – MLSS, Manuel Chiachio, James Beck, Juan Chiachio, and Guillermo Rus arXived today a paper on a new ABC algorithm, called ABC-SubSim. The SubSim stands for subset simulation and corresponds to an approach developed by one of the authors for rare-event simulation. This approach looks somewhat similar to the cross-entropy method of Rubinstein and Kroese, in that successive tail sets are created towards reaching a very low probability tail set. Simulating from the current subset increases the probability of reaching the following and less probable tail set. The extension to the ABC setting is done by looking at the acceptance region (in the augmented space) as a tail set and by defining a sequence of tolerances. The paper could also be connected with nested sampling in that constrained simulation through MCMC occurs there as well. Following the earlier paper, the MCMC implementation therein is a random-walk-within-Gibbs algorithm. This is somewhat the central point in that the sample from the previous tolerance level is used to start a Markov chain aiming at the next tolerance level. (Del Moral, Doucet and Jasra use instead a particle filter, which could easily be adapted to the modified Metropolis move considered in the paper.) The core difficulty with this approach, not covered in the paper, is that the MCMC chains used to produce samples from the constrained sets have to be stopped at some point, esp. since the authors run those chains in parallel. The stopping rule is not provided (see, e.g., Algorithm 3) but its impact on the resulting estimate of the tail probability could be far from negligible… Esp. because there is no burnin/warmup. (I cannot see how “ABC-SubSim exhibits the benefits of perfect sampling” as claimed by the authors, p. 6!) The authors re-examined the MA(2) toy benchmark we had used in our earlier survey, reproducing as well the graphical representation on the simplex as shown above.

Filed under: pictures, Statistics Tagged: ABC, ABC-SubSim, AISTATS 2014, cross-entropy method, MLSS, nested sampling, tails
Categories: Bayesian Bloggers

faculty positions in statistics at ENSAE, Paris

Xian's Og - Mon, 2014-04-28 08:18

Here is a call from ENSAE about two positions in statistics/machine learning, starting next semester:

ENSAE ParisTech and CREST are currently inviting applications for one position at the level of associate or full professor from outstanding candidates having demonstrated abilities in both research and teaching. We are interested in candidates with a Ph.D. in Statistics or Machine Learning (or related field) whose research interests are in high dimensional statistical inference, learning theory or statistics of networks.

The appointment could begin as soon as September 1, 2014. The position is for an initial three-year term, with a possible renewal option in case of positive evaluation of research and teaching activities. Salary for suitably qualified applicants is competitive and commensurate with experience. The deadline for application is May 19, 2014.  Full details are given here for the first position and there for the second position.

Filed under: Statistics, University life Tagged: assistant professor position, CREST, ENSAE, machine learning, Malakoff, Paris, Statistics
Categories: Bayesian Bloggers

AISTATS 2014 [day #3]

Xian's Og - Sun, 2014-04-27 18:14

The third day at AISTATS 2014 started with Michael Jordan giving his plenary lecture, or rather three short talks on “Big Data” privacy, communication risk, and (bag of) bootstrap. I had not previously heard Michael talking about the first two topics and further found interesting the attempt to put computation into the picture (a favourite notion of Michael’s), however I was a bit surprised at the choice of a minimax criterion. Indeed, getting away from the minimax criterion was one of the major reasons I moved to the B side of the Force. Because it puts exactly the same importance on every single value of the parameter. Even the most impossible ones. I was also a wee bit surprised at the optimal solution produced by this criterion: in a multivariate binary data setting (e.g., multiple drugs usage), the optimal privacy solution was to create a random binary vector and pick at random between this vector and its complement, depending on which one is closest to the observable. The loss of information seems formidable if the dimension of the vector is large. (Implementing ABC as a privacy [privacizing?] strategy would sound better if less optimal…) The next session was about deep learning, of which I knew [and know] nothing, but the talk by Yoshua Bengio raised very relevant questions, like how to learn where the main part of the mass of a probability distribution is, besides pointing at a recent survey of his. The survey points at some notions that I master and some that I don’t, but a cursory reading does not lead me to put an intuitive meaning on deep learning.

The last session of the day and of the conference was on more statistical issues, like a Gaussian process modelling of a spatio-temporal dataset on Afghanistan attacks by Guido Sanguinetti, the use of Rao-Blackwellisation and control variate to build black-box variational inference by Rajesh Ranganath, the construction of  conditional exponential families on mixed graphs by Pradeep Ravikumar, and a presentation of probabilistic programming with Anglican by Frank Wood that I had already seen in Banff. In particular, I found the result on the existence of joint exponential families on graphs when defined by those full conditionals quite exciting!

The second poster session was in the early evening, with many more posters (and plenty of food and drinks!), as it also included the (non-refereed) MLSS posters. Among the many interesting ones I spotted, a way to hit-and-run for quasi-concave densities, estimating mixtures with negative weights, a failing particle algorithm for a flu epidemic, an exact EP algorithm, and a fairly intense discussion around Richard Wilkinson’s poster on Gaussian process ABC algorithm (that I discussed on the ‘Og a while ago).

Filed under: Mountains, pictures, Statistics, Travel, University life Tagged: ABC, AISTATS 2014, Anglican, big data, deep learning, exponential families, Garðskaga, Gaussian processes, Iceland, machine learning, probabilistic programming, Reykjanes Peninsula, Reykjavik
Categories: Bayesian Bloggers

Le Monde puzzle [#869]

Xian's Og - Sat, 2014-04-26 18:14

A Le Monde mathematical puzzle once again in a Sudoku mode:

In an nxn table, all integers between 1 and n appear n times. If max denotes the maximum over the numbers of different integers on all rows and columns,  what is the minimum value of max when n=7? when n=11?

I tried to solve it by the following R code (in a pre-breakfast mode in my Reykjavik Airbnb flat!):

#pseudoku
n=7
T=10^4
vals=rep(1:n,n)
minmax=n
for (t in 1:T){
  psudo=matrix(sample(vals),ncol=n)
  maxc=maxr=max(sapply(apply(psudo,1,unique),length))
  if (maxc<minmax)
    maxr=max(sapply(apply(psudo,2,unique),length))
  minmax=min(minmax,max(maxc,maxr))
}

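but later realised that (a) the sapply(apply(psudo,1,unique),length) construct failed when all rows or all columns had the same number of unique terms, since apply() then returns a matrix rather than a list, and (b) I did not have to run the whole matrix: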

vals=rep(1:n,n)
minmax=n
for (t in 1:T){
  psudo=matrix(sample(vals),ncol=n)
  maxc=max(length(unique(psudo[1,])),length(unique(psudo[,1])))
  i=1
  while((i<n)&(maxc<minmax)){
    i=i+1
    maxc=max(maxc,
             length(unique(psudo[i,])),
             length(unique(psudo[,i])))
  }
  minmax=min(minmax,maxc)
}

gaining a factor of 3 in the R execution. With this random exploration, the minimum values returned were 2,2,2,3,4,5,5,6,7,8 for n=2,3,4,5,6,7,8,9,10,11. Half-hearted simulated annealing during one of the sessions of AISTATS 2014 did not show any difference…
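
The failure in (a) comes from the way apply() simplifies its output, and it can be reproduced on two tiny matrices. The matrices a and b below are made up for the illustration; the last line shows one possible way of counting distinct values per row that does not depend on this simplification.

```r
# Rows with different numbers of distinct values: apply() returns a list
a <- matrix(c(1, 1, 2,
              1, 2, 3), nrow = 2, byrow = TRUE)
ua <- apply(a, 1, unique)
max_a <- max(sapply(ua, length))      # lengths 2 and 3, as intended

# Rows with the same number of distinct values: apply() returns a matrix,
# so sapply() visits single entries and every "length" is 1
b <- matrix(c(1, 1, 2,
              2, 3, 3), nrow = 2, byrow = TRUE)
ub <- apply(b, 1, unique)
max_b_wrong <- max(sapply(ub, length))

# Counting inside the function passed to apply() avoids the issue
max_b_right <- max(apply(b, 1, function(x) length(unique(x))))

print(c(max_a = max_a, max_b_wrong = max_b_wrong, max_b_right = max_b_right))
```

With random fillings of a 7x7 table, ties in the number of distinct values across all rows are not that rare, which is why the first version of the code could return spuriously small maxima.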

Filed under: Books, Kids, R, Statistics, University life Tagged: AISTATS 2014, Le Monde, mathematical puzzle, R, Reykjavik, sapply(), sudoku, unique()
Categories: Bayesian Bloggers

AISTATS 2014 / MLSS tutorial

Xian's Og - Fri, 2014-04-25 18:14

Here are the slides of the tutorial on ABC methods I gave yesterday at both AISTATS 2014 and MLSS. (I actually gave a tutorial at another MLSS a few years ago, on the pretty island of Berder in Brittany, next to Vannes.) They are definitely similar to previous talks and tutorials I delivered on this topic of ABC algorithms, with only the last part being original (if unpublished yet). And even then: as Michael Gutmann from the University of Helsinki pointed out to me at the end of my talk, there are similarities between the classification method he presented at MCMSki 4 in Chamonix and our use of random forests. Before my talk, I attended the tutorial of Roderick Murray-Smith from the University of Glasgow, on Machine learning and Human Computer Interaction, which was just stunning in its breadth, range of applications, and mastery of multimedia tools. Making me feel like a perfectly inadequate follower…

Filed under: Mountains, R, Statistics, University life Tagged: ABC, ABC model choice, AISTATS 2014, Berder, Brittany, Iceland, machine learning, MLSS, Reykjavik, summary statistics, Vannes
Categories: Bayesian Bloggers