## Bayesian Bloggers

### 6th French Econometrics Conference in Dauphine

On December 4-5, Université Paris-Dauphine will host the 6th French Econometrics Conference, which celebrates Christian Gouriéroux and his contributions to econometrics. (Christian was my statistics professor during my graduate years at ENSAE and then Head of CREST when I joined this research unit, first as a PhD student and later as Head of the statistics group. He has always been a tremendous support for me.)

Not only is the program quite impressive, with co-authors of Christian Gouriéroux and a few Nobel laureates (if not the latest one, Jean Tirole, who taught economics at ENSAE when I was a student there), but registration is also free. I will most definitely attend the talks, as I am at Paris-Dauphine at that time of year (the week before NIPS). In particular, I am looking forward to Gallant's views on Bayesian statistics.

Filed under: Books, Kids, pictures, Statistics, University life Tagged: Bayesian econometrics, Christian Gouriéroux, CREST, econometrics, ENSAE, Jean Tirole, Nobel Prize, Université Paris Dauphine

### my ISBA tee-shirt designs

**H**ere are my tee-shirt design proposals for the official ISBA tee-shirt competition! (I used the facilities of CustomInk.com, as I could not easily find free software for the purpose. Except for the last one, where I recycled my Vistaprint mug design…)

While I do not have any expectation of seeing one of these win (!), which one is your favourite?!

Take Our Poll

Filed under: Books, Kids, pictures, Statistics, University life Tagged: Bayesian statistics, competition, CustomInk, ISBA, poll, tee-shirt, Thomas Bayes' portrait, werewolf

### Le Monde puzzle [#882]

**A** terrific Le Monde mathematical puzzle:

*All integers between 1 and n² are written in an (n,n) matrix under the constraint that two consecutive integers are adjacent (i.e. 15 and 13 are two of the four neighbours of 14). What is the maximal value for the sum of the diagonal of this matrix?*

Indeed, when considering a simulation-based resolution (for small values of n), it constitutes an example of a self-avoiding random walk: when inserting the integers one by one at random, one produces a random walk over the (n,n) grid.

While the solution consists in sticking as much as possible to the vicinity of the diagonal for the descending sequence n², n²-1, etc., leaving the space away from the diagonal for the terminal values, as in this example for n=5,

```
25 22 21 14 13
24 23 20 15 12
01 02 19 16 11
04 03 18 17 10
05 06 07 08 09
```

simulating such a random walk is a bit challenging, as the brute-force solution does not work extremely well:

```r
n=5
n2=n^2
init=function(){
  board=invoard=matrix(0,n,n)
  set=rep(0,n2)
  start=val=n2 #sample(1:n2,1)
  neigh=board[start]=val
  invoard[val]=start
  set[val]=1
  return(list(board=board,invoard=invoard,
    set=set,neigh=neigh))
}
voisi=function(i){
  a=arrayInd(i,c(n,n))
  b=a[2];a=a[1]
  if ((a>1)&(a<n)){
    voizin=(a+c(-1,1))+(b-1)*n}else{
    if (a==1) voizin=a+1+(b-1)*n
    if (a==n) voizin=a-1+(b-1)*n}
  if ((b>1)&(b<=n)){
    voizin=c(voizin,a+(b+c(-2,0))*n)}else{
    if (b==1) voizin=c(voizin,a+b*n)
    if (b>n) voizin=c(voizin,a+(b-2)*n)}
  voizin=voizin[(voizin>0)&(voizin<=n2)]
  return(voizin)
}
a=init()
board=a$board
invoard=a$invoard
set=a$set
neigh=a$neigh
while (sum(set)<n2){
  if (length(neigh)==1) neighb=neigh+c(-1,1)
  if (length(neigh)==2) neighb=c(min(neigh)-1,
    max(neigh)+1)
  neighb=neighb[(neighb>0)&(neighb<=n2)]
  neighb=neighb[set[neighb]==0]
  for (i in 1:length(neighb)){
    j=1
    if (i==2) j=1+(length(neigh)>1)
    loc=voisi(invoard[neigh[j]])
    loc=loc[board[loc]==0]
    if (length(loc)==0) break() #no solution
    if (length(loc)==1){
      board[loc]=neighb[i]
      invoard[neighb[i]]=loc
      set[neighb[i]]=1}
    if (length(loc)>1){ #2 or more solutions
      val=sample(loc,1)
      board[val]=neighb[i]
      invoard[neighb[i]]=val
      set[neighb[i]]=1}
  }
  if (min(set[neighb])==0){ #start afresh
    a=init()
    board=a$board
    invoard=a$invoard
    set=a$set
    neigh=a$neigh
  }else{
    if (length(neighb)==1) neigh=neighb
    if (length(neighb)>1){
      neigh=sort(neighb)
      neigh=neigh[(neigh>0)&(neigh<=n2)]
    }}
}
```

The reason is that the chain often has to restart afresh, all the more often as n grows… For n=5, I still recovered the optimal solution:

```r
> while (sum(diag(board))<93)
+   source("lemonde882.R")
     [,1] [,2] [,3] [,4] [,5]
[1,]    9    8    7    6    1
[2,]   10   17   18    5    2
[3,]   11   16   19    4    3
[4,]   12   15   20   23   24
[5,]   13   14   21   22   25
```

However, running the R code for n=7 towards finding the maximum (259?) takes quite a while, and 50 proposals for n=8 took the whole night… If I run a simple log-log regression on the values obtained for n=2,…,7, the prediction for n=10 is 768. A non-stochastic prediction is 870.

As a last-ditch attempt to recover the sequence n²-2k along the diagonal for k=0,1,…, I modified my code to favour simulations close to the diagonal at the start, as

```r
if (length(loc)>1){ #two or more solutions
  val=sample(loc,1,
    prob=exp(-abs(loc%%n-1-loc%/%n)/sum(set)))
```

which produces higher diagonal values but also more rejections.

Filed under: Books, Kids, Statistics, University life Tagged: Le Monde, mathematical puzzle, self-avoiding random walk

### how far can we go with Minard’s map?!

Like many others, I discovered Minard’s map of the catastrophic 1812 Russian campaign of Napoleon in Tufte’s book. And I consider it a masterpiece for its elegant way of summarising so many levels of information about this doomed invasion of Russia. So when I spotted Menno-Jan Kraak’s Mapping Time, analysing the challenges of multidimensional cartography through this map and this Napoleonic campaign, I decided to take a look at it.

Apart from the trivia about Kraak‘s familial connection with the Russian campaign and the Berezina crossing, which killed one of his direct ancestors, his great-great-grandfather, along with a few tens of thousands of others (even though this was not the most lethal part of the campaign), he brings different perspectives on the meaning of a map and the quantity of information one could or should display. This is not unlike other attempts at competing with Minard, including those listed on Michael Friendly’s page. Including the cleaner printing above. And the dumb pie chart… A lot more can be done in 2013 than in 1869, indeed, including the use of animated videos, but I remain somewhat sceptical as to the whole purpose of the book. It is a beautiful object, with wide margins and nice colour reproductions, for sure, alas… I just do not see the added value in Kraak‘s work. I would even go as far as thinking this is an a-statistical approach, namely that, by trying to cram as much data as possible into the picture, he forgets the whole point of the drawing, which is, I think, to show the awful death rate of the Grande Armée along this absurd trip to and from Moscow, as well as the impact of temperature (although the rise that led to the thaw of the Berezina and the ensuing disaster does not seem correlated with the big gap at the crossing of the river). If more covariates were available, two further dimensions could be added: the proportions of deaths due to battle, guerilla, exhaustion, and desertion, plus a counterpart map of the Russian losses. In the end, when reading Mapping Time, I learned more about the history surrounding this ill-planned military campaign than about the proper display of data towards informative and unbiased graphs.

Filed under: Books, Linux, pictures, Statistics, Travel Tagged: Berezina, book review, Grande Armée, Kraak, map, Minard, Napoléon, Patriotic war, Russian campaign, Russian winter, Tufte

### poor graph of the day

### Wien graffitis

Filed under: Kids, pictures, Running, Travel Tagged: bagpipes, graffitis, Panza, Scotland, Skirl, Vienna

### impression, soleil couchant

**A**s the sunset the day before had been magnificent [thanks to the current air pollution!], I attempted to catch it from a nice spot and went to the top of the nearby hill. The particles were alas (!) not so numerous that evening, and so I only got this standard view of the sun going down. Incidentally, I had read a few days earlier, in the plane from Vienna, that the time and date when Monet’s *Impression, soleil levant* was painted had been identified by a U.S. astronomer as 7:35am on the 13th of November 1872, in Le Havre, Normandy. An exhibit in the nearby Musée Marmottan retraces this quest…

Filed under: pictures, Running Tagged: Bagneux, Claude Monet, Le Havre, Marmottan, Paris, sunrise, sunset

### Combining Particle MCMC with Rao-Blackwellized Monte Carlo Data Association

**T**his recently arXived paper by Juho Kokkala and Simo Särkkä mixes a whole lot of interesting topics, from particle MCMC and Rao-Blackwellisation to particle filters, Kalman filters, and even bear population estimation. The starting setup is the state-space hidden process model where particle filters are of use, and where Andrieu, Doucet and Holenstein (2010) introduced their particle MCMC algorithms. Rao-Blackwellisation steps have been proposed in this setup in the original paper, as well as in the ensuing discussion, like the recycling of rejected parameters and associated particles. The beginning of the paper is a review of the literature in this area, in particular of the *Rao-Blackwellized Monte Carlo Data Association* algorithm developed by Särkkä et al. (2007), of which I was not previously aware. (I alas have not followed the filtering literature closely enough in the past years.) Targets evolve independently according to Gaussian dynamics.

In the description of the model (Section 3), I feel there are prerequisites on the model I did not have (and did not check in Särkkä et al., 2007), like the meaning of targets and measurements: it seems the model assumes each measurement corresponds to a given target. More details or an example would have helped. The extension over the existing literature appears to be the (major) step of including unknown parameters. Due to my lack of expertise in the domain, I have no notion of the existence of similar proposals in the literature, but handling unknown parameters is definitely of direct relevance for the statistical analysis of such problems!

The simulation experiment based on an Ornstein-Uhlenbeck model is somewhat anticlimactic in that the posterior on the mean reversion rate is essentially the prior, conveniently centred at the true value, while the others remain quite wide. It may be that the experiment was too ambitious in selecting 30 simultaneous targets with only a total of 150 observations. Without highly informative priors, my beotian reaction is to doubt the feasibility of the inference. In the case of the Finnish bear study, the huge discrepancy between priors and posteriors, as well as the significant difference between the forestry expert estimations and the model predictions should be discussed, if not addressed, possibly via a simulation using the posteriors as priors. Or maybe using a hierarchical Bayes model to gather a time-wise coherence in the number of bear families. (I wonder if this technique would apply to the type of data gathered by Mohan Delampady on the West Ghats tigers…)

Overall, I am slightly intrigued by the practice of running MCMC chains in parallel and merging the outcomes with no further processing. This assumes a lot in terms of convergence and mixing on all the chains. However, convergence is never directly addressed in the paper.

Filed under: Books, Statistics, University life Tagged: Rao-Blackwellized Monte Carlo Data Association, bear, data association, Finland, MCMC, particle Gibbs sampler, pMCMC, prior-posterior discrepancy, Rao-Blackwellisation, simulation, target tracking, tiger, Western Ghats

### Statistics slides (3)

**H**ere is the third set of slides for my third year statistics course. Nothing out of the ordinary, but the opportunity to link statistics and simulation for students not yet exposed to Monte Carlo methods. (No ABC yet, but who knows?, I may use ABC as an entry to Bayesian statistics, following Don Rubin’s example! Surprising typo on the Project Euclid page for this 1984 paper, by the way…) On Monday, I had the pleasant surprise to see Shravan Vasishth in the audience, as he is visiting Université Denis Diderot (Paris 7) this month.

Filed under: Books, Kids, Statistics, University life Tagged: ABC, Bayesian statistics, bootstrap, Don Rubin, empirical cdf, Glivenko-Cantelli Theorem, Monte Carlo methods, Monte Carlo Statistical Methods, Paris, simulation, Université Paris Dauphine

### unicode in LaTeX

**A**s I was hurriedly trying to cram several ‘Og posts into a conference paper (!), I looked around for a way of including Unicode characters straight away. And found this solution on StackExchange:

\usepackage[mathletters]{ucs}

\usepackage[utf8x]{inputenc}

which just suited me fine!
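
For the record, here is a minimal compilable example (my own sketch, not from the original post) showing the two packages in action:

```latex
\documentclass{article}
% load ucs before inputenc so that utf8x knows about math letters
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}
\begin{document}
Unicode typed directly in the source: é, à, α, β, π.
\end{document}
```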

Filed under: Books, Linux, Statistics, University life Tagged: blogging, LaTeX, papers, StackExchange, Unicode, UTF-8, Wordpress

### posterior predictive distributions of Bayes factors

**O**nce a Bayes factor B(y) is computed, one needs to assess its strength. As repeated many times here, Jeffreys’ scale has no validation whatsoever: it is simply a division of the (1,∞) range into regions of convenience. Following earlier proposals in the literature (Box, 1980; García-Donato and Chen, 2005; Geweke and Amisano, 2008), an evaluation of this strength within the issue at stake, i.e. the comparison of two models, can be based on the predictive distribution. While most authors (like García-Donato and Chen) consider the prior predictive, I think using the posterior predictive distribution is more relevant since

- it exploits the information contained in the data y, thus concentrates on a region of relevance in the parameter space(s), which is especially interesting in weakly informative settings (even though we should abstain from testing in those cases, dixit Andrew);
- it reproduces the behaviour of the Bayes factor B(x) for values x of the observation similar to the original observation y;
- it does not hide issues of indeterminacy linked with improper priors: the Bayes factor B(x) remains indeterminate, even with a well-defined predictive;
- it does not separate between errors of type I and errors of type II but instead uses the natural summary provided by the Bayesian analysis, namely the predictive distribution π(x|y);
- as long as the evaluation is not used to reach a decision, there is no issue of “using the data twice”, we are simply producing an estimator of the posterior loss, for instance the (posterior) probability of selecting the wrong model. The Bayes factor B(x) is thus functionally independent of y, while x is probabilistically dependent on y.

Note that, even though probabilities of errors of type I and errors of type II can be computed, they fail to account for the posterior probabilities of both models. (This is the delicate issue with the solution of García-Donato and Chen.) Another nice feature is that the predictive distribution of the Bayes factor can be computed even in complex settings where ABC needs to be used.
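
As a toy illustration of the approach (my own sketch; the Gaussian setting and all numerical values are illustrative assumptions, not taken from the above), here is R code producing the posterior predictive distribution of the Bayes factor when testing a normal mean:

```r
# Toy setting: y_i ~ N(mu,1), testing M0: mu=0 against M1: mu ~ N(0,tau2).
set.seed(1)
n=20; tau2=1
y=rnorm(n, mean=0.3)              # "observed" sample
ybar=mean(y)

# Bayes factor B01 as a function of the sufficient statistic xbar
B01=function(xbar)
  dnorm(xbar, 0, sqrt(1/n)) / dnorm(xbar, 0, sqrt(tau2 + 1/n))

# posterior of mu under M1 is N(tau2*n*ybar/(1+tau2*n), tau2/(1+tau2*n))
M=1e4
mu=rnorm(M, tau2*n*ybar/(1 + tau2*n), sqrt(tau2/(1 + tau2*n)))
xbar=rnorm(M, mu, sqrt(1/n))      # posterior predictive replicates of xbar
Bs=B01(xbar)                      # predictive distribution of the Bayes factor

# position of the observed Bayes factor within this distribution
mean(Bs < B01(ybar))
```

The last line locates B(y) within its own posterior predictive distribution, which is the kind of calibration advocated above.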

Filed under: Books, Kids, Statistics Tagged: Bayes factor, Bayesian predictive, Bayesian tests, posterior predictive

### randomness in coin tosses and last digits of prime numbers

**A** rather intriguing note was arXived last week: it is essentially one page long, and it compares the power law of the frequency range for the Bernoulli experiment with the power law of the frequency range for the distribution of the last digits of the first 10,000 prime numbers, to conclude that the power is about the same. With a very long introduction about the nature of randomness that is unrelated to the experiment. And a call to a virtual coin-toss website instead of using R’s uniform generator… Actually, the exact distribution is available, at least asymptotically, for the Bernoulli (coin-tossing) case. Among other curiosities, a recurrent typo in the sign of the coefficient β for the power law. A limitation of the Bernoulli experiment to 10⁴ simulations, rather than the 10⁵ used for the prime numbers. And a conclusion that the distribution of the end digits is truly uniform, which relates only to this single experiment!
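
For what it is worth, reproducing the prime-number part of the experiment only takes a few lines of R (my own sketch, not the note's code):

```r
# last-digit frequencies of the first 10,000 primes, via a simple sieve
primes_up_to=function(N){
  s=rep(TRUE, N); s[1]=FALSE
  for (i in 2:floor(sqrt(N)))
    if (s[i]) s[seq(i*i, N, by=i)]=FALSE
  which(s)
}
p=primes_up_to(110000)[1:10000]  # the 10,000th prime is 104,729
last=p %% 10
table(last)  # apart from 2 and 5 themselves, all primes end in 1, 3, 7 or 9
```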

Filed under: Books, Kids, R, Statistics, University life Tagged: Benford's Law, coin tossing, prime numbers, randomness

### The winds of Winter [Bayesian prediction]

**A** surprising entry on arXiv this morning: Richard Vale (from Christchurch, NZ) has posted a paper about the characters appearing in the yet-hypothetical next volume of George R.R. Martin’s *A Song of Ice and Fire* series, *The Winds of Winter* [not even put up for pre-sale on Amazon!]. Using the previous five books in the series and the frequency of occurrence of characters’ points of view [each chapter being told from the point of view of a single character], Vale proceeds to model the number of occurrences in a given book by a truncated Poisson model, in order to account for [most] characters dying at some point in the series. All parameters are endowed with prior distributions, including the terrible “large” hyperpriors familiar to BUGS users, despite the code being written in R by the author. The modelling does not use anything but the frequencies of the previous books, so knowledge that characters like Eddard Stark had died is not exploited. (Nonetheless, the prediction gives zero chapters to this character in the coming volumes.) Interestingly, a character who seemingly died at the end of the last book is still given a 60% probability of having at least one chapter in *The Winds of Winter* [no spoilers here, but many in the paper itself!]. As pointed out by the author, the model as such does not allow for the prediction of new-character chapters, which remain likely given Martin’s storytelling style! Vale still predicts 11 new-character chapters, which seems high if one considers that the series should be over in two more books [and an unpredictable number of years!].

As an aside, this paper makes use of the truncnorm R package, which I did not know and which is based on John Geweke’s accept-reject algorithm for truncated normals that I (independently) proposed a few years later.
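
As an aside to the aside, here is a minimal sketch of the accept-reject idea for a normal truncated to [a,∞) with an exponential proposal (mine, not the code of the truncnorm package):

```r
# accept-reject for N(0,1) restricted to [a, Inf), a >= 0
rtnorm=function(nsim, a){
  alpha=(a + sqrt(a^2 + 4))/2           # optimised exponential rate
  out=numeric(nsim)
  for (i in 1:nsim){
    repeat{
      z=a + rexp(1, alpha)              # shifted exponential proposal
      # acceptance probability exp(-(z-alpha)^2/2) bounds target/proposal
      if (runif(1) <= exp(-(z - alpha)^2/2)) break
    }
    out[i]=z
  }
  out
}
x=rtnorm(1e3, a=2)
min(x)  # all draws exceed the truncation point a=2
```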

Filed under: Books, Kids, R, Statistics, University life Tagged: A Song of Ice and Fire, arXiv, Bayesian predictive, Game of Thrones, George Martin, heroic fantasy, John Geweke, R, The Winds of Winter, truncated normal, truncnorm

### hypothesis testing for MCMC

**A** recent arXival by Benjamin Gyori and Daniel Paulin considers sequential testing based on MCMC simulation. The test is about an expectation under the target and stationary distribution of the Markov chain (i.e., the posterior in a Bayesian setting). Hence, testing whether or not this posterior expectation is below a certain bound is not directly relevant from a Bayesian perspective: one would test instead whether or not *the parameter itself* is below the bound… The paper is thus more a study of sequential tests when the data is a Markov chain than one with any clear connection to MCMC topics, despite including an example of a Metropolis-Hastings scheme for approximating the posterior on the parameters of an ODE. I am a bit puzzled by the purpose of the test, as I was rather expecting tests connected with the convergence of the Markov chain or of the empirical mean. (But, given the current hour, I may also have missed a crucial point!)
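
To make the setting concrete, here is a crude R sketch (entirely my own construction, not the authors' procedure) of checking a bound on a posterior expectation from MCMC output, using a batch-means standard error:

```r
# toy posterior: mu ~ N(1,1); question: is E[mu] below the bound 1.5?
set.seed(7)
lpost=function(mu) dnorm(mu, 1, 1, log=TRUE)
niter=1e5
chain=numeric(niter)
for (t in 2:niter){
  prop=chain[t-1] + rnorm(1, sd=2)        # random-walk Metropolis step
  if (log(runif(1)) < lpost(prop) - lpost(chain[t-1]))
    chain[t]=prop else chain[t]=chain[t-1]
}
est=mean(chain)
bm=colMeans(matrix(chain, ncol=100))      # 100 batch means of size 1000
se=sd(bm)/sqrt(100)                       # batch-means standard error
(est + 2*se) < 1.5                        # crude one-sided check of the bound
```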

Filed under: Books, Statistics, University life Tagged: Markov chain Monte Carlo algorithm, Markov chains, MCMC, ODEs, sequential testing

### Monte Carlo simulation and resampling methods for social science [book review]

Monte Carlo simulation and resampling methods for social science is a short paperback written by Thomas Carsey and Jeffrey Harden on the use of Monte Carlo simulation to evaluate the adequacy of a model and the impact of the assumptions behind this model. I picked it up in the library the other day and browsed through its chapters during one of my métro rides. Definitely not an in-depth reading, so be warned!

Overall, I think the book does a good job of advocating the use of simulation to evaluate the pros and cons of a given model (rephrased as *data generating process*) when faced with data. And of doing it in R. After some rudiments of probability theory and of R programming, it briefly explains the use of the resident random generators, if not how to handle new distributions, and then spends a large part of the book on simulation around generalised and regular linear models. For instance, in the linear model, the authors test the impact of heteroscedasticity, multicollinearity, measurement error, omitted variable(s), serial correlation, clustered data, and heavy-tailed errors. While this is a perfect way of exploring those semi-hidden hypotheses behind the linear model, I wonder at the impact of this exploration on students. On the one hand, they will perceive the importance of those assumptions and hopefully remember them. On the other hand, and this is a very recurrent criticism of mine, this implies a lot of maturity from the students, i.e., they have to distinguish the data, the model [maybe] behind the data, the finite if large number of hypotheses one can test, and the interpretation of the outcome of a simulation test… Given that they were introduced to basic probability just a few chapters before, this expectation [of the students] may prove unrealistic. (And a similar criticism applies to the following chapters, from GLM to jackknife and bootstrap.)
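
For instance, the heteroscedasticity experiment can be mimicked in a few lines of R (my own sketch, not the authors' code), checking the coverage of the standard OLS confidence interval:

```r
# impact of heteroscedastic errors on the usual OLS slope interval
set.seed(42)
n=100; M=1000; beta=1
cover=logical(M)
for (m in 1:M){
  x=runif(n)
  y=beta*x + rnorm(n, sd=0.5 + 2*x)       # error spread grows with x
  est=summary(lm(y ~ x))$coefficients["x", ]
  ci=est[1] + c(-2, 2)*est[2]             # rough 95% interval
  cover[m]=(ci[1] < beta) & (beta < ci[2])
}
mean(cover)  # to be compared with the nominal 95%
```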

At the end of the book, the authors ask how a reader could use the information in this book in one’s own work, drafting a generic protocol for this reader, who is supposed to consider “alterations to the data generating process” (p.272) and to “identify a possible problem or assumption violation” (p.271), thus requiring a readership “who has some training in quantitative methods” (p.1). And then some more. But I definitely sympathise with the goal of confronting models and theory with the harsh reality of simulation output!

Filed under: Books, Kids, R, Statistics, University life Tagged: bootstrap, data generating process, jackknife, Monte Carlo methods, Monte Carlo simulation and resampling methods for social science, resampling, simulation

### misty dawn

### Argentan half-marathon [1 25' 02" - 29/503 - V2: 3/104 - 19°C]

**A** rather comparable race in Argentan to last year’s, with exactly the same time, similar weather (a bit too hot and too windy), and a lack of a pack for half of the race, which again saw me running by myself the whole second half. This was my iXXth Argentan half-marathon. I started a bit more slowly than last year, wary of the headwind, then accelerated in the forest and felt really well until the 17th kilometer, where I slowed down a bit, with a group of three runners passing me on the 18th, one of them alas a V2. I tried to keep up till the 20th, but they ended up 20 seconds faster. This year I had left my traditional Insee Paris Club red tank top at home, to reduce the chances of being spotted, but my opponent from last year was apparently not there. The number of runners seems to be going down steadily from one year to the next, maybe more so this year due to another race on the Pont de Normandie tomorrow. This helped keep my chances of reaching a podium higher than expected, as I was not at all thinking I could make it again this year… [Thanks again and again to the photographers of Normandiecourseapied for their free pictures!]

Filed under: pictures, Running Tagged: Argentan, half-marathon, Normandy, race, veteran (V2)

### Rogue Male [book review]

**W**hen I was about to leave a library in Birmingham, I spotted a “buy one get one half-price” book on a pile next to the cashier. Despite a rather weird title, Geoffrey Household’s *Rogue Male* looked classic enough to rank with Graham Greene’s *Confidential Agent*, Erskine Childers’ *Riddle of the Sands*, or yet John Buchan’s *39 Steps*… Not mentioning the early Eric Ambler novels. I mean, a classic British thriller with political ramifications and a central character exposed with shortcomings and doubts. After reading the book last week, I am glad I impulsively bought it. *Rogue Male* is not a Greene novel, and this for several reasons: (a) it is much more nationalistic, to the point of refusing to contact the English authorities for fear of exposing some official backup of the attempted assassination, while Greene seemed to lean more to the Left; (b) it is both less and more psychological, in that it (i) superbly describes the process of going *rogue*, i.e. of being hunted and of cutting, or trying to cut, [some] human feelings to rely on animal instincts for survival, but (ii) leaves the overall motivation for Hitler’s attempted assassination, and for the hunt by Nazi secret agents, mostly unspecified; (c) it involves a very limited number of characters, all of them men; (d) it leaves so much of the action at the periphery that this appears as a weakness of the book… Still, there are some features also found in Greene’s *Confidential Agent*, like the character failing in his attempt and being nearly captured or killed in the ensuing hunt, or the inner doubts about the (un)ethical nature of the fight… (Actually, both Greene and Household worked for the British secret services.) The overall story behind *Rogue Male* is a wee bit shallow and often too allusive to make sense, but the underground part with the final psychological battle is superb. Truly a classic!

Filed under: Books Tagged: book review, Geoffrey Household, Graham Greene, Rogue Male, secret services, thriller

### echo vulnerable

**E**ven though most people are now aware of the Shellshock security problem on the bash shell, here is a test to check whether your Unix system is at risk:
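
The test command itself seems to have been lost from the post; the standard Shellshock check (my assumption as to the original snippet) is:

```shell
# exports a crafted function definition and spawns a child bash;
# a vulnerable bash also executes the command trailing the definition
env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
```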

If the command returns vulnerable, it means the system is vulnerable and needs to be upgraded with the proper security patch… For instance, running

sudo apt-get update && sudo apt-get install --only-upgrade bash

for Debian/Ubuntu versions. Check the Apple support page for Apple OS.

Filed under: Linux Tagged: Linux, Shellshock, ubuntu, unix