## Xian's Og

### another viral math puzzle

After the Singapore Maths Olympiad birthday problem that went viral, here is a Vietnamese primary school puzzle that made the headlines in The Guardian. The question is: *Fill the empty slots with all integers from 1 to 9 for the equality to hold*. In other words, find *a,b,c,d,e,f,g,h,i* such that

*a*+13x*b*:*c*+*d*+12x*e*–*f*-11+*g*x*h*:*i*-10=66.

With presumably the operation ordering corresponding to

*a*+(13x*b*:*c*)+*d*+(12x*e*)–*f*-11+(*g*x*h*:*i*)-10=66

although this is not specified in the question. Which amounts to

*a*+(13x*b*:*c*)+*d*+(12x*e*)–*f*+(*g*x*h*:*i*)=87

and implies that *c* divides *b* and *i* divides *g*x*h*. Rather than pursuing this analytical quest further, I resorted to R coding, checking by brute force whether or not a given sequence was working.

I then applied this function to all permutations of {1,…,9} *[with the help of the permn function from the combinat R package]* and found the 128 distinct solutions. Including some for which b:c is not an integer. (None of this obviously gives a hint as to how an 8-year-old could solve the puzzle.)
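The brute-force check described above can be sketched as follows. This is a minimal base-R illustration rather than the code used in the post: the helper `perms` is a hypothetical stand-in for the combinat package, and the equality is tested up to a small numerical tolerance, which is an assumption of this sketch (a count based on exact `==` comparison of floating-point values may differ).

```r
# Hypothetical helper: all permutations of 1..n as rows of a matrix,
# built by inserting k at every position of each permutation of 1..(k-1)
perms <- function(n) {
  P <- matrix(1L, 1, 1)
  for (k in 2:n) {
    Pk <- cbind(P, k, deparse.level = 0)   # k appended as the last column
    P <- do.call(rbind, lapply(1:k, function(pos)
      Pk[, append(seq_len(k - 1), k, after = pos - 1), drop = FALSE]))
  }
  P
}

P <- perms(9)   # columns play the roles of a, b, ..., i

# Left-hand side of the puzzle with the operation ordering assumed above
lhs <- P[, 1] + 13 * P[, 2] / P[, 3] + P[, 4] + 12 * P[, 5] - P[, 6] - 11 +
  P[, 7] * P[, 8] / P[, 9] - 10

# Keep the permutations meeting the equality (tolerance guards against rounding)
solutions <- P[abs(lhs - 66) < 1e-9, , drop = FALSE]
head(solutions)
```

Working on the whole matrix of permutations at once avoids an explicit loop over the 9! candidates.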

Filed under: Books, Kids, R, University life Tagged: mathematical puzzle, permutation, primary school, The Guardian, Vietnam

### an even more senseless taxi-ride

**I** was (exceptionally) working in (and for) my garden when my daughter shouted down from her window that John Nash had just died. I thus completed my tree trimming and went to check on this sad piece of news. What I read made the news even sadder, as he and his wife had died in a taxi crash in New Jersey, apparently for not wearing seat-belts, a strategy you would think far from minimax… Since Nash was in Norway a few days earlier to receive the 2015 Abel Prize, it may even be that the couple was on its way back home from the airport. A senseless death for a Beautiful Mind.

Filed under: Books, Kids, pictures, Travel, University life Tagged: A Beautiful Mind, Abel Prize, car accident, John Nash, New Jersey, Nobel Prize, Norway, Princeton University, seat-belt, taxi

### 39% anglo-irish!

**A**s I have always been curious about my ancestry, I took a DNA test with 23andMe. While the company no longer provides statistics about potential medical conditions because of a lawsuit, it does return an ancestry analysis of sorts. In my case, my major ancestry composition is Anglo-Irish! (with 39% of my DNA) and northern European (with 32%), while only 19% is Franco-German… In retrospect, not so much of a surprise, not because of my well-known Anglophilia but because my (known, i.e., at least for the direct ancestral branches) family roots are in Normandy (whose duke invaded Britain in 1066) and Brittany (which was invaded by British Celts fleeing Anglo-Saxons in the 400’s). What’s maybe more surprising to me is that the database contained 23 people identified as 4th degree cousins and a total of 652 relatives… While the potential number of my 4th degree cousins stands in the 10,000’s, and hence there may indeed be a few ending up as 23andMe (mostly American) customers, I am indeed surprised that a .37% coincidence in our genes qualifies for being 4th degree cousins! But given that I only share 3.1% with my great³-grandfather, it actually makes sense that I share about .1% to .4% with such remote cousins. However I wonder at the precision of such an allocation: could those cousins be even more remotely related? Not related at all? *[Warning: All the links to 23andMe in this post are part of their referral program.]*
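The orders of magnitude in this paragraph can be checked with the textbook halving rule, sketched below in R. The function names are hypothetical, the figures are expectations only (actual shares vary with recombination), and nothing here is claimed to reflect how 23andMe computes relatedness.

```r
# Expected autosomal share with a direct ancestor g generations back: (1/2)^g
ancestor_share <- function(g) 0.5^g

# Expected share between n-th cousins descending from the same ancestral couple:
# two common ancestors, each reached through 2 * (n + 1) transmissions
cousin_share <- function(n) 2 * 0.5^(2 * (n + 1))

round(100 * ancestor_share(1:6), 2)   # percentages, parents up to six generations back
round(100 * cousin_share(1:4), 3)     # percentages, first to fourth cousins
```

Under this rule, the 3.1% figure corresponds to an ancestor five generations back, and the expected share between fourth cousins falls inside the .1% to .4% range quoted above.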

Filed under: Kids, Statistics, Travel Tagged: 23andMe, Britons, Brittany, Celts, common ancestor, DNA, genealogy, Normandy, sequencing

### a senseless taxi-ride

**T**his morning, on my way to the airport (and to Montpellier for a seminar), Rock, my favourite taxi-driver, told me of a strange ride he endured the night before, so strange that he had not yet fully got over it! As it happened, he had picked up an elderly lady with two large bags in the vicinity after a radio-call and drove her to a sort of Catholic hostel in down-town Paris, near La Santé jail, a pastoral place housing visiting nuns and priests. However, when they arrived there, she asked the taxi to wait before leaving, quite appropriately as she had apparently failed to book the place. She then asked my friend to take her to another specific address, a hotel located nearby at Denfert-Rochereau. While Rock was waiting and the taxi counter running, the passenger literally *checked in* by visiting the hotel room and deciding she did not like it, so she gave my taxi yet another hotel address near Saint-Honoré where she repeated the same process, namely visited the hotel room with the same outcome that she did not like the place. My friend was then getting worried about the meaning of this processionary trip all over Paris, all the more because the lady did not have a particularly coherent discourse. And could not stop talking. The passenger then made him stop for food and drink, and, while getting back in the taxi, ordered him to drive her back to her starting place. After two and a half hours, they thus came back to the place, with a total bill of 113 euros. The lady then handed a 100 euro bill to the taxi-driver, declaring she did not have any further money and that he should have brought her home directly from the first place they had stopped… In my friend’s experience, this was the weirdest passenger he ever carried and he thought the true point of the ride was to escape solitude and loneliness for one evening, even if chatting about nonsense the whole time.

Filed under: pictures, Travel Tagged: Montpellier, Paris, story, taxi, taxi-driver

### postdoc in the Alps

**Post-doctoral Position in Spatial/Computational Statistics (Grenoble, France)**

**A post-doctoral position is available in Grenoble, France, to work on computational methods for spatial point process models. The candidate will work with Simon Barthelmé (GIPSA-lab, CNRS) and Jean-François Coeurjolly (Univ. Grenoble Alpes, Laboratory Jean Kuntzmann) on extending point process methodology to deal with large datasets involving multiple sources of variation. We will focus on eye movement data, a new and exciting application area for spatial statistics. The work will take place in the context of an interdisciplinary project on eye movement modelling involving psychologists, statisticians and applied mathematicians from three different institutes in Grenoble.**

The ideal candidate has a background in spatial or computational statistics or machine learning. Knowledge of R (and in particular the package spatstat) and previous experience with point process models is a definite plus.

The duration of the contract is 12+6 months, starting 01.10.2015 at the earliest. Salary is according to standard CNRS scale (roughly EUR 2k/month).

Grenoble is the largest city in the French Alps, with a very strong science and technology cluster. It is a pleasant place to live, in an exceptional mountain environment.

Filed under: Kids, Mountains, Statistics, Travel, University life Tagged: Alps, CNRS, computational statistics, Grenoble, IMAG, Mount Lady Macdonald, mountains, point processes, postdoctoral position, spatial statistics

### Bruce Lindsay (March 7, 1947 — May 5, 2015)

**W**hen registering early for Seattle (JSM 2015) today, I discovered on the ASA webpage the very sad news that Bruce Lindsay had passed away on May 5. While Bruce was not a very close friend, we had met and interacted enough times for me to feel quite strongly about his most untimely death. Bruce was indeed “Mister mixtures” in many ways and I have always admired the unusual and innovative ways he had found for analysing mixtures. Including algebraic ones through the rank of associated matrices. Which is why I first met him (besides a few words at the 1989 Gertrude Cox (first) scholarship race in Washington DC) at the workshop I organised with Gilles Celeux and Mike West in Aussois, French Alps, in 1995. After this meeting, we met twice in Edinburgh at ICMS workshops on mixtures, organised with Mike Titterington. I remember sitting next to Bruce at one workshop dinner (at Blonde) and him talking about his childhood in Oregon and his father being a journalist and how this induced him to become an academic. He also contributed a chapter on estimating the number of components [of a mixture] to the Wiley book we edited out of this workshop. Obviously, his work extended beyond mixtures to a general neo-Fisherian theory of likelihood inference. (Bruce was certainly *not* a Bayesian!) The last time I met him was in Italia, at a likelihood workshop in Venezia, October 2012, mixing Bayesian nonparametrics, intractable likelihoods, and pseudo-likelihoods. He gave a survey talk about composite likelihood, telling me about his extended stay in Italy (Padua?) around that time… So, Bruce, I hope you are now running great marathons in a place so full of mixtures that you can always keep ahead of the pack! Fare well!

Filed under: Books, Running, Statistics, Travel, University life Tagged: American statisticians, Bruce Lindsay, composite likelihood, Edinburgh, ICMS, Italia, marathon, mixture estimation, mixtures of distributions, Penn State University, unknown number of components, Venezia

### da San Geminiano

Filed under: pictures, Travel, Wines Tagged: Italia, medieval architecture, San Geminiano, Spring, Tuscany

### non-reversible MCMC

**W**hile visiting Dauphine, Natesh Pillai and Aaron Smith pointed out this interesting paper of Joris Bierkens (Warwick) that had escaped my arXiv watch/monitoring. The paper is about turning Metropolis-Hastings algorithms into non-reversible versions, towards improving mixing.

In a discrete setting, a way to produce a non-reversible move is to mix the proposal kernel Q with its time-reversed version Q’ and use an acceptance probability of the form

where ε is any weight. This construction is generalised in the paper to any vorticity (skew-symmetric with zero sum rows) matrix Γ, with the acceptance probability [εΓ(x,y)+π(y)Q(y,x)]/[π(x)Q(x,y)] (truncated at 1),

where ε is small enough to ensure all numerator values are non-negative. This is a rather annoying assumption in that, except for the special case derived from the time-reversed kernel, it has to be checked over all pairs (x,y). (I first thought it also involved the normalising constant of π but everything can be set in terms of the unnormalised version of π, Γ or ε included.) The paper establishes that the new acceptance probability preserves π as its stationary distribution. An alternative construction is to change the proposal from Q to H such that H(x,y)=Q(x,y)+εΓ(x,y)/π(x). Which seems more pertinent, as leaving the proposal unchanged cannot improve the mixing behaviour of the chain that much. Still, the move to the non-reversible versions has the noticeable plus of decreasing the asymptotic variance of the Monte Carlo estimate for any integrable function. Any. (Those results are found in the physics literature of the 2000’s.)
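As a numerical illustration of the vorticity construction, here is a minimal R sketch on a five-state cycle. Everything specific in it is an assumption made for the example and is not taken from the paper: the target, the nearest-neighbour proposal, the rotation-type matrix `Gam`, and the acceptance ratio written as (εΓ(x,y)+π(y)Q(y,x))/(π(x)Q(x,y)) truncated at one, with ε chosen so that all numerators are non-negative.

```r
# Target on a 5-point cycle and symmetric nearest-neighbour proposal Q
n <- 5
target <- (1:n) / sum(1:n)
nxt <- c(2:n, 1)                      # clockwise neighbour of each state
Q <- matrix(0, n, n)
Q[cbind(1:n, nxt)] <- 0.5
Q[cbind(nxt, 1:n)] <- 0.5

# Vorticity matrix: skew-symmetric, zero row sums, same support as Q
Gam <- matrix(0, n, n)
Gam[cbind(1:n, nxt)] <- 1
Gam[cbind(nxt, 1:n)] <- -1

# epsilon small enough for every numerator eps*Gam + target*Q to stay non-negative
eps <- 0.9 * min(target) * 0.5

# Transition matrix of the modified Metropolis-Hastings chain
P <- matrix(0, n, n)
for (x in 1:n) for (y in 1:n) if (x != y && Q[x, y] > 0) {
  num <- eps * Gam[x, y] + target[y] * Q[y, x]
  P[x, y] <- Q[x, y] * min(1, num / (target[x] * Q[x, y]))
}
diag(P) <- 1 - rowSums(P)             # rejected moves stay put

stationarity_gap <- max(abs(target %*% P - target))   # numerically zero
flow <- target * P                                     # flow[x, y] = target[x] * P[x, y]
reversibility_gap <- max(abs(flow - t(flow)))          # positive: detailed balance fails
c(stationarity_gap, reversibility_gap)
```

In this toy case the probability flow between neighbours differs by εΓ(x,y), so the chain keeps the target stationary while drifting clockwise around the cycle.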

The extension to the continuous case is a wee bit more delicate. One needs to find an anti-symmetric vortex function g with zero integral [equivalent to the row sums being zero] such that g(x,y)+π(y)q(y,x)>0 and with the same support as π(x)q(x,y), so that the acceptance probability of (g(x,y)+π(y)q(y,x))/(π(x)q(x,y)) leads to π being the stationary distribution. Once again g(x,y)=ε(π(y)q(y,x)-π(x)q(x,y)) is a natural candidate but it is unclear to me why it should work. As the paper only contains one illustration for the discretised Ornstein-Uhlenbeck model, with the above choice of g for a small enough ε (a point I fail to understand since any ε<1 should provide a positive g(x,y)+π(y)q(y,x)), it is also unclear to me that this modification (i) is widely applicable and (ii) is relevant for genuine MCMC settings.

Filed under: Books, Statistics, University life Tagged: arXiv, MCMC algorithms, Monte Carlo Statistical Methods, Ornstein-Uhlenbeck model, reversibility, Université Paris Dauphine, University of Warwick

### Kaefferkopf

Filed under: pictures, Wines Tagged: Alsace, French wines, Gewurztraminer, Kaefferkopf, Riesling, Strasbourg

### speed seminar-ing

**Y**esterday, I made a quick afternoon trip to Montpellier as a replacement for a seminar speaker who had cancelled at the last minute. Most obviously, I gave a talk about our “testing as mixture” proposal. And as previously, the talk generated a fair amount of discussion and feedback from the audience. Providing me with additional aspects to include in a revision of the paper. Whether or not the current submission is rejected, new points made and received during those seminars will have to get into a revised version as they definitely add to the appeal of the perspective. In that seminar, most of the discussion concentrated on the connection with *decisions* based on such a tool as the posterior distribution of the mixture weight(s). My argument for sticking with the posterior rather than providing a hard decision rule was that the message is indeed in arguing against hard rules that end up mimicking the p- or b-values. And the catastrophic consequences of fishing for significance and the like. Producing instead a validation by simulating pseudo-samples under each model shows what to expect for each model under comparison. The argument did not really convince Jean-Michel Marin, I am afraid! Another point he raised was that we could instead use a distribution on α with support {0,1}, to avoid the encompassing model he felt was too far from the original models. However, this leads back to the Bayes factor as the weights at 0 and 1 are the marginal likelihoods, nothing more. However, this perspective on the classical approach has at least the appeal of completely validating the use of improper priors on common (nuisance or not) parameters. Pierre Pudlo also wondered why we could not conduct an analysis on the mixture of the likelihoods. Instead of the likelihood of the mixture. My first answer was that there was not enough information in the data for estimating the weight(s). A few more seconds of reflection led me to the further argument that the posterior on α with support (0,1) would then be a mixture of Be(2,1) and Be(1,2) with weights the marginal likelihoods, again (under a uniform prior on α). So indeed not much to gain. A last point we discussed was the case of the evolution trees we analyse with population geneticists from the neighbourhood (and with ABC). Jean-Michel’s argument was that the scenarios under comparison were not compatible with a mixture, the models being exclusive. My reply involved an admixture model that contained all scenarios as special cases. After a longer pondering, I think his objection was more about the non-iid nature of the data. But the admixture construction remains valid. And makes a very strong case in favour of our approach, I believe.
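The Be(2,1)/Be(1,2) remark can be verified numerically. In the R sketch below, `m1` and `m2` are hypothetical marginal likelihoods of the two models (model parameters integrated out under priors taken independent of α), so that the posterior on α under a uniform prior is proportional to α m1 + (1-α) m2.

```r
# Hypothetical marginal likelihoods of the two models
m1 <- 0.7
m2 <- 0.3

# Unnormalised posterior on alpha for the mixture of likelihoods, uniform prior
post_unnorm <- function(a) a * m1 + (1 - a) * m2
Z <- integrate(post_unnorm, 0, 1)$value

a <- seq(0.01, 0.99, by = 0.01)
posterior <- post_unnorm(a) / Z

# Mixture of Be(2,1) and Be(1,2) with weights proportional to m1 and m2
beta_mix <- (m1 * dbeta(a, 2, 1) + m2 * dbeta(a, 1, 2)) / (m1 + m2)

max_gap <- max(abs(posterior - beta_mix))   # numerically zero
max_gap
```

The data only enter through the two marginal likelihoods, which is why nothing is gained over the Bayes factor.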

After the seminar, Christian Lavergne and Jean-Michel had organised a doubly exceptional wine-and-cheese party: first because it is not usually the case there is such a post-seminar party and second because they had chosen a terrific series of wines from the Mas Bruguière (Pic Saint-Loup) vineyards. Ending up with a great 2007 L’Arbouse. Perfect ending for an exciting day. (I am not even mentioning a special Livarot from close to my home-town!)

Filed under: Books, pictures, Statistics, Travel, University life, Wines Tagged: ABC, Bayes factor, Bayesian model choice, Bayesian testing, French cheese, French wines, Languedoc wines, Livarot, Mas Bruguière, Montpellier, Pic Saint Loup

### Toscana [#2]

Filed under: pictures, Travel, Wines Tagged: Castello di Montefioralle, Greve en Chianti, Italia, Italian wines, San Giovese, sunset, Tuscany

### Cauchy Distribution: Evil or Angel?

**N**atesh Pillai and Xiao-Li Meng just arXived a short paper that solves the Cauchy conjecture of Drton and Xiao [I mentioned last year at JSM], namely that, when considering two normal vectors with generic variance matrix S, a weighted average of the ratios X/Y remains Cauchy(0,1), just as in the iid S=I case. Even when the weights are random. The fascinating side of this now resolved (!) conjecture is that the correlation between the terms does not seem to matter. Pushing the correlation to one [assuming it is meaningful, which is a suspension of disbelief!, since there is no standard correlation for Cauchy variates] leads to a paradox: all terms are equal and yet… it works: we recover a single term, which again is Cauchy(0,1). All that remains thus to prove is that it stays Cauchy(0,1) between those two extremes, a weird kind of intermediate value theorem!
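The statement can be illustrated by simulation. The following R sketch uses an arbitrary covariance matrix `S` and fixed weights `w`, both hypothetical choices for the example, draws two independent centred normal vectors with covariance S, and compares the weighted average of the componentwise ratios with the standard Cauchy distribution.

```r
set.seed(1)
m <- 3
n <- 1e4

# Hypothetical positive definite covariance matrix and fixed weights summing to one
S <- matrix(c(1, .8, .5,
              .8, 1, .3,
              .5, .3, 1), m, m)
w <- c(.5, .3, .2)

# Rows of X and Y are independent N(0, S) vectors
R <- chol(S)
X <- matrix(rnorm(n * m), n, m) %*% R
Y <- matrix(rnorm(n * m), n, m) %*% R

# Weighted average of the ratios X_j / Y_j, one value per simulated pair
avg <- as.vector((X / Y) %*% w)

ks <- ks.test(avg, "pcauchy")
ks$statistic
```

A small Kolmogorov-Smirnov distance is what the result predicts, whatever the correlation structure in S.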

Actually, Natesh and XL further prove an inverse χ² theorem: the inverse of the normal vector, renormalised into a quadratic form, is an inverse χ² no matter what its covariance matrix. The proof of this amazing theorem relies on a spherical representation of the bivariate Gaussian (also underlying the Box-Muller algorithm). The angles are then jointly distributed as

and from there follows the argument that conditional on the differences between the θ’s, all ratios are Cauchy distributed. Hence the conclusion!

A question that stems from reading this version of the paper is whether this property extends to other formats of non-independent Cauchy variates. Somewhat connected to my recent post about generating correlated variates from arbitrary distributions: using the inverse cdf transform of a Gaussian copula shows this is possibly the case: the following code is meaningless in that the empirical correlation has no connection with a “true” correlation, but nonetheless the experiment seems of interest…

    > ro=.999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
    > cor(x[,1]/x[,2],y[,1]/y[,2])
    [1] -0.1351967
    > ro=.99999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
    > cor(x[,1]/x[,2],y[,1]/y[,2])
    [1] 0.8622714
    > ro=1-1e-5;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
    > z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y)))
    > cor(x=z,y=w)
    [1] 0.9999732
    > ks.test((z+w)/2,"pcauchy")

            One-sample Kolmogorov-Smirnov test

    data:  (z + w)/2
    D = 0.0068, p-value = 0.3203
    alternative hypothesis: two-sided

    > ro=1-1e-3;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
    > z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y)))
    > cor(x=z,y=w)
    [1] 0.9920858
    > ks.test((z+w)/2,"pcauchy")

            One-sample Kolmogorov-Smirnov test

    data:  (z + w)/2
    D = 0.0036, p-value = 0.9574
    alternative hypothesis: two-sided

Filed under: Books, pictures, Running, Statistics, Travel, University life, Wines Tagged: Boston, Box-Muller algorithm, Cauchy distribution, champagne, correlation, Harvard University, JSM 2014, Mathias Drton, Monte Carlo Statistical Methods, Mystic river, Natesh Pillai, Sommerville, Xiao-Li Meng

### Da Siena

Filed under: Kids, pictures, Travel, Wines Tagged: Chianti, Italia, Italian wines, medieval architecture, San Geminiano, Tusca

### Da Firenze

### Toscana

### the Flatland paradox [reply from the author]

*[Here is a reply by Pierre Druilhet to my comments on his paper.]*

**T**here are several goals in the paper, the last one being the most important one.

The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is given by the law of a random walk with 0 as reflecting boundary (Jaynes, in his book, explores this link). So for a given prior on N, we can derive a prior on l(θ). Since the random process that generates N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown, so we may consider l(θ).

The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone, in 1972, warned against this kind of prior for Gaussian models. Another interesting anecdote is that he cited the novel by Abbott, “Flatland: A Romance of Many Dimensions”, which describes a world where the dimension is changed. This is exactly the case in the FP since θ has to be seen in two dimensions rather than in one dimension.

The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.

– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution on (θ, x) and therefore the true unknown distribution of θ.

– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measures. This is especially true if the prior does not represent an accurate knowledge. In that case, there is no discontinuity between proper and improper priors and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ, then the flat distribution could not be considered as a limit of probability distributions.

For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.

Filed under: Books, Statistics, University life Tagged: Abbot, flat prior, Flatland, Gaussian random walk, improper prior, marginalisation paradoxes, Mervyn Stone

### Marc Yor

Filed under: Books, Statistics, University life Tagged: Brownian motion, French mathematicians, Gazette des Mathématiciens, Marc Yor, MATAPLI, SFDS, SMF

### the Flatland paradox

**P**ierre Druilhet arXived a note a few days ago about the Flatland paradox (due to Stone, 1976) and his arguments against the flat prior. The paradox in this highly artificial setting is as follows: Consider a sequence θ of N independent draws from {a,b,1/a,1/b} such that

- N and θ are unknown;
- a draw followed by its inverse and this inverse are removed from θ;
- the successor *x* of θ is observed, meaning an extra draw is made and the above rule applied.

Then the frequentist probability that *x* is longer than θ given θ is at least 3/4—*at least* because θ could be zero—while the posterior probability that *x* is longer than θ given x is 1/4 under the flat prior over θ. Paradox that 3/4 and 1/4 clash. Not so much of a paradox because there is no joint probability distribution over (x,θ).
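The 3/4 frequency can be checked by simulation. In the R sketch below, the encoding of the four symbols as ±1 and ±2 and the choice of N=20 draws are assumptions made for the illustration; the sequence θ is built with the annihilation rule, one extra draw is made, and the proportion of cases where *x* is longer than θ is computed over the non-empty θ's.

```r
set.seed(42)
symbols <- c(1, -1, 2, -2)   # stand-ins for a, 1/a, b, 1/b

# Build theta from N draws, removing a draw and its inverse when they are adjacent
one_path <- function(N) {
  theta <- integer(0)
  for (d in sample(symbols, N, replace = TRUE)) {
    k <- length(theta)
    if (k > 0 && theta[k] == -d) theta <- theta[-k] else theta <- c(theta, d)
  }
  theta
}

sims <- replicate(1e4, {
  theta <- one_path(20)
  d <- sample(symbols, 1)                     # extra draw producing x
  k <- length(theta)
  c(len = k, longer = as.numeric(k == 0 || theta[k] != -d))
})

# Frequency of "x longer than theta" among non-empty theta's
prop_longer <- mean(sims["longer", sims["len", ] > 0])
prop_longer
```

The frequency should sit close to 3/4, since only one of the four possible extra draws annihilates the last symbol of a non-empty θ.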

The paradox was actually discussed at length in Larry Wasserman’s now defunct Normal Variate. From which I borrowed Larry’s graphical representation of the four possible values of θ given the (green) endpoint of *x*. Larry uses the Flatland paradox hammer to fix another nail on the coffin he contemplates for improper priors. And all things Bayes. Pierre (like others before him) argues against the flat prior on θ and shows that a flat prior on the length of θ leads to recover 3/4 as the posterior probability that *x* is longer than θ.

As I was reading the paper in the métro yesterday morning, I became less and less satisfied with the whole analysis of the problem in that I could not perceive θ as a *parameter* of the model. While this may sound a pedantic distinction, θ is a *latent variable* (or a *random effect*) associated with *x* in a model where the only unknown parameter is N, the total number of draws used to produce θ and *x*. The distributions of both θ and *x* are entirely determined by N. (In that sense, the Flatland paradox can be seen as a marginalisation paradox in that an improper prior on N cannot be interpreted as projecting a prior on θ.) Given N, the distribution of *x* of length *l(x)* is then 1/4^N times the number of ways of picking (N-*l(x)*) annihilation steps among N. Using a prior on N like 1/N, which is improper, then leads to favour the shortest path as well. (After discussing the issue with Pierre Druilhet, I realised he had a similar perspective on the issue. Except that he puts a flat prior on the length *l(x)*.) Looking a wee bit further for references, I also found that Bruce Hill had adopted the same perspective of a prior on N.

Filed under: Books, Kids, R, Statistics, University life Tagged: combinatorics, Flatland, improper priors, Larry Wasserman, marginalisation paradoxes, paradox, Pierre Druilhet, subjective versus objective Bayes, William Feller

### terrible graph of the day

**A** truly terrible graph in Le Monde about overweight and obesity in the EU countries (and Switzerland). The circle presentation makes no logical sense. Countries are ordered by 2030 overweight percentages, which implies the order differs for men and women. (With a neat sexist differentiation between male and female figures.) The allocation of the (2010) grey bar to its country is unclear (left or right?). And there is no uncertainty associated with the 2030 predictions. There is no message coming out of the graph, like the massive explosion in the obesity and overweight percentages in EU countries. Now, given that the data is available for women and men, ‘Og’s readers should feel free to send me alternative representations!

Filed under: Books, Kids, R, Statistics Tagged: bad graph, EU, Le Monde, obesity, OMS, overweight, prediction

### quantile functions: mileage may vary

**W**hen experimenting with various quantiles functions in R, I was shocked *[ok this is a bit excessive, let us say surprised]* by how widely the execution times would vary. To the point of blaming a completely different feature of R. Borrowing from Charlie Geyer’s webpage on the topic of probability distributions in R, here is a table for some standard distributions: I ran

choosing an arbitrary parameter whenever needed.

| Distribution | Function | Time |
|---|---|---|
| Cauchy | qcauchy | 2.2 |
| Chi-Square | qchisq | 43.8 |
| Exponential | qexp | 0.95 |
| F | qf | 34.2 |
| Gamma | qgamma | 37.2 |
| Logistic | qlogis | 1.7 |
| Log Normal | qlnorm | 2.2 |
| Normal | qnorm | 1.4 |
| Student t | qt | 31.7 |
| Uniform | qunif | 0.86 |
| Weibull | qweibull | 2.9 |

Of course, it does not mean much in that all the slow distributions (except for Weibull) are parameterised. Nonetheless, that a chi-square inversion takes 50 times longer than a uniform inversion remains puzzling as to why it is not coded more efficiently. In particular, I was wondering why the chi-square inversion was slower than the Gamma inversion. Rerunning both inversions showed that they are equivalent:

    > u=runif(1e7)
    > system.time(x<-qgamma(u,sha=1.5))
    utilisateur     système      écoulé
         21.534       0.016      21.532
    > system.time(x<-qchisq(u,df=3))
    utilisateur     système      écoulé
         21.372       0.008      21.361

Which also shows how variable *system.time* can be.
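A comparison of this kind can be reproduced with a short script. The R sketch below is not the code behind the table: the parameter values are arbitrary, the sample is smaller than the 1e7 draws of the session above to keep the run short, and timings will differ across machines and runs.

```r
u <- runif(1e5)

# One wrapper per quantile function, with an arbitrary parameter where one is needed
quantile_calls <- list(
  qcauchy  = function(p) qcauchy(p),
  qchisq   = function(p) qchisq(p, df = 3),
  qexp     = function(p) qexp(p),
  qf       = function(p) qf(p, df1 = 3, df2 = 5),
  qgamma   = function(p) qgamma(p, shape = 1.5),
  qlogis   = function(p) qlogis(p),
  qlnorm   = function(p) qlnorm(p),
  qnorm    = function(p) qnorm(p),
  qt       = function(p) qt(p, df = 3),
  qunif    = function(p) qunif(p),
  qweibull = function(p) qweibull(p, shape = 2)
)

# Elapsed time (in seconds) of each inversion on the same uniform sample
timings <- sapply(quantile_calls, function(f) system.time(f(u))[["elapsed"]])
sort(timings, decreasing = TRUE)

# A chi-square with 3 degrees of freedom is a Gamma with shape 1.5 and scale 2
same_inversion <- isTRUE(all.equal(qchisq(u, df = 3),
                                   qgamma(u, shape = 1.5, scale = 2)))
same_inversion
```

Since the chi-square distribution is a rescaled Gamma, matching timings for the two inversions are to be expected.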

Filed under: Books, R, Statistics Tagged: Charlie Geyer, execution time, pseudo-random generator, R, random simulation, standard quantile functions, system.time