## Xian's Og

### borderline infinite variance in importance sampling

**A**s I was still musing about last week’s posts around infinite variance importance sampling and its potential corrections, I wondered whether there was a fundamental difference between “just” having a [finite] variance and “just” having none, in conjunction with Aki’s post. To get a better feeling, I ran a quick experiment with Exp(1) as the target and Exp(a) as the importance distribution. When estimating **E[X]**=1, the above graph opposes a=1.95 to a=2.05 (variance versus no variance), a=2.95 to a=3.05 (third moment versus none), and a=3.95 to a=4.05 (fourth moment versus none), in bright yellow versus wheat in each case. The graph below does the same for the estimation of **E[**exp(**X**/2)**]**=2, whose integrand is not square integrable under the target and hence seems to require higher moments of the importance weight. It is hard to derive universal theories from those two graphs; however, they show some protection against sudden drifts in the estimation sequence when higher moments exist. As an aside [not really!], apart from our rather confidential Confidence bands for Brownian motion and applications to Monte Carlo simulation with Wilfrid Kendall and Jean-Michel Marin, I do not know of many studies that consider the sequence of averages time-wise rather than across realisations at a given time, and I still think this is the more relevant perspective for simulation purposes.
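As a sketch of this experiment (seed, sample size, and the pair of a values on either side of the border are my choices, not necessarily those behind the graphs): with an Exp(1) target and an Exp(a) proposal, the squared weight integrates to ∫ e^{(a−2)x}/a dx, so the estimator of E[X] has a finite variance exactly when a<2.

```r
# importance sampling estimate of E[X]=1 for an Exp(1) target with an Exp(a)
# proposal: the weight w(x) = exp((a-1)x)/a has a finite second moment iff a < 2
set.seed(1)
n <- 1e5
for (a in c(1.95, 2.05)) {                  # just below / just above the border
  x <- rexp(n, rate = a)
  w <- dexp(x, rate = 1) / dexp(x, rate = a)
  est <- cumsum(w * x) / (1:n)              # time-wise sequence of IS averages
  cat("a =", a, "-> final estimate", est[n], "\n")
}
```

Plotting `est` against the iteration index for both values of a reproduces the time-wise perspective discussed above, with occasional jumps appearing in the a=2.05 sequences.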

Filed under: Books, Kids, Statistics Tagged: continuity, importance sampling, infinite variance estimators, moments, Monte Carlo experiment, Monte Carlo Statistical Methods

### Sunday morning puzzle

**A** question from X validated that took me quite a while to fathom and then the solution suddenly became quite obvious:

*If a sample taken from an arbitrary distribution on {0,1}⁶ is censored from its (0,0,0,0,0,0) elements, and if the marginal probabilities are known for all six components of the random vector, what is an estimate of the proportion of (missing) (0,0,0,0,0,0) elements?*

Since the censoring modifies all probabilities by the same renormalisation, i.e., divides them by the probability *ρ* of being different from (0,0,0,0,0,0), this probability can be estimated by looking at the marginal probabilities of being equal to 1, which equal the original and known marginal probabilities divided by *ρ*. Here is a short R code illustrating the approach, which I wrote in the taxi home last night:
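A reconstruction of the idea along those lines (the joint distribution below, with independent components, is my own arbitrary toy choice, not part of the puzzle):

```r
# estimate rho = P(X != 0) from a sample censored of its all-zero vectors,
# using the known marginals: observed marginal = p_j / rho for each component j
set.seed(42)
N <- 1e5
p  <- c(.1, .2, .3, .15, .25, .05)          # known marginal probabilities
X  <- matrix(rbinom(6 * N, 1, rep(p, each = N)), N, 6)
Xc <- X[rowSums(X) > 0, ]                   # censored sample
rho_hat <- mean(p / colMeans(Xc))           # average the six marginal-based estimates
1 - rho_hat                                 # estimated proportion of censored zeros
```

With independent components, the true proportion of censored zeros is ∏(1−p_j), close to 0.31 here, which the estimate recovers.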

A broader question is how many values (and which values) of the sample can be removed before this recovery gets impossible (with the same amount of information).

Filed under: Books, Kids, R Tagged: conditional probability, cross validated, mathematical puzzle, R

### fluctuat nec cogitat

**A**s was alas predictable, the mass assassinations in Paris last Friday led to senseless gesticulations and warmongering declarations from French political leaders. Hence my title, a play on Paris’ motto [fluctuat nec mergitur], intended to mean “tossed without thinking”, or, as Google translate would inadvertently put it in a strangely appropriate way, “to turn and toss”. A state of emergency has been declared and is contemplated for months (months!) ahead. After the “Charlie Hebdo” law on intelligence voted a few months ago (which actually goes against everything Charlie stood for!), more attacks on civil liberties are now to come, with a revision of the Constitution within days (days!)… Hence this drawing from Steve Bell in The Guardian. While French military forces are engaged in Iraq and Syria in a dubious campaign, I completely dispute the notion that France is “at war” and see the present declarations and new legislation as catering to the rightmost fringes of public opinion, with no other clear consequence than to pave the way for the French extreme right to win the coming election(s). Just as withdrawing French citizenship from dual nationals, who anyway do not consider themselves French and are determined to blow themselves up on demand, is ridiculously inappropriate. The similarity between the reactions of Bush after 09/11/01 and Hollande after 11/13/15 is striking and frightening in the lack of long-term vision and of mere rational thinking… To end with another translation, “to rock with no purpose”, indeed.

Filed under: Kids, pictures Tagged: fundamentals

### Browne

Filed under: pictures, Travel, Wines Tagged: Cabernet, Columbia Valley, Petit Verdot, Seattle, Walla Walla, Washington State

### multiple importance sampling

*“Within this unified context, it is possible to interpret that all the MIS algorithms draw samples from a equal-weighted mixture distribution obtained from the set of available proposal pdfs.”*

**I**n a very special (important?!) week for importance sampling!, Elvira et al. arXived a paper about generalized multiple importance sampling. The setting is the same as in earlier papers by Veach and Guibas (1995) or Owen and Zhou (2000) [and in our AMIS paper], namely a collection of importance functions and of simulations from those functions. However, there is no adaptivity in the construction of the importance functions and no Markov (MCMC) dependence in the generation of the simulations.

*“One of the goals of this paper is to provide the practitioner with solid theoretical results about the superiority of some specific MIS schemes.”*

A first part deals with the fact that a random point taken from the conjunction of those samples is distributed from the equiweighted mixture. Which was a fact I had much appreciated when reading Owen and Zhou (2000). From there, the authors discuss the various choices of importance weighting, meaning the different degrees of Rao-Blackwellisation that can be applied to the sample. As we discovered in our population Monte Carlo research [which is well-referenced within this paper], conditioning too much leads to useless adaptivity. Again a sort of epiphany for me, in that a whole family of importance functions could be used for the same target expectation and the very same simulated value: it all depends on the degree of conditioning employed for the construction of the importance function. To get around the annoying fact that self-normalised estimators are never unbiased, the authors borrow Liu’s (2000) notion of proper importance sampling estimators, for which the ratio of the expectations returns the right quantity. (Which amounts to recovering the correct normalising constant(s), I believe.) They then introduce five (5!) different possible importance weights that all produce proper estimators. However, those weights correspond to different sampling schemes, so they do not apply to the same sample. In other words, they are not recycling weights as in AMIS. Nor do they cover the adaptive cases where the weights and parameters of the different proposals change along iterations. Unsurprisingly, the smallest variance estimator is the one based on sampling without replacement and an importance weight made of the entire mixture. But this result does not apply to the self-normalised version, whose variance remains intractable.
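To illustrate two of the proper weighting options on a toy case of my own (two Gaussian proposals for a Gaussian target, not an example from the paper): compare the standard weights π/q_j with the full-mixture weights π/ψ, where ψ is the equal-weight mixture of the proposals.

```r
# standard versus full-mixture importance weights for E[X^2]=1 under a N(0,1)
# target, with proposals q1 = N(-1, 2^2) and q2 = N(1, 2^2); both weightings
# are proper, but the mixture weights are typically more stable
set.seed(2)
n  <- 1e4
x  <- c(rnorm(n, -1, 2), rnorm(n, 1, 2))    # n draws from each proposal
q1 <- dnorm(x, -1, 2); q2 <- dnorm(x, 1, 2)
w_std <- dnorm(x) / c(q1[1:n], q2[-(1:n)])  # weight by the generating proposal
w_mix <- dnorm(x) / ((q1 + q2) / 2)         # weight by the equal-weight mixture
c(std = mean(w_std * x^2), mix = mean(w_mix * x^2),
  var_ratio = var(w_mix * x^2) / var(w_std * x^2))
```

Both averages estimate E[X²]=1 without bias; the variance ratio is usually below one, in line with the mixture weighting being a Rao-Blackwellised version of the standard one.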

I find this survey of existing and non-existing multiple importance methods quite relevant and a must-read for my students (and beyond!). My reservations (for reservations there must be!) are that the study stops short of pushing the optimisation further. Indeed, the available importance functions are not equivalent with respect to the target and hence weighting them equally is suboptimal. The adaptive part of the paper broaches upon this issue but does not conclude.

Filed under: Books, Statistics, University life Tagged: adaptive mixture importance sampling, AMIS, importance sampling, MCMC, Monte Carlo Statistical Methods, multiple importance methods, multiple mixtures, population Monte Carlo, simulation

### OxWaSP seminar

**T**his Friday, I give a seminar talk on Rao-Blackwellisation for MCMC methods to the new students of the Oxford-Warwick Statistical Programme (OxWaSP). Here are my slides, based on three papers of mine on the topic, with George Casella (1996), Randal Douc (2011), and Marco Banterle, Clara Grazian and Anthony Lee (2015).

Filed under: Kids, Statistics, University life

### intractable likelihoods (even) for Alan

**I**n connection with the official launch of the Alan Turing Institute (or ATI, of which Warwick is a partner), the Institute funded an ATI scoping workshop in Warwick a week ago yesterday, around the notion(s) of intractable likelihood(s) and how this could/should fit within the themes of the Institute [hence the scoping]. This is one among many such scoping workshops taking place at all partners, as reported on the ATI website. A workshop that was quite relaxed and great fun, if only for getting together with most people (and friends) in the UK interested in the topic. But also one pointing out some new themes I had not previously thought of as related to ilike. For instance, questioning the relevance of likelihood for inference and putting forward decision theory under model misspecification, connecting with privacy and ethics [hence making intractable “good”!], introducing uncertain likelihood, getting more into network models, RKHS as a natural summary statistic, swarms of solutions for consensus inference… (And thanks to Mark Girolami for this homage to the iconic LP of the Sex Pistols!, which I played maniacally all over 1978…) My own two cents in the discussion were mostly variations on other discussions, borrowing from ABC (and ABC slides) to call for a novel approach to approximate inference:

Filed under: Kids, pictures, Statistics Tagged: ABC, Alan Turing Institute, consensus, decision theory, intractable likelihood, likelihood function, misspecified model, network, privacy, RKHS, Sex Pistols, summary statistics, University of Warwick

### MCMskv, Lenzerheide, 4-7 Jan., 2016 [news #4]

**W**hile the deadline for Breaking News! submission is now over, with close to 20 submissions!, there is a new opening for cheaper lodging: The CUBE hotel in Savognin (20km away) has an offer at 110 CHF per person and per night, including breakfast, dinner, and skipass in a room for 3 people (or more). Be sure to mention MCMski in the subject of your email. As mentioned in the previous post, there are other opportunities in nearby villages, for instance Tiefencastel, 11km away with a 19mn bus connection, or Chur, 18km away with a slower 39mn bus connection, but with a very wide range of offers.

Filed under: Kids, Mountains, pictures, Travel, University life Tagged: accommodation, airbnb, Bayesian computation, Chur, ISBA, Lenzerheide, lodging, MCMSki, MCMskv, Monte Carlo Statistical Methods, Sankt Moritz, ski town, Switzerland, Zurich

### data augmentation with divergence

**A**nother (!) Cross Validated question that shed some light on the difficulties of explaining the convergence of MCMC algorithms. Or in understanding conditioning and hierarchical models. The author wanted to know why a data augmentation of his did not converge: In a simplified setting, given an observation y that he wrote as y=h(x,θ), he had built a Gibbs sampler by reconstructing x=g(y,θ) and simulating θ given x: at each iteration t,

- compute xt=g(y,θt-1)
- simulate θt~π(θ|xt)

and he attributed the lack of convergence to a possible difficulty with the Jacobian. My own interpretation of the issue was rather that conditioning on the unobserved x was not the same as conditioning on the observed y, and hence that y was missing from step 2. And that the simulation of x is useless, unless one uses it in an augmented scheme à la Xiao-Li… Nonetheless, I like the problem, if only because my very first reaction was to draw a hierarchical dependence graph and to conclude this should be correct, before checking on a toy example that it was not!
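A toy illustration of why step 2 loses y (my own example, not the one from the question): take θ~N(0,1), x~N(θ,1), and observe y=h(x,θ)=x+θ, so that the correct posterior is θ|y ~ N(2y/5, 1/5), while the scheme above converges to something else.

```r
# flawed "Gibbs" scheme: x_t = g(y, theta_{t-1}) = y - theta_{t-1}, then
# theta_t ~ pi(theta | x_t) = N(x_t/2, 1/2), conditioning on x but not on y;
# its stationary mean solves m = (y - m)/2, i.e. m = y/3, not the correct 2y/5
set.seed(3)
y <- 3; niter <- 1e5
theta <- numeric(niter)
for (t in 2:niter)
  theta[t] <- rnorm(1, (y - theta[t - 1]) / 2, sqrt(1/2))
c(scheme = mean(theta), truth = 2 * y / 5)
```

The chain is perfectly well-behaved as a Markov chain, which is what makes the mistake hard to spot: it simply has the wrong stationary distribution.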

Filed under: Books, Kids, Statistics, University life Tagged: auxiliary variable, conditioning, cross validated, Gibbs sampling, MCMC, Monte Carlo Statistical Methods

### fellowship openings at the Alan Turing Institute

*[Verbatim from the Alan Turing Institute webpage]*

**Alan Turing Fellowships**

This is a unique opportunity for early career researchers to join The Alan Turing Institute. The Alan Turing Institute is the UK’s new national data science institute, established to bring together world-leading expertise to provide leadership in the emerging field of data science. The Institute has been founded by the universities of Cambridge, Edinburgh, Oxford, UCL and Warwick and EPSRC.

Fellowships are available for 3 years with the potential for an additional 2 years of support following interim review. Fellows will pursue research based at the Institute hub in the British Library, London. Fellowships will be awarded to individual candidates and fellows will be employed by a joint venture partner university (Cambridge, Edinburgh, Oxford, UCL or Warwick).

Key requirements: Successful candidates are expected to have i) a PhD in a data science (or adjacent) subject (or to have submitted their doctorate before taking up the post), ii) an excellent publication record and/or demonstrated excellent research potential such as via preprints, iii) a novel and challenging research agenda that will advance the strategic objectives of the Institute, and iv) leadership potential. Fellowships are open to all qualified applicants regardless of background.

Alan Turing Fellowship applications can be made in all data science research areas. The Institute’s research roadmap is available here. In addition to this open call, there are two specific fellowship programmes:

**Fellowships addressing data-centric engineering**

The Lloyd’s Register Foundation (LRF) / Alan Turing Institute programme to support data-centric engineering is a 5-year, £10M global programme, delivered through a partnership between LRF and the Alan Turing Institute. This programme will secure high technical standards (for example the next-generation algorithms and analytics) to enhance the safety of life and property around the major infrastructure upon which modern society relies. For further information on data-centric engineering, see LRF’s Foresight Review of Big Data. Applications for Fellowships under this call, which address the aims of the LRF/Turing programme, may also be considered for funding under the data-centric engineering programme. Fellowships awarded under this programme may vary from the conditions given above; for more details contact fellowship@turing.ac.uk.

**Fellowships addressing data analytics and high-performance computing**

Intel and the Alan Turing Institute will be supporting additional Fellowships in data analytics and high-performance computing. Applications for Fellowships under this call may also be considered for funding under the joint Intel-Alan Turing Institute programme. Fellowships awarded under this joint programme may vary from the conditions given above; for more details contact fellowship@turing.ac.uk.

Download full information on the Turing fellowships here

Diversity and equality are promoted in all aspects of the recruitment and career management of our researchers. In keeping with the principles of the Institute, we especially encourage applications from female researchers.

Filed under: pictures, Statistics, University life Tagged: academic position, Alan Turing, Alan Turing Institute, ATI fellowships, big data, British Library, data analysis, fellowships, Intel, Lloyd, London, United Kingdom, University of Warwick

### Paret’oothed importance sampling and infinite variance [guest post]

*[Here are some comments sent to me by Aki Vehtari in the sequel of the previous posts.]*

The following is mostly based on our arXived paper with Andrew Gelman and the references mentioned there.

Koopman, Shephard, and Creal (2009) proposed to make a sample-based estimate of the existence of the moments, using a generalized Pareto distribution fitted to the tail of the weight distribution. The number of existing moments is less than 1/k (when k>0), where k is the shape parameter of the generalized Pareto distribution.

When k<1/2, the variance exists and the central limit theorem holds. Chen and Shao (2004) further show that the rate of convergence to normality is faster when higher moments exist. When 1/2≤k<1, the variance does not exist (but the mean does), the generalized central limit theorem holds, and we may assume the rate of convergence is faster when k is closer to 1/2.
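A crude stand-in for the generalized Pareto fit (a Hill-type estimate on an arbitrary number of upper order statistics, rather than the actual procedure of Koopman, Shephard, and Creal) recovers k on the borderline example:

```r
# tail-shape diagnostic for the weights of an Exp(1) proposal against an
# Exp(1/2) target: w(x) = exp(x/2)/2 has an exact Pareto tail with shape k = 1/2,
# so moments of order < 1/k = 2 exist and the variance is on the border
set.seed(4)
n  <- 1e5
x  <- rexp(n, 1)
w  <- dexp(x, 1/2) / dexp(x, 1)
ws <- sort(w, decreasing = TRUE)
m  <- 500                                   # number of tail weights (my choice)
k_hat <- mean(log(ws[1:m] / ws[m + 1]))     # Hill estimator of the shape k
k_hat                                       # close to 1/2: borderline variance
```

The estimate sits near 1/2, matching the theoretical shape, and the same few lines applied to other proposal rates move k continuously through the finite/infinite variance border.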

In the example with “Exp(1) proposal for an Exp(1/2) target”, k=1/2 and we are truly on the border.

In our experiments in the arXived paper and in Vehtari, Gelman, and Gabry (2015), we have observed that Pareto smoothed importance sampling (PSIS) usually converges well also when k>1/2, as long as k remains close to 1/2 (say k<0.7). But when k<1 is close to 1 (say k>0.7), the convergence is much worse and both naïve importance sampling and PSIS are unreliable.

Two figures are attached, which show the results comparing IS and PSIS in the Exp(1/2) and Exp(1/10) examples. The results were computed by repeating 1000 times a simulation with 10000 samples each. We can see the bad performance of IS in both examples, as you also illustrated. In the Exp(1/2) case, PSIS is also able to produce much more stable results. In the Exp(1/10) case, PSIS is able to reduce the variance of the estimate, but not enough to avoid a big bias.

It would be interesting to have more theoretical justification of why infinite variance is not such a big problem when k is close to 1/2 (e.g., how the convergence rate is related to the number of fractional moments).

I guess that max ω[t] / ∑ ω[t] in Chatterjee and Diaconis has some connection to the tail shape parameter of the generalized Pareto distribution, but it is likely to be much noisier as it depends on the maximum value instead of a larger number of tail samples, as in the approach of Koopman, Shephard, and Creal (2009).

A third figure shows an example where the variance is finite, with “an Exp(1) proposal for an Exp(1/1.9) target”, which corresponds to k≈0.475 < 1/2. Although the variance is finite, we are close to the border and the performance of basic IS is bad. There is no sharp change in the practical behaviour with a finite number of draws when going from finite variance to infinite variance. Thus, I think it is not enough to focus on the discrete number of moments; the Pareto shape parameter k, for example, gives us more information. Koopman, Shephard, and Creal (2009) also estimated the Pareto shape k, but they formed a hypothesis test of whether the variance is finite, thus discretising the information in k and assuming that finite variance is enough to get good performance.

Filed under: Kids, pictures, R, Statistics, University life Tagged: central limit theorem, generalised Pareto distribution, importance sampling, infinite variance estimators, Pareto distribution, Pareto smoothed importance sampling, smoothing

### Horizon Maths 2015: Santé & Données

Filed under: pictures, Statistics, University life Tagged: big data, Fondation Sciences Mathématiques de Paris, France, health sciences, IBM, machine learning, medical imaging, medical statistics, Paris

### no country for odd means

**T**his morning, Clara Grazian and I arXived a paper about Jeffreys priors for mixtures. This is a part of Clara’s PhD dissertation between Roma and Paris, on which she has worked for the past year. Jeffreys priors cannot be computed analytically for mixtures, which is such a drag that it led us to devise the delayed acceptance algorithm. However, the main message from this detailed study of Jeffreys priors is that they mostly do not work for Gaussian mixture models, in that the posterior is almost invariably improper! This is a definite death knell for Jeffreys priors in this setting, meaning that alternative reference priors, like the one we advocated with Kerrie Mengersen and Mike Titterington, or the similar solution in Roeder and Wasserman, have to be used. *[Disclaimer: the title has little to do with the paper, except that posterior means are off for mixtures…]*

Filed under: Books, Kids, Statistics, University life Tagged: delayed acceptance, Gaussian mixture, Jeffreys priors, location-scale parameterisation, reference prior, Roma, Université Paris Dauphine

### New York solidaire [guest picture]

### résistance

*“Ami, entends-tu le vol noir des corbeaux sur la plaine?*

* Ami, entends-tu le bruit sourd du pays qu’on enchaîne?*

*Ohé partisans, ouvriers et paysans, c’est l’alarme!”*

*“Til shade is gone, *

*til water is gone*

*Into the shadow with teeth bared*

*Screaming defiance with the last breath*

*To spit in Death’s eye on the Last Day.”*

Filed under: Kids Tagged: France, Paris

### an afternoon at the Muséum National d’Histoire Naturelle

**A** sunny Sunday afternoon last week at the Grande Galerie de l’Évolution, with a special exhibit on great apes. We had not been back there for ages, since the kids were kids enough!, and it was a most pleasant afternoon, as was strolling in the vicinity afterwards. Made all the more pleasant in retrospect after the bloodbath last night in the streets of Paris. May we endure against barbarity.

Filed under: Kids, pictures, Travel Tagged: apes, France, Grande Galerie de l'Evolution, Jardin des Plantes, Jussieu, Muséum d'Histoire Naturelle, Paris

### Osiris under water

**O**n Sunday, we went to an exhibit at the Institut du Monde Arabe, Paris, about the underwater remains of the submerged cities of Thônis-Héracléion and Canope, near Alexandria, Egypt. The cities have been explored by the Institut Européen d’Archéologie Sous-Marine (IEASM) in the past decade, with amazing vestiges that helped reconstitute the religious mysteries of Osiris [hence the name of the exhibit].

One of the first exhibits is this incredibly well-preserved stone, written in three languages (hieroglyphic, Demotic, and Greek), that could have been another Rosetta stone, had it not stayed under water till a few years ago.

An impressive statue of the goddess Tuat (or Taweret) that may have come from another source than the submerged cities, and which looks like a modern cartoon character…

To conclude, this magnificent head of a priest (?) in a Greek style, as many pieces from the temples date from the Ptolemaic era.

Filed under: Kids, pictures, Travel Tagged: 64 kcal/day, Alexandria, Egypt, exhibit, Heraklion, Institut du Monde Arabe, Osiris, Paris, Ptolemaic dynasty, Rosetta stone, Tuat

### importance sampling with infinite variance

*“In this article it is shown that in a fairly general setting, a sample of size approximately exp(D(μ|ν)) is necessary and sufficient for accurate estimation by importance sampling.”*

**S**ourav Chatterjee and Persi Diaconis arXived yesterday an exciting paper where they study the proper sample size in an importance sampling setting with no variance. That’s right, *with no variance*. They give as a starting toy example the use of an Exp(1) proposal for an Exp(1/2) target, where the importance ratio exp(x/2)/2 has no ξ order moment (for ξ≥2). So the infinity in the variance is somehow borderline in this example, which may explain why the estimator could be considered to “work”. However, I disagree with the statement “that a sample size a few thousand suffices” for the estimator of the mean to be close to the true value, that is, 2. For instance, the picture I drew above is the superposition of 250 sequences of importance sampling estimators across 10⁵ iterations: several sequences show huge jumps, even for a large number of iterations, which are characteristic of infinite variance estimates. Thus, while the expected distance to the true value can be closely evaluated via the Kullback-Leibler divergence between the target and the proposal (which by the way is infinite when using a Normal as proposal and a Cauchy as target), there are realisations of the simulation path that can remain far from the true value and this for an arbitrary number of simulations. (I even wonder if, for a given simulation path, waiting long enough should not lead to those unbounded jumps.) The first result is frequentist, while the second is conditional, i.e., can occur for the single path we have just simulated… As I taught in class this very morning, I thus remain wary about using an infinite variance estimator. (And not only in connection with the harmonic mean quagmire. As shown below by the more extreme case of simulating an Exp(1) proposal for an Exp(1/10) target, where the mean is completely outside the range of estimates.) 
Wary, then, even though I find the enclosed result about the existence of a cut-off sample size associated with this L¹ measure quite astounding.

*“…the traditional approach of testing for convergence using the estimated variance of the importance sampling estimate has a flaw: for any given tolerance level ε, there is high probability that the test declares convergence at or before a sample size that depends only on ε and not on the two distributions f and g. This is absurd, since convergence may take arbitrarily long, depending on the problem”*

The second part of the paper starts from the remark that the empirical variance estimate is a poor choice of convergence criterion. Especially when the theoretical variance does not exist. As put by Art Owen in his importance sampling notes, “Badly skewed weights could give a badly estimated mean along with a bad variance estimate that masks the problem”. The authors suggest considering instead a sort of inverse effective sample size derived from the importance weights and given by max ω[t] / ∑ ω[t], which should be small enough. However, replicating the above Exp(1) versus Exp(1/10) example, the (selected) picture below shows that this criterion may signal that all is fine just before the storm hits!
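A sketch of this criterion on the same extreme example (my own rerun, with my own seed, not the selected paths of the figures):

```r
# running max-weight criterion max w_t / sum w_t for an Exp(1) proposal against
# an Exp(1/10) target, where weights have no moment of order >= 10/9 (k = 0.9)
set.seed(5)
n <- 1e5
x <- rexp(n, 1)
w <- dexp(x, 1/10) / dexp(x, 1)
crit <- cummax(w) / cumsum(w)               # criterion along the simulation
est  <- cumsum(w * x) / cumsum(w)           # self-normalised estimate of E[X]=10
c(criterion = crit[n], estimate = est[n])
```

Plotting `crit` against the iteration index shows the criterion decaying between the arrivals of new record weights, which is precisely when it can look reassuringly small just before the next jump.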

The effective sample size has exactly the same issue in that a much lower value of the criterion may just be lurking around the corner. Which makes it hard to trust it at face value.

Filed under: pictures, R, Statistics, University life Tagged: arXiv, convergence diagnostics, cut-off, effective sample size, importance sampling, infinite variance estimators, Kullback-Leibler divergence, Persi Diaconis

### trimming poor importance samplers with Pareto scissors

**A** while ago, Aki Vehtari and Andrew Gelman arXived a paper on self-normalised importance sampling estimators, Pareto smoothed importance sampling. That I commented almost immediately and then sat on, waiting for the next version. Since the two A’s are still working on that revision, I eventually decided to post the comments, before a series of posts on the same issue. Disclaimer: the above quote from and picture of Pareto are unrelated to the paper!

A major drawback with importance samplers is that they can produce infinite variance estimators. Aki and Andrew compare in this study the behaviour of truncated importance weights, following a paper by Ionides (2008) that Andrew and I had proposed as a student project last year, a project that did not conclude. The truncation is of order √*S*, where *S* is the number of simulations, rescaled by the average weight (which had better be the median weight in the event of infinite variance weights). While this truncation leads to a finite variance, it also induces a possibly far from negligible bias, a bias that the paper suggests reducing via a Pareto modelling of the largest or extreme weights. Three possible conclusions come from the Pareto modelling and the estimation of the Pareto shape k: if k<½, there is no variance issue and truncation is not necessary; if ½<k<1, the estimator has a mean but no variance; and if k>1, it does not even have a mean. The latter case sounds counter-intuitive, since the self-normalised importance sampling estimator is the ratio of an estimate of a finite integral to an estimate of a positive constant… Aki and Andrew further use the Pareto estimation to smooth out the largest weights as estimated quantiles. They also eliminate the largest weights when k comes close to 1 or higher values.
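The truncation itself is a one-liner; here is a sketch on the borderline Exp(1)-proposal-for-Exp(1/2)-target example (sample size and seed are my choices, and this is plain truncation, not the Pareto smoothing):

```r
# truncation at sqrt(S) times the average weight, as in Ionides (2008):
# truncated weights have finite variance at the cost of a (usually small) bias
set.seed(6)
S <- 1e4
x <- rexp(S, 1)                             # Exp(1) proposal
w <- dexp(x, 1/2) / dexp(x, 1)              # Exp(1/2) target: borderline weights
tau <- sqrt(S) * mean(w)                    # truncation level (median would be safer)
wt  <- pmin(w, tau)                         # truncated weights
c(raw = sum(w * x) / sum(w), truncated = sum(wt * x) / sum(wt))
```

Both self-normalised estimates target E[X]=2; repeating the experiment shows the truncated version avoiding the occasional wild excursions of the raw one.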

On a normal toy example, simulated with too small a variance, the method is seen to reduce the variability, if not the bias. In connection with my above remark, k never appears as significantly above 1 in this example. A second toy example uses a shifted t distribution as proposal. This setting should not induce an infinite variance problem, since the inverse of a t density remains integrable against a normal distribution, but the variance grows with the bias in the t proposal, and so does the Pareto index k, exceeding the boundary value 1 in the end. Similar behaviour is observed on a multidimensional example.

The issue I have with this approach is the same one I set to Andrew last year, namely why would one want to use a poor importance sampler and run the risk of ending up with a worthless approximation? Detecting infinite variance estimation is obviously an essential first step to producing reliable approximations, but a second step would be to seek a substitute for the proposal in an automated manner, possibly by increasing the tails of the original one, or by running a reparameterisation of the original problem with the same proposal, towards thinner tails of the target. Automated sounds unrealistic, obviously, but so does trusting an infinite variance estimate. If worst comes to worst, we should acknowledge and signal that the current sampler cannot be trusted. As in statistical settings, we should be able to state that we cannot produce a satisfactory solution (and hence need more data or different models).

Filed under: Books, Statistics, University life Tagged: heavy-tail distribution, importance sampling, infinite variance estimators, Monte Carlo Statistical Methods, Pareto distribution, Vilfredo Pareto

### Le Monde puzzle [#937]

**A** combinatoric Le Monde mathematical puzzle that resembles many earlier ones:

*Given a pool of 30 interns allocated to three-person night shifts, is it possible to see 31 consecutive nights such that (a) all the shifts differ and (b) there is no pair of shifts with a single common intern?*

**I**n fact, the constraint there is very strong: two shifts can only share zero or two interns. For one given shift, there are 26 other shifts with which it can share two interns, but then any two of those 26 others must also share zero or two interns, which makes the same two interns common to all and excludes any additional shift. But this is not the only way to allocate the interns over the shifts since, as pointed out by Jean-Louis and checked with the following R code, 28 and not 27 is the maximum possible number of shifts under those conditions.
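Jean-Louis’ code is not reproduced here, but the validity of the 28-day solution given below can be checked in a few lines, by verifying that every pair of shifts shares zero or two interns:

```r
# check that the 28 shifts satisfy the constraint that any two of them share
# either zero or two interns (and that they are all distinct)
shft <- matrix(c(14,30,29,  5,17,19,  2,18,24, 15,16,20,  4,22,28,  3,21,25,
                 13,21,25,  4, 7,28,  1,17,19,  2,18,27, 10,15,20,  2,24,27,
                  8, 9,23,  4,12,28,  1, 5,17,  4,11,28,  6,14,29,  6,14,30,
                  3,13,25,  9,23,26,  1, 5,19, 10,15,16,  8, 9,26,  8,23,26,
                  3,13,21, 10,16,20, 18,24,27,  6,29,30), ncol = 3, byrow = TRUE)
shared <- outer(1:28, 1:28, Vectorize(function(i, j)
  length(intersect(shft[i, ], shft[j, ]))))
all(shared[upper.tri(shared)] %in% c(0, 2))  # TRUE: the solution is valid
```

The same check, combined with a search over candidate triplets, is what confirms that no 29th compatible shift can be appended.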

For instance, here is a 28 day solution:

```
> shft
      [,1] [,2] [,3]
 [1,]   14   30   29
 [2,]    5   17   19
 [3,]    2   18   24
 [4,]   15   16   20
 [5,]    4   22   28
 [6,]    3   21   25
 [7,]   13   21   25
 [8,]    4    7   28
 [9,]    1   17   19
[10,]    2   18   27
[11,]   10   15   20
[12,]    2   24   27
[13,]    8    9   23
[14,]    4   12   28
[15,]    1    5   17
[16,]    4   11   28
[17,]    6   14   29
[18,]    6   14   30
[19,]    3   13   25
[20,]    9   23   26
[21,]    1    5   19
[22,]   10   15   16
[23,]    8    9   26
[24,]    8   23   26
[25,]    3   13   21
[26,]   10   16   20
[27,]   18   24   27
[28,]    6   29   30
```

Filed under: Books, R Tagged: arithmetics, intersect, Le Monde, mathematical puzzle, R