## Bayesian Bloggers

### ABC for big data

*“The results in this paper suggest that ABC can scale to large data, at least for models with a fixed number of parameters, under the assumption that the summary statistics obey a central limit theorem.”*

**I**n a week rich with arXiv submissions about MCMC and “big data”, like the Variational consensus Monte Carlo of Rabinovich et al., or scalable Bayesian inference via particle mirror descent by Dai et al., Wentao Li and Paul Fearnhead contributed an impressive paper entitled Behaviour of ABC for big data. However, a word of warning: the title is somewhat misleading in that the paper does not address the issue of big or tall data *per se*, e.g., the impossibility to handle the whole data at once and to reproduce it by simulation, but rather the asymptotics of ABC. The setting is not dissimilar to the earlier Fearnhead and Prangle (2012) Read Paper. The central theme of this theoretical paper [with 24 pages of proofs!] is to study the connection between the number *N* of Monte Carlo simulations and the tolerance value *ε* when the number *n* of observations goes to infinity. A main result in the paper is that the ABC posterior mean can have the same asymptotic distribution as the MLE when *ε*=o(n^{-1/4}). This is however of no direct use in practice, as the second main result shows that the Monte Carlo variance is well-controlled only when *ε*=O(n^{-1/2}). There is therefore a sort of tension in the conclusion, between the positive equivalence with the MLE obtained at the smaller o(n^{-1/4}) tolerance and the Monte Carlo control requiring the larger O(n^{-1/2}) one.
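To get a feel for these tolerance rates, here is a minimal rejection-ABC sketch (a toy example of my own, not taken from the paper) for a Normal mean, with the tolerance set at the n^{-1/2} scale; the ABC posterior mean then sits close to the MLE, i.e., the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: x_i ~ N(theta0, 1), summary statistic = sample mean (the MLE)
n, theta0 = 400, 1.0
xbar = rng.normal(theta0, 1.0, n).mean()

# Rejection ABC with tolerance at the n^{-1/2} scale
eps = n ** -0.5
M = 200_000
theta = rng.normal(0.0, 3.0, M)            # draws from a N(0, 9) prior
sbar = rng.normal(theta, 1 / np.sqrt(n))   # simulated summaries
accepted = theta[np.abs(sbar - xbar) <= eps]

print(len(accepted), accepted.mean(), xbar)
```

The accepted draws form the ABC sample and their average tracks the sample mean, at the cost of a low acceptance rate under a diffuse prior.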

Something I have (slight) trouble with is the construction of an importance sampling function proportional to f_ABC(s|θ)^α when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½. Actually, any value different from 0 and 1 is sensible, meaning that the range of acceptable importance functions is wide. Most interestingly (!), the paper constructs an iterative importance sampling ABC in a spirit similar to the ABC-PMC of Beaumont et al. (2009). Even more interestingly, the ½ factor amounts to updating the scale of the proposal as twice the scale of the target, just as in PMC.
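The ABC-PMC mechanism alluded to above can be sketched as follows (a hedged toy rendering of Beaumont et al.'s scheme on a Normal mean problem, not the authors' algorithm): particles are resampled by weight, perturbed with a kernel whose variance is twice the weighted variance of the current particles, and reweighted by the prior over the mixture proposal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, xbar = 100, 1.0                      # toy observed summary
prior_sd = np.sqrt(10.0)                # N(0, 10) prior

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def abc_batch(propose, eps, N):
    """Keep draws whose simulated summary falls within eps of xbar."""
    kept = []
    while len(kept) < N:
        th = propose(5000)
        s = rng.normal(th, 1 / np.sqrt(n))   # simulate the summary
        kept.extend(th[np.abs(s - xbar) <= eps])
    return np.array(kept[:N])

N = 1000
# Iteration 0: draws from the prior, uniform weights
theta = abc_batch(lambda m: rng.normal(0, prior_sd, m), 0.5, N)
w = np.full(N, 1 / N)

for eps in [0.2, 0.1]:
    tau = np.sqrt(2 * np.cov(theta, aweights=w))   # twice the weighted variance
    prev, wprev = theta, w
    theta = abc_batch(
        lambda m: rng.choice(prev, m, p=wprev) + rng.normal(0, tau, m), eps, N)
    # importance weights: prior over the mixture proposal
    mix = np.array([np.sum(wprev * norm_pdf(t, prev, tau)) for t in theta])
    w = norm_pdf(theta, 0, prior_sd) / mix
    w /= w.sum()

print(np.sum(w * theta))   # weighted posterior mean, close to xbar
```

The doubling of the kernel variance is precisely the “twice the scale of the target” update mentioned in the post.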

Another aspect of the analysis I do not quite catch is the reason for keeping the Monte Carlo sample size fixed at a value N, while setting a sequence of acceptance probabilities (or of tolerances) along iterations. It is quite surprising that the Monte Carlo error then remains under control and does not dominate the overall error!

*“Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty in estimates.”*

Overall, this is clearly a paper worth reading for understanding the convergence issues related with ABC, with more theoretical support than the earlier Fearnhead and Prangle (2012). However, it does not provide guidance on the construction of the sequence of Monte Carlo samples, nor does it discuss the selection of the summary statistic, which obviously has a major impact on the efficiency of the estimation. And, to return to the earlier warning, it does not cope with “big data” in that it reproduces the original simulation of the n-sized sample.

Filed under: Books, Statistics, University life Tagged: ABC, ABC-PMC, asymptotics, big data, iterated importance sampling, MCMC, particle system, simulation

### on Markov chain Monte Carlo methods for tall data

**R**émi Bardenet, Arnaud Doucet, and Chris Holmes arXived a long paper (with the above title) a month ago, a paper that I did not have time to read in detail till today. The paper is quite comprehensive in its analysis of the current literature on MCMC for huge, tall, or big data. Even including our delayed acceptance paper! Now, it is indeed the case that we are all still struggling with this size difficulty, making proposals in a wide range of directions that hopefully improve the efficiency of dealing with tall data. However, we are not there yet, in that the outcome is either about as costly as the original MCMC implementation, or of an unknown degree of approximation, even when bounds are available.

Most of the paper's proposal is based on aiming at an unbiased estimator of the likelihood function in a pseudo-marginal manner à la Andrieu and Roberts (2009) and on a random subsampling scheme that presumes (a) iid-ness and (b) a lower bound on each term in the likelihood. It seems to me slightly unrealistic to assume that a much cheaper and yet tight lower bound on those terms could be available. Within this firmly iid framework, the problem itself is somewhat unclear: do we really need 10⁸ observations of a logistic model with a few parameters? The real challenge is rather in non-iid hierarchical models with random effects and complex dependence structures, for which subsampling gets much more delicate. None of the methods surveyed in the paper touches upon such situations, where the entire data cannot be explored at once.
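On the subsampling side, the basic building block is easy to state: under iid-ness, a uniform random subsample yields an unbiased estimator of the full log-likelihood (a generic illustration of that single fact, not the authors' estimator, which further requires the lower bounds mentioned above):

```python
import numpy as np

rng = np.random.default_rng(3)

# Full data: n iid N(theta, 1) observations
n, theta = 10_000, 0.5
x = rng.normal(theta, 1.0, n)

def loglik_terms(x, theta):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2

full = loglik_terms(x, theta).sum()

# Unbiased subsampled estimate: (n/m) * sum over a random subsample of size m
m = 100
def subsampled(theta):
    idx = rng.choice(n, m, replace=False)
    return (n / m) * loglik_terms(x[idx], theta).sum()

# averaging many subsampled estimates recovers the full sum
est = np.mean([subsampled(theta) for _ in range(2000)])
print(full, est)
```

Unbiasedness of the log-likelihood does not, of course, give an unbiased likelihood estimate after exponentiation, which is exactly where the pseudo-marginal difficulties start.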

An interesting experiment therein, based on the Glynn and Rhee (2014) unbiased representation, shows that the approach does not work well. This could lead the community to reconsider its focus on unbiasedness, by coming full circle to the opposition between bias and variance, and between intractable likelihood and representative subsample likelihood.

Reading the (superb) coverage of earlier proposals made me reconsider the perceived appeal of the decomposition of Neiswanger et al. (2014), as I came to realise that the product of functions renormalised into densities has no immediate probabilistic connection with its components. As an extreme example, some terms may even fail to integrate. (Of course, there are many Monte Carlo features that exploit such a decomposition, from the pseudo-marginal to accept-reject algorithms. And more to come.) Taking samples from the renormalised product is thus not directly related to taking samples from each term, in opposition with the arithmetic mixture representation. I was first convinced by the trick of using a fraction of the prior in each term but now find it unappealing, because there is no reason the prior should change for a smaller sample and no equivalent to the prohibition of using the data several times. At this stage, I would be much more in favour of raising a random portion of the likelihood function to the right power. An approach that I suggested to a graduate student earlier this year and which is also discussed in the paper. And considered too naïve and a “very poor approach” (Section 6, p.18), even though there must be versions that do not run afoul of the non-Gaussian nature of the log likelihood ratio. I am certainly going to peruse this Section 6 of the paper more thoroughly.
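For what it is worth, the fractionated-prior decomposition is exact in the conjugate Gaussian case, which is the easiest way to see what the product of subposteriors amounts to (a toy check of my own: each shard gets the prior raised to the power 1/S, and the S Gaussian subposteriors are combined by adding precisions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Normal mean model, known sigma=1, prior theta ~ N(0, v0)
n, v0, sigma2 = 1000, 10.0, 1.0
x = rng.normal(1.0, 1.0, n)

# Full posterior (conjugate): precision = 1/v0 + n/sigma2
prec_full = 1 / v0 + n / sigma2
mean_full = (x.sum() / sigma2) / prec_full

# S shards, each with the fractionated prior N(0, v0)^{1/S} = N(0, S*v0)
S = 4
shards = np.array_split(x, S)
prec_s = np.array([1 / (S * v0) + len(s) / sigma2 for s in shards])
mean_s = np.array([(s.sum() / sigma2) / p for s, p in zip(shards, prec_s)])

# Product of the S Gaussian subposteriors: precisions add up
prec_prod = prec_s.sum()
mean_prod = (prec_s * mean_s).sum() / prec_prod

print(mean_full, mean_prod)   # identical: the decomposition is exact here
```

Outside this Gaussian setting, the subposteriors are no longer available in closed form and the renormalised product loses this clean interpretation, which is the very objection raised above.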

Another interesting suggestion in this definitely rich paper is the foray into an alternative bypassing the uniform sampling in the Metropolis-Hastings step, using instead the subsampled likelihood ratio. The authors call this “exchanging acceptance noise for subsampling noise” (p.22). However, there is no indication about the resulting stationary distribution and I find the notion of *only* moving to higher likelihoods (or estimates thereof) counter to the spirit of Metropolis-Hastings algorithms. (I have also eventually realised the meaning of the log-normal “difficult” benchmark that I had missed earlier: it means log-normal data is modelled by a normal density.) And yet another innovation along the lines of a control variate for the log likelihood ratio, even though it sounds somewhat surrealistic.

Filed under: Books, Statistics, University life Tagged: big data, divide-and-conquer strategy, Metropolis-Hastings algorithm, parallel MCMC, subsampling, tall data

### La Rochambelle 2015 [10K – 38:28 – 73rd & 4th V2]

Another year attending La Rochambelle, the massive women-only race or walk against breast cancer in Caen, Normandy! With the fantastic vision of 20,000 runners in the same pink tee-shirt swarming down-town Caen and the arrival stadium. Which made it quite hard to spot my three relatives in the race! I also ran my fourth iteration of the 10k the next day, from the British War Cemetery of Cambes-en-Plaine to the Memorial for Peace in Caen. The conditions were not as optimal as last year, especially in terms of wind, and I lost one minute on my total time, as well as one position, the third V2 remaining tantalisingly a dozen meters in front of me till the end of the race. A mix of too light training, travel fatigue, and the psychological conviction I was going to end up fourth! Here are my split times, with a very fast start that I paid for in the second half, close to 4mn/km, when the third V2 passed me.

Filed under: Kids, pictures, Running, Travel Tagged: 10k, British War Cemetery, Caen, D Day, D-Day beaches, La Rochambelle, Les Courants de la Liberté, Memorial for Peace, Normandy, veteran (V2)

### Altos de Losada [guest wine post by Susie]

[Here is a wine review written by Susie Bayarri in 2013 about a 2008 bottle of Altos de Losada, a wine from Leon:]

**T**he cork is fantastic. Very good presentation and labelling of the bottle. The wine color is like dark cherry, I would almost say of the color of blood. Very bright although unfiltered. The cover is definitely high. The tear is very nice (at least in my glass), slow, wide, through parallel streams… but it does not dye my glass at all.

The bouquet is its best feature… it is simply voluptuous… with ripe plums as well as vanilla, some mineral tone plus a smoky hint. I cannot quite detect which wood is used… I have always loved the bouquet of this wine…

In mouth, it remains a bit closed. Next time, I will make sure I decant it (or I will use that Venturi device) but it is nonetheless excellent… the wine is truly fruity, but complex as well (nothing like grape juice). The tannins are definitely present, but tamed and assimilated (I think they will continue to mellow) and it has just a hint of acidity… Despite its alcohol content, it remains light, neither overly sweet nor heavy. The after-taste offers a pleasant bitterness… It is just delicious, an awesome wine!

Filed under: pictures, Travel, University life, Wines Tagged: Altos de Losada, Leon, Spanish wines, Susie Bayarri, València, wine tasting

### Current trends in Bayesian methodology with applications

**W**hen putting this volume together with Umesh Singh, Dipak Dey, and Appaia Loganathan, my friend Satyanshu Upadhyay from Varanasi, India, asked me for a foreword. The book is now out, with chapters written by a wide variety of Bayesians. And here is my foreword, for what it’s worth:

*It is a great pleasure to see a new book published on current aspects of Bayesian Analysis and coming out of India. This wide-scope volume reflects very accurately the present role of Bayesian Analysis in scientific inference, be it by statisticians, computer scientists or data analysts. Indeed, we have witnessed in the past decade a massive adoption of Bayesian techniques by users in need of statistical analyses, partly because it became easier to implement such techniques, partly because both the inclusion of prior beliefs and the production of a posterior distribution that provides a single filter for all inferential questions are a natural and intuitive way to process the latter. As reflected so nicely by the subtitle of Sharon McGrayne’s The Theory that Would not Die, the Bayesian approach to inference “cracked the Enigma code, hunted down Russian submarines” and more generally contributed to solving many real life or cognitive problems that did not seem to fit within the traditional patterns of a statistical model.*

*Two hundred and fifty years after Bayes published his note, the field is more diverse than ever, as reflected by the range of topics covered by this new book, from the foundations (with objective Bayes developments) to the implementation by filters and simulation devices, to the new Bayesian methodology (regression and small areas, non-ignorable response and factor analysis), to a fantastic array of applications. This display reflects very well on the vitality and appeal of Bayesian Analysis. Furthermore, I note with great pleasure that the new book is edited by distinguished Indian Bayesians, India having always been a provider of fine and dedicated Bayesians. I thus warmly congratulate the editors for putting this exciting volume together and I offer my best wishes to readers about to appreciate the appeal and diversity of Bayesian Analysis.*

Filed under: Books, Statistics, Travel, University life Tagged: Bayesian Analysis, Bayesian statistics, book foreword, India, ISBA conference, Varanasi

### Objective Bayesian hypothesis testing

**O**ur paper with Diego Salmerón and Juan Cano using integral priors for binomial regression and objective Bayesian hypothesis testing (one of my topics of interest, see yesterday’s talk!) eventually appeared in Statistica Sinica. This is Volume 25, Number 3, of July 2015 and the table of contents shows an impressively diverse range of topics.

Filed under: Books, Statistics, University life Tagged: academic journals, binomial regression, integral priors, Objective Bayesian hypothesis testing, Statistica Sinica

### Bureau international des poids et mesures [bayésiennes?]

**T**he workshop at the BIPM on measurement uncertainty was certainly most exciting, first by its location in the Parc de Saint Cloud, in classical buildings overlooking the Seine river in a most bucolic manner… and second by its mostly Bayesian flavour. The recommendations that the workshop addressed are about revisions in the current GUM, which stands for the Guide to the Expression of Uncertainty in Measurement. The discussion centred on using a more Bayesian approach than in the earlier version, with the organisers of the workshop and leaders of the revision apparently most in favour of that move. “Knowledge-based pdfs” came into the discussion as an attractive notion since it rings a Bayesian bell, especially when associated with probability as a degree of belief and incorporating the notion of an a priori probability distribution. And propagation of errors. Or even more when mentioning the removal of frequentist validations. What I gathered from the talks is the perspective drifting away from central limit approximations to more realistic representations, calling for Monte Carlo computations. There is also a lot I did not get about conventions, codes and standards. Including a short debate about the different meanings of Monte Carlo, from simulation technique to calculation method (as for confidence intervals). And another discussion about replacing the old formula for estimating sd, moving from the Normal to the Student's *t* case. A change that remains highly debatable, since the Student's *t* assumption is as shaky as the Normal one. What became clear [to me] during the meeting is that a rather heated debate is currently taking place about the need for a revision, with some members of the six (?) organisations involved arguing against Bayesian or linearisation tools.

This became even clearer during our frequentist versus Bayesian session, with a first talk so outrageously anti-Bayesian it was hilarious! Among other things, the notion that “fixing” the data was against the principles of physics (the speaker was a physicist), that the only randomness in a Bayesian coin tossing was coming from the prior, that the likelihood function was a subjective construct, that the definition of the posterior density was a generalisation of Bayes’ theorem [generalisation found in… Bayes’ 1763 paper then!], that objective Bayes methods were inconsistent [because Jeffreys’ prior produces an inadmissible estimator of μ²!], that the move to Bayesian principles in GUM would cost the New Zealand economy 5 billion dollars [hopefully a frequentist estimate!], &tc., &tc. The second pro-frequentist speaker was by comparison much much more reasonable, although he insisted on showing that Bayesian credible intervals do not achieve a nominal frequentist coverage, using a sort of fiducial argument distinguishing x=X+ε from X=x+ε that I missed… A lack of achievement that is fine by my standards. Indeed, a frequentist confidence interval provides a coverage guarantee either for a fixed parameter (in which case the Bayesian approach achieves better coverage by constant updating) or for a varying parameter (in which case the frequency of proper inclusion is of no real interest!). The first Bayesian speaker was Tony O’Hagan, who summarily tore the first talk to shreds. And also criticised GUM2 for using reference priors and maxent priors. I am afraid my talk was a bit too exploratory for the audience (since I got absolutely no question!). In retrospect, I should have given an intro to reference priors.

An interesting specificity of a workshop on metrology and measurement is that its participants are true sticklers for the schedule, starting and finishing right on time. When a talk finished early, we waited until the intended time of the next talk, not even allowing for extra discussion. When the only overtime speaker, a Belgian, ran close to 10 minutes late, I was afraid he would (deservedly) get lynched! He escaped unscathed, but may (and should) not get invited again..!

Filed under: pictures, Statistics, Travel Tagged: admissibility, Bayesian inference, Bureau international des poids et mesures, confidence intervals, conventions, France, frequentist inference, MaxEnt, norms, Paris, Pavillon de Breteuil, Sèvres, subjective versus objective Bayes, workshop

### philosophy at the 2015 Baccalauréat

*[Here is the pre-Bayesian quote from Hume that students had to analyse this year for the Baccalauréat:]*

*“The maxim, by which we commonly conduct ourselves in our reasonings, is, that the objects, of which we have no experience, resemble those, of which we have; that what we have found to be most usual is always most probable; and that where there is an opposition of arguments, we ought to give the preference to such as are founded on the greatest number of past observations. But though, in proceeding by this rule, we readily reject any fact which is unusual and incredible in an ordinary degree; yet in advancing farther, the mind observes not always the same rule; but when anything is affirmed utterly absurd and miraculous, it rather the more readily admits of such a fact, upon account of that very circumstance, which ought to destroy all its authority. The passion of surprise and wonder, arising from miracles, being an agreeable emotion, gives a sensible tendency towards the belief of those events, from which it is derived.”* David Hume, *An Enquiry Concerning Human Understanding*

Filed under: Books, Kids Tagged: Air France, An Enquiry Concerning Human Understanding, Baccalauréat, Bayesian foundations, David Hume, exam, finals, high school, miracles, philosophy, Scotland

### dynamic mixtures [at NBBC15]

**A** funny coincidence: as I was sitting next to Arnoldo Frigessi at the NBBC15 conference, I came upon a new question on Cross Validated about a dynamic mixture model he had developed in 2002 with Olga Haug and Håvård Rue [whom I also saw last week in València]. The dynamic mixture model they proposed replaces the standard weights in the mixture with cumulative distribution functions, hence the term *dynamic*. Here is the version used in their paper (x>0):

h(x) ∝ (1−w(x)) f(x) + w(x) g(x),

where f is a Weibull density, g a generalised Pareto density, and w the cdf of a Cauchy distribution [all distributions being endowed with standard parameters]. While the above object is *not* a mixture of a generalised Pareto and a Weibull distribution (instead, it is a mixture of two non-standard distributions with unknown weights), it is close to the Weibull when x is near zero and ends up with the Pareto tail when x is large. The question was about simulating from this distribution and, while an answer was in the paper, I replied on Cross Validated with an alternative accept-reject proposal and with a somewhat (if mildly) non-standard MCMC implementation enjoying a much higher acceptance rate and the same fit.
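One simple way to simulate from such a density, in the accept-reject spirit (my own sketch here, with arbitrary illustrative parameter values, not necessarily the solution posted on Cross Validated): since the unnormalised target is bounded by f(x)+g(x), one can propose from the equal-weight mixture of f and g and accept with probability [(1−w)f+wg]/(f+g), which is always at most one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Arbitrary illustrative parameters
f = stats.weibull_min(c=1.5)            # Weibull component
g = stats.genpareto(c=0.3)              # generalised Pareto component
w = lambda x: stats.cauchy.cdf(x, loc=1.0, scale=1.0)   # dynamic weight (cdf)

def rdynamic(size):
    """Accept-reject from h(x) ∝ (1-w(x)) f(x) + w(x) g(x), x > 0."""
    out = np.empty(0)
    while out.size < size:
        m = 2 * (size - out.size)
        # propose from the equal-weight mixture (f + g)/2
        comp = rng.random(m) < 0.5
        x = np.where(comp, f.rvs(m, random_state=rng), g.rvs(m, random_state=rng))
        fx, gx = f.pdf(x), g.pdf(x)
        # accept with probability [(1-w)f + wg] / (f + g) <= 1
        accept = rng.random(m) * (fx + gx) <= (1 - w(x)) * fx + w(x) * gx
        out = np.concatenate([out, x[accept]])
    return out[:size]

sample = rdynamic(10_000)
print(sample.min(), sample.mean())
```

The bound (1−w)f+wg ≤ f+g is crude but free of any optimisation over x, which keeps the sampler both exact and trivially vectorised.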

Filed under: R, Statistics Tagged: Arnoldo Frigessi, component of a mixture, cross validated, dynamic mixture, extremes, Havard Rue, NBBC15 conference, O-Bayes 2015, Pareto distribution, R, Reykjavik, Valencia conferences, Weibull distribution

### Paris Machine Learning Meeting #10 Season 2

**T**onight, I am invited to give a speed-presenting talk at the last Paris Machine Learning meeting of Season 2, with the themes of DL, Recovering Robots, Vowpal Wabbit, Predcsis, Matlab, and Bayesian tests [by yours truly!]. The meeting will take place in Jussieu, Amphi 25. Here are my slides for the meeting:

As it happened, the meeting was quite crowded with talks and plagued with technical difficulties in transmitting talks from Berlin and Toronto, so I came to talk about three hours after the beginning, which was less than optimal for the most technical presentation of the evening. I actually wonder if I even managed to convey the main idea of replacing Bayes factors with posteriors of the mixture weight! *[I had plenty of time to reflect upon this on my way back home, as I had to wait for several rare and crowded RER trains until one had enough room for me and my bike!]*

Filed under: Books, Kids, pictures, Statistics, University life Tagged: Berlin, data, Jussieu, machine learning, Matlab, Paris Machine Learning Applications group, RER B, robots, Toronto, Université Pierre et Marie Curie, Vowpal

### Statistics and Computing special issue on BNP

*[verbatim from the call for papers:]*

*Statistics and Computing* is preparing a special issue on Bayesian Nonparametrics, for publication by early 2016. We invite researchers to submit manuscripts for publication in the special issue. We expect that the focus theme will increase the visibility and impact of papers in the volume.

By making use of infinite-dimensional mathematical structures, Bayesian nonparametric statistics allows the complexity of a learned model to grow as the size of a data set grows. This flexibility can be particularly suited to modern data sets but can also present a number of computational and *modelling* challenges. In this special issue, we will showcase novel applications of Bayesian nonparametric models, new computational tools and algorithms for learning these models, and new models for the diverse structures and relations that may be present in data.

To submit to the special issue, please use the Statistics and Computing online submission system. To indicate consideration for the special issue, choose “Special Issue: Bayesian Nonparametrics” as the article type. Papers must be prepared in accordance with the *Statistics and Computing* journal guidelines.

Papers will go through the usual peer review process. The special issue website will be updated with any relevant deadlines and information.

Deadline for manuscript submission: August 20, 2015

Guest editors: Tamara Broderick (MIT), Katherine Heller (Duke), Peter Mueller (UT Austin)

Filed under: Books, Statistics, University life Tagged: algorithms, Bayesian nonparametric, call for papers, machine learning, modelling, nonparametric statistics, special issue, Statistics and Computing

### a unified treatment of predictive model comparison

*“Applying various approximation strategies to the relative predictive performance derived from predictive distributions in frequentist and Bayesian inference yields many of the model comparison techniques ubiquitous in practice, from predictive log loss cross validation to the Bayesian evidence and Bayesian information criteria.”*

**M**ichael Betancourt (Warwick) just arXived a paper formalising predictive model comparison in an almost Bourbakian sense! Meaning that he adopts therein a very general representation of the issue, with minimal assumptions on the data generating process (excluding a specific metric and obviously the choice of a testing statistic). He opts for an M-open perspective, meaning that this generating process stands outside the hypothetical statistical model or, in Lindley's terms, a small world. Within this paradigm, the only way to assess the fit of a model seems to be through the predictive performances of that model, using for instance an f-divergence like the Kullback-Leibler divergence, based on the true generating process as the reference. I think this however puts a restriction on the choice of small worlds, as the probability measure on that small world has to be absolutely continuous wrt the true data generating process for the distance to be finite. While there are arguments in favour of absolutely continuous small worlds, this assumes a knowledge about the true process that we simply cannot gather.

Ignoring this difficulty, a relative Kullback-Leibler divergence can be defined in terms of an almost arbitrary reference measure. But as it still relies on the true measure, its evaluation proceeds via cross-validation “tricks” like jackknife and bootstrap. However, on the Bayesian side, using the prior predictive links the Kullback-Leibler divergence with the marginal likelihood. And Michael argues further that the posterior predictive can be seen as the unifying tool behind information criteria like DIC and WAIC (the widely applicable information criterion). Which does not convince me of the utility of those criteria as model selection tools, as there is too much freedom in the way approximations are used and a potential for using the data several times.
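As a concrete instance of the approximation strategies mentioned in the quote, here is predictive log loss leave-one-out cross-validation for a plug-in Normal model (an illustration of the generic idea on toy data, not of Michael's formalism):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
x = rng.normal(0.0, 1.0, n)

def log_norm_pdf(y, m, v):
    return -0.5 * np.log(2 * np.pi * v) - 0.5 * (y - m) ** 2 / v

# Leave-one-out: score each x_i under the model fitted without it
loo = 0.0
for i in range(n):
    rest = np.delete(x, i)
    loo += log_norm_pdf(x[i], rest.mean(), rest.var(ddof=1))

print(loo / n)   # average predictive log score (higher is better)
```

Replacing the plug-in density with a posterior predictive density in the sum is what connects this cross-validation score to the Bayesian criteria discussed above.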

Filed under: Books, Statistics, University life Tagged: AIC, Bayesian model comparison, Bayesian predictive, Bourbaki, DIC, Kullback-Leibler divergence, M-open inference, marginal likelihood, posterior predictive, small worlds

### midnight sun [minus 50mn]

Filed under: pictures, Running, Travel Tagged: driving picture, Iceland, Keflavik, moonlight, moors, sunset, Twilight

### Bureau international des poids et mesures

**T**oday, I am taking part in a meeting in Paris, for an exotic change!, at the Bureau international des poids et mesures (BIPM), which looks after a universal reference for measurements. For instance, here is its definition of the kilogram:

*The unit of mass, the kilogram, is the mass of the international prototype of the kilogram kept in air under three bell jars at the BIPM. It is a cylinder made of an alloy for which the mass fraction of platinum is 90 % and the mass fraction of iridium is 10 %.*

And the BIPM is thus interested in the uncertainty associated with such measurements. Hence the workshop on measurement uncertainties. Tony O’Hagan will also be giving a talk in a session that opposes frequentist and Bayesian approaches, even though I decided to introduce ABC as it seems to me to be a natural notion for measurement problems (as far as I can tell from my prior on measurement problems).

Filed under: Books, Statistics, University life Tagged: ABC, ABC model choice, Bayesian model choice, BIPM, kilogram, measurement, Paris, random forests, Sèvres, Tony O'Hagan, uncertainty

### Boring blades [book review]

**T**his fifth volume of the “Blades” fantasy series by Kelly McCullough is entitled Drawn Blades, but it gives the impression the author has exhausted what she can seriously drag from the universe she created a few volumes ago. Even when resuscitating another former lover of the main character. And moving to an unknown part of the world. And bringing in new super-species, cultists, and even a petty god. Yes, a petty god, whining and poorly lying. And an anti-sect police. And a fantasy version of the surfing board. Yes again, a surfing board. Inland. Despite all those unusual features, the book feels like a sluggish copy of a million fantasy books that have mixed the themes of an awakening god awaited by fanatical followers in unlimited subterranean vaults, with the heroes eventually getting the better of the dumb followers and even of the (dumb) god. And it bored this grumpy reader to sleep every single evening. The next instalment in the series, Darkened Blade, just appeared, but I do not think I will return to Aral's world again. The earlier volumes were quite enjoyable and recommended. Now comes a time to end the series!

Filed under: Books, Mountains Tagged: Ancient Blades, Bared Blade, heroic fantasy, Kelly McCullough, sword and sorcery

### desperately seeking puffins!

**O**n Sunday afternoon, I made a brief trip to the southern coast of the Reykjanes Peninsula in an attempt to watch puffins. According to my guide book, the cliffs at Krýsuvíkurberg were populated with many species of birdlife, including the elusive puffin. However, I could only spot gulls, and more gulls, as I walked a few kilometres along those cliffs, away from the occasional 4WD stopping by the end of a dirt road [my small rental car could not make it that far]. When I was about to turn back, I spotted different birds on a small rock promontory, too far away for me to tell the species, and as I was zooming in on them, a puffin flew by!, so small that I almost missed it. I tried to see if any other was dwelling in the cliffs left and right, but to no avail. A few minutes later, presumably the same puffin flew back and that was the end of it. Even after looking at the enlarged picture, I cannot tell what those “other” birds are: presumably Brünnich's guillemots…

Filed under: Kids, Mountains, pictures, Travel Tagged: bird watching, Brünnich's guillemot, dirt road, gulls, Iceland, Krýsuvíkurberg, puffins, Reykjanes Peninsula

### Icelandic landscape [#3]

Filed under: Mountains, pictures, Travel Tagged: fumerolles, geothermal ponds, Iceland, Kleifarvatn, Seltún

### Le Monde puzzle [#913]

**A**n arithmetic Le Monde mathematical puzzle:

*Find all bi-twin integers, namely positive integers such that adding 2 to any of their prime divisors returns a prime number.*

An easy puzzle, once the R libraries on prime number decomposition can be found!, since it is straightforward to check for solutions. Unfortunately, I could not install the recent numbers package, so I used instead the schoolmath R package, despite its possible bugs. But it seems to do the job for this problem:

lem=NULL
for (t in 1:1e4)
  if (prod(is.prim(prime.factor(t)+2)==1)) lem=c(lem,t)

which returned all solutions, albeit in a lengthy fashion:
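(Given schoolmath's reputation for bugs, a quick cross-check in Python via sympy — a hypothetical verification of mine, not part of the original solution — recovers the same leading terms as the R listing below:)

```python
from sympy import isprime, primefactors

# bi-twin integers: every prime divisor p has p+2 prime (vacuously true for 1)
lem = [t for t in range(1, 200) if all(isprime(p + 2) for p in primefactors(t))]
print(lem)
# → [1, 3, 5, 9, 11, 15, 17, 25, 27, 29, 33, 41, 45, 51, 55, 59, 71, ...]
```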

> lem
  [1]   1    3    5    9   11   15   17   25   27   29   33   41   45   51   55
 [16]  59   71   75   81   85   87   99  101  107  121  123  125  135  137  145
 [31] 149  153  165  177  179  187  191  197  205  213  225  227  239  243  255
 [46] 261  269  275  281  289  295  297  303  311  319  321  347  355  363  369
 [61] 375  405  411  419  425  431  435  447  451  459  461  493  495  505  521
 [76] 531  535  537  561  569  573  591  599  605  615  617  625  639  641  649
 [91] 659  675  681  685  697  717  725  729  745  765  781  783  807  809  821
[106] 825  827  841  843  857  867  881  885  891  895  909  933  935  955  957
[121] 963  985 1003 1019 1025 1031 ...

Filed under: Books, Kids, Statistics, University life Tagged: Le Monde, mathematical puzzle, numbers, packages, prime number, R, schoolmath, twin prime numbers

### capture mark recapture with no mark and no recapture [aka 23andmyfish]

**A** very exciting talk at NBBC15 here in Reykjavik was delivered yesterday by Mark Bravington on *Close-kin mark recapture by modern magic* (!). Although Mark is from Australia, being a Hobart resident does qualify him for the Nordic branch of the conference! The exciting idea is to use genetic markers to link catches in a (fish) population as being related as parent-offspring or as siblings. This sounds like science-fantasy when you first hear of it!, but it actually works better than standard capture-mark-recapture methods for populations of a certain size (so that the chances of finding related animals are not essentially zero, as they would be for, e.g., krill populations). The talk focussed on bluefin tuna, whose survival is unlikely under the current fishing pressure… Among the advantages are a much more limited impact of the capture on the animal, since only a small amount of genetic material is needed, no tag loss, no tag destruction by hunters, no tag impact on the animal's survival, no recapture, a unique identification of each animal, and the potential for a detailed amount of information through the genetic record. Ideally, the entire sample could lead to a reconstruction of its genealogy all the way to the common ancestor, a wee bit like what 23andme proposes for humans, but this remains at the science-fantasy level given what is currently known about the fish species' genomes.
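The basic counting argument can be sketched as follows (a deliberately naïve toy version of the close-kin idea, nothing like Bravington's full likelihood): each sampled juvenile has two parents among the N adults, so the expected number of parent-offspring pairs between samples of n_adults adults and n_juv juveniles is about 2·n_adults·n_juv/N, which can be inverted into an estimate of N.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated truth: N adults, each juvenile gets two distinct parents at random
N, n_adults, n_juv = 5000, 500, 500
parents = np.array([rng.choice(N, 2, replace=False) for _ in range(n_juv)])

# Genotyping reveals parent-offspring pairs with the sampled adults
sampled_adults = rng.choice(N, n_adults, replace=False)
is_sampled = np.zeros(N, dtype=bool)
is_sampled[sampled_adults] = True
m = int(is_sampled[parents].sum())      # observed parent-offspring pairs

# Method-of-moments estimate: E[m] ≈ 2 * n_adults * n_juv / N
N_hat = 2 * n_adults * n_juv / m
print(m, N_hat)
```

The estimate is the exact analogue of the Lincoln-Petersen estimator, with kinship playing the role of the mark.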

Filed under: Mountains, pictures, Statistics, Travel, University life Tagged: Australia, bluefin tuna, capture-recapture, genotyping, Hobart, Iceland, NBBC15 conference, Reykjavik, Tasmania