Following the previous post on Rasmus’ socks, I took the opportunity of a survey on ABC I am currently completing to compare the outcome of his R code with my analytical derivation. After one quick correction [by Rasmus] of a wrong representation of the Negative Binomial mean-variance parametrisation [by me], I achieved this nice fit…
Filed under: Books, Kids, R, Statistics, University life Tagged: ABC, combinatorics, mean variance parametrisation, negative binomial distribution, socks
Here is the fourth set of slides for my third year statistics course, trying to build intuition, through graphs, about the likelihood surface and about why on Earth one would want to find its maximum?! I am yet uncertain whether I will reach the point where I can teach more asymptotics, so maybe I will also include asymptotic normality of the MLE under regularity conditions in this chapter…
Filed under: Books, Kids, Statistics, University life Tagged: asymptotics, Bayesian statistics, Don Rubin, EM algorithm, likelihood function, likelihood surface, missing values, Paris, score function, Université Paris Dauphine
When I first went to the US in 1987, I switched from listening to French public radio to listening to NPR, the National Public Radio network. However, it was not until I met both George Casella and Bernhard Flury that I started listening to “Car Talk”, the Sunday morning talk show by the Magliozzi brothers where listeners would call in, describe their car problems, and get jokes and sometimes advice in reply. Both George and Bernhard were big fans of the show, much more for the unbelievably high spirits it provided than for any deep interest in mechanics. And indeed there was something of the spirit of Zen and the art of motorcycle maintenance in that show, namely that, through mechanical issues, people would come to expose deeper worries that the Magliozzi brothers would help bring out, playing the role of garage-shack psychiatrists… Which made me listen to them, despite my complete lack of interest in cars, mechanics, and repair in general.
One of George’s moments of fame was when he wrote to the Magliozzi brothers about the Monty Hall problem, because they had botched their explanation as to why one should always switch doors. And they read his letter on the air, with the line “Who is this Casella guy from Cornell University? A professor? A janitor?”, since George had simply signed George Casella, Cornell University. Besides, Bernhard was such a fan of the show that he taped every single morning show, tapes he would later replay on long car trips (I do not know how his family enjoyed the exposure to the show, though!). And so he happened to have this line about George on tape, which he sent him a few weeks later… I am reminiscing about all this because I saw in the NYT today that the older brother, Tom Magliozzi, had just died. Some engines can alas not be fixed… But I am sure there will be a queue of former car addicts in some heavenly place eager to ask him their questions about their favourite car. Thanks for the ride, Tom!
Filed under: Books, Kids, Travel Tagged: Bernhard Flury, Car Talk, George Casella, NPR, The Monty Hall problem, The New York Times
In the past few years, I have seen a construction grow and grow under my office windows in Paris-Dauphine, ruining my view of the towers of La Défense, as seen in the above picture. This huge building designed by architect Frank Gehry has now opened as the Fondation Louis Vuitton museum, exhibiting artworks owned by LVMH and Bernard Arnault. Since I am very close to it and could not get an idea of what it looked like from my office, I took a Vélib yesterday and biked the kilometer between Porte Dauphine and the museum. As it had just opened, it was fairly crowded, but I could still take a few pictures of this elegant sail-boat made of glass panels, without entering the art gallery itself…
Once the novelty has worn off and the crowds have thinned out, I will be back to look at the exhibits. In the meanwhile, I for sure will not forget its presence…!
Filed under: pictures, Travel, University life Tagged: architecture, bois de Boulogne, Fondation Louis Vuitton, France, Frank Gehry, La Défense, modern art, museum, Paris, Université Paris Dauphine, Vélib
This newly arXived paper by S. Golchi and D. Campbell from Vancouver (hence the above picture) considers the (quite) interesting problem of simulating from a target distribution defined by a constraint. This is a question that has bothered me for a long while, as I could not come up with a satisfactory solution over all those years… Namely, when imposing a hard constraint on a density, how can we find a sequence of targets that ends up with the restricted density? This is of course connected with the zero measure case posted a few months ago. For instance, how do we efficiently simulate a sample from a Student’s t distribution with a fixed sample mean and a fixed sample variance?
“The key component of SMC is the filtering sequence of distributions through which the particles evolve towards the target distribution.” (p.3)
This is indeed the main issue! The paper considers using a sequence of intermediate targets progressively hardening the constraint(s), along with an SMC sampler, but this recommendation remains rather vague and hence I am at a loss as to how to make it work when the exact constraint implies a change of measure. The first example is monotone regression, where y has mean f(x) and f is monotone. (Everything is unidimensional here.) The sequence is then defined by adding a multiplicative term that is a function of ∂f/∂x, for instance a smoothed indicator of monotonicity like Φ(τ ∂f/∂x), with τ growing to infinity to make the constraint move from soft to hard. An interesting introduction, even though the hard constraint does not imply a change of parameter space or of measure. The second example is about estimating the parameters of an ODE, with the constraint being that the ODE is satisfied exactly. Again, not exactly what I was looking for. But with an exotic application to deaths from the 1666 Black (Death) plague.
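To make the soft-to-hard tempering concrete, here is a minimal Python sketch of the idea (my own toy rendition, not the authors' code): hardening the positivity constraint x > 0 on a N(0,1) target via the soft term Φ(τx), within a basic reweight–resample–move SMC loop.

```python
import numpy as np
from scipy.special import log_ndtr  # log of the standard normal cdf

rng = np.random.default_rng(0)

def log_soft(x, tau):
    # soft version of the hard constraint x > 0: log Phi(tau * x)
    return log_ndtr(tau * x)

def mh_move(x, tau, n_steps=5, scale=0.5):
    # Metropolis rejuvenation targeting pi_tau(x) proportional to phi(x) Phi(tau x)
    for _ in range(n_steps):
        prop = x + scale * rng.normal(size=x.shape)
        logratio = (-0.5 * prop**2 + log_soft(prop, tau)) \
                 - (-0.5 * x**2 + log_soft(x, tau))
        x = np.where(np.log(rng.uniform(size=x.shape)) < logratio, prop, x)
    return x

N = 5_000
x = rng.normal(size=N)           # particles from the unconstrained N(0,1)
tau_prev = 0.0
for tau in [0.5, 1.0, 2.0, 5.0, 20.0, 100.0]:
    logw = log_soft(x, tau) - log_soft(x, tau_prev)  # incremental weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    x = x[rng.choice(N, size=N, p=w)]                # multinomial resampling
    x = mh_move(x, tau)                              # move step
    tau_prev = tau

est = x.mean()   # approximates E[X | X > 0] = sqrt(2 / pi), about 0.8
```

The same template applies to any constraint that can be softened into a multiplicative term, which is precisely where the recommendation gets delicate once the hard constraint changes the dominating measure.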
And then the third example is about ABC and the choice of summary statistics! The sequence of constraints is designed to keep observed and simulated summary statistics close enough when the dimension of those summaries increases, which means they are considered simultaneously rather than jointly. (In the sense of Ratmann et al., 2009. That is, with a multidimensional distance.) The model used for the application of the SMC is the dynamic model of Wood (2010, Nature). The outcome of this specific implementation is not that clear compared with alternatives… And again sadly does not deal with the/my zero measure issue.
Filed under: Books, Mountains, pictures, Statistics, University life Tagged: ABC, Bayesian model averaging, Black Death, Constrained Monte Carlo, Monte Carlo Statistical Methods, ODE, plague, sequential Monte Carlo, SMC, Student's t distribution, summary statistics
Another paper addressing the estimation of the normalising constant and the wealth of available solutions just came out on arXiv, with the full title of “Target density normalization for Markov chain Monte Carlo algorithms“, written by Allen Caldwell and Chang Liu. (I became aware of it by courtesy of Ewan Cameron, as it appeared in the physics section of arXiv. It is actually a wee bit annoying that papers in the subcategory “Data Analysis, Statistics and Probability” of physics do not get an automated reposting on the statistics lists…)
In this paper, the authors compare three approaches to the problem of finding the normalising constant, i.e., of computing the integral

I = ∫ f(λ) dλ

when the density f is unnormalised, that is, in more formal terms, when f is proportional to a probability density (and available):
- an “arithmetic mean”, which is an importance sampler based on reducing the integration volume to a neighbourhood ω of the global mode. This neighbourhood is chosen as a hypercube and the importance function turns out to be the uniform distribution over this hypercube. The corresponding estimator is then a rescaled version of the average of f over uniform simulations in ω.
- a “harmonic mean”, of all choices!, again with an integration restricted to the neighbourhood ω of the global mode in order to avoid the almost sure infinite variance of harmonic mean estimators.
- a Laplace approximation, using the target at the mode and the Hessian at the mode as well.
The paper then goes on to compare those three solutions on a few examples, demonstrating how the diameter of the hypercube can be calibrated towards a minimum (estimated) uncertainty. The rather anticlimactic conclusion is that the arithmetic mean is the most reliable solution, as harmonic means may fail in larger dimensions and, more importantly, fail to signal their failure, while Laplace approximations only approximate quasi-Gaussian densities well…
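For illustration, here is a toy Python sketch of the “arithmetic mean” estimator (my own made-up target, with a direct Gaussian sample standing in for the MCMC output), combining a uniform average of f over a hypercube ω around the mode with the sampled frequency of ω, alongside the Laplace approximation:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # unnormalised target: a standard bivariate Gaussian stripped of its
    # constant, so the true normalising constant is 2 * pi
    return np.exp(-0.5 * np.sum(x**2, axis=-1))

d, N = 2, 200_000
sample = rng.normal(size=(N, d))       # stand-in for MCMC output from f

# hypercube omega around the (here known) global mode at the origin
half_width = 1.0
vol = (2 * half_width) ** d
in_omega = np.all(np.abs(sample) <= half_width, axis=1)
freq = in_omega.mean()                 # Binomial-noise estimate of the mass of omega

U = rng.uniform(-half_width, half_width, size=(N, d))  # uniforms over omega
Z_arith = vol * f(U).mean() / freq     # arithmetic-mean estimate

# Laplace approximation: f(mode) (2 pi)^{d/2} / sqrt(det Hessian), here 2 pi
Z_laplace = f(np.zeros((1, d)))[0] * (2 * np.pi) ** (d / 2)
```

Both estimates recover 2π on this quasi-Gaussian toy; the point of the paper is precisely that only the first one remains trustworthy away from Gaussianity, at the cost of the Binomial noise in the frequency term.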
What I find most interesting in this paper is the idea of using only one part of the integration space to compute the integral, even though it is not exactly new. Focussing on a specific region ω has pros and cons: the pros being that the reduction to a modal region reduces the need for absolute MCMC convergence, helps in selecting alternative proposals, and protects against the worst consequences of using a dreaded harmonic mean; the cons being that the region needs to be well-identified, which imposes requirements on the MCMC kernel, and that the estimate is a product of two estimates, the frequency being driven by a Binomial noise. I also like very much the idea of calibrating the diameter Δ of the hypercube ex-post by estimating the uncertainty.
As an aside, the paper mentions most of the alternative solutions I presented in my Monte Carlo graduate course just two days ago (like nested or bridge or Rao-Blackwellised sampling, including our proposal with Darren Wraith), but dismisses them as not “directly applicable in an MCMC setting”, i.e., without modifying this setting. I unsurprisingly dispute this labelling, both because something like the Laplace approximation requires extra work on the MCMC output (and, once done, this work can lead to advanced Laplace methods like INLA) and because other methods could be considered as well (for instance, bridge sampling over several hypercubes). As shown in the recent paper by Mathieu Gerber and Nicolas Chopin (soon to be discussed at the RSS!), MCqMC has also become a feasible alternative that would compete well with the methods studied in this paper.
Overall, this is a paper that comes in a long list of papers on constant approximations. I do not find the Markov chain of MCMC aspect particularly compelling or specific, once the effective sample size is accounted for. It would be nice to find generic ways of optimising the visit to the hypercube ω and to estimate efficiently the weight of ω. The comparison is solely run over examples, but they all rely on a proper characterisation of the hypercube and the ability to simulate efficiently f over that hypercube.
Filed under: Statistics, University life Tagged: harmonic mean estimator, importance sampling, Laplace approximation, MCMC, MCQMC, Monte Carlo Statistical Methods, normalising constant, Rao-Blackwellisation, untractable normalizing constant
Ten days ago, Gersende Fort, Benjamin Jourdain, Tony Lelièvre, and Gabriel Stoltz arXived a study about an adaptive umbrella sampler that can be re-interpreted as a Wang-Landau algorithm, if not the most efficient version of the latter. This reminded me very much of the workshop we all attended in Edinburgh last June. And even more of the focus of the molecular dynamics talks in this same ICMS workshop on accelerating the MCMC exploration of multimodal targets. The self-healing aspect of the sampler is to adapt to the multimodal structure thanks to a partition that defines a biased sampling scheme, spending time in each set of the partition with frequency proportional to weights. While the optimal weights are the weights of the sets under the target distribution (are they truly optimal?! I would have thought lifting low density regions, i.e., marshes, could improve the mixing of the chain for a given proposal), those are unknown and need to be estimated by an adaptive scheme that makes staying in a given set less desirable the more it has been visited, by increasing the inverse weight of that set by a factor each time it is visited. Which indeed sounds like Wang-Landau. The plus side of the self-healing umbrella sampler is that it only depends on a scale γ (and on the partition). Besides converging to the right weights, of course. The downside is that it does not reach the most efficient convergence, since the adaptivity weight decreases in 1/n rather than 1/√n.
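Here is a minimal Python sketch of this weight adaptation (my own toy, with a two-set partition of a bimodal univariate target, far from the molecular dynamics applications of the paper): each visit to a set increases its penalty, with a 1/n schedule.

```python
import numpy as np

rng = np.random.default_rng(2)

def logpi(x):
    # bimodal target 0.3 N(-3,1) + 0.7 N(3,1): the two half-lines have
    # masses close to 0.3 and 0.7
    return np.logaddexp(np.log(0.3) - 0.5 * (x + 3)**2,
                        np.log(0.7) - 0.5 * (x - 3)**2)

part = lambda x: 0 if x < 0 else 1   # partition defining the umbrella
logtheta = np.zeros(2)               # adaptive (log) weight estimates
x = -3.0
for n in range(1, 200_001):
    prop = x + rng.normal(scale=3.0)
    # biased target pi(x) / theta_{I(x)}: staying in an over-visited set
    # becomes progressively less desirable
    logr = (logpi(prop) - logtheta[part(prop)]) \
         - (logpi(x) - logtheta[part(x)])
    if np.log(rng.uniform()) < logr:
        x = prop
    logtheta[part(x)] += 1.0 / n     # 1/n adaptivity schedule
    logtheta -= logtheta.max()       # renormalise for numerical stability

weights = np.exp(logtheta)
weights /= weights.sum()             # converges towards the set masses
```

Replacing the 1/n increments with a 1/√n schedule in this loop is exactly the change that would deliver the more efficient convergence rate.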
Note that the paper contains a massive experimental side where the authors checked the impact of various parameters by Monte Carlo studies of estimators involving more than a billion iterations. Apparently repeated a large number of times.
The next step in adaptivity should be about the adaptive determination of the partition, hoping for a robustness against the dimension of the space. Which may be unreachable if I judge by the apparent deceleration of the method when the number of terms in the partition increases.
Filed under: Kids, pictures, Statistics, University life Tagged: acceleration of MCMC algorithms, adaptive MCMC methods, Monte Carlo experiment, multimodality, Tintin, umbrella sampling, Wang-Landau algorithm, well-tempered algorithm
There is an open call from the Fondation Sciences Mathématiques de Paris (FSMP) for a postdoctoral funding program with 18 position-years available for stays at Université Paris-Dauphine (and other participating universities). The net support is quite decent (wrt French standards and academic salaries) and the application form easy to fill in. So, if you are interested in coming to Paris to work on ABC, MCMC, Bayesian model choice, &tc., feel free to contact me (or another Parisian statistician) and to apply! The deadline is December 01, 2014. And the decision will be made by January 15, 2015. The starting date for the postdoc is October 01, 2015.
Filed under: Kids, Statistics, Travel, University life Tagged: ABC, Bayesian model choice, Fondation Sciences Mathématiques de Paris, MCMC, Monte Carlo Statistical Methods, Paris, postdoctoral position, Université Paris Dauphine
I read this arXived paper by Goodman, Lin and Morzfeld on a (long) train trip to the North of Paris, a paper that deals with asymptotics of importance sampling, but I still have trouble seeing its impact on statistical (Bayesian) problems. The paper is indeed mostly about asymptotics of symmetrized versions of importance sampling, yet there is a noise parameter ε that does not make sense [to me] when I try to apply it to statistical problems…
The first symmetrization sounds like a special case of the generic group operator of Kong et al., 2003, in their Read Paper. Namely that, if one takes advantage of potential symmetries in the importance distribution to multiply the number of proposed values and integrates this choice into a symmetrized importance distribution, hence using a sort of Rao-Blackwellisation, the effective sample size gets better… Besides symmetry, another transform of importance sampling proposals is rescaling, so that the importance density at the proposal matches the target density at the rescaled version. (Those notions of symmetry and importance proposal require some specific knowledge about the mode and Hessian of the target.) This version is called a random map, even though I have trouble spotting the randomness in the transform. The third notion in this paper is the small noise version, where a small ε is introduced along with a rescaled version of the target, used to evaluate the efficiency of those different transforms as ε goes to zero. Symmetrized versions improve the effective sample size by a power of ε, namely from ε to ε². But I still fail to see why this matters in the original problem, when ε=1.
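To see the symmetrization at work on a toy example of mine (a mis-centred Gaussian proposal for a target that is symmetric about its mode, not the paper's setting), one can pair each draw with its reflection about the mode and use the symmetrized density as importance function:

```python
import numpy as np

rng = np.random.default_rng(3)

def logp(x):                 # target: N(0,1), symmetric about its mode 0
    return -0.5 * x**2

def logq(x):                 # mis-centred proposal: N(0.5, 1)
    return -0.5 * (x - 0.5)**2

def ess_fraction(logw):
    # normalised effective sample size of a set of importance weights
    w = np.exp(logw - logw.max())
    return (w.sum()**2 / (w**2).sum()) / len(w)

N = 50_000
x = rng.normal(0.5, 1.0, size=N)

ess_plain = ess_fraction(logp(x) - logq(x))

# symmetrized version: keep each draw and its reflection 2*mode - x = -x,
# and use q_sym(y) = (q(y) + q(-y)) / 2 as the importance density
y = np.concatenate([x, -x])
logq_sym = np.logaddexp(logq(y), logq(-y)) - np.log(2.0)
ess_sym = ess_fraction(logp(y) - logq_sym)
```

Here the symmetrized weights are visibly flatter (ESS fraction close to one, versus roughly 0.78 for plain importance sampling), which is the Rao-Blackwellisation effect mentioned above, albeit without any ε in sight.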
Filed under: Books, Statistics, University life
Information about social entities is often spread across multiple large databases, each degraded by noise, and without unique identifiers shared across databases. Entity resolution—reconstructing the actual entities and their attributes—is essential to using big data and is challenging not only for inference but also for computation.
In this talk, I motivate entity resolution by the current conflict in Syria. It has been tremendously well documented; however, we still do not know how many people have been killed by conflict-related violence. We describe a novel approach towards estimating death counts in Syria and the challenges that are unique to this database. We first introduce computational speed-ups to avoid all-to-all record comparisons, based upon locality-sensitive hashing from the computer science literature. We then introduce a novel approach to entity resolution by discovering a bipartite graph, which links manifest records to a common set of latent entities. Our model quantifies the uncertainty in the inference and propagates this uncertainty into subsequent analyses. Finally, we speak to the successes and challenges of solving a problem that is at the forefront of national headlines and news.
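As a crude illustration of the locality-sensitive hashing step (a generic MinHash banding toy with made-up records, not the authors' actual pipeline), only records falling into a common bucket are ever compared, so the quadratic all-to-all comparison is avoided:

```python
import zlib
import random
from collections import defaultdict
from itertools import combinations

def shingles(s, k=3):
    # character k-shingles of a record string
    s = s.lower()
    return {s[i:i + k] for i in range(len(s) - k + 1)}

random.seed(0)
MASKS = [random.getrandbits(32) for _ in range(8)]   # 8 hash functions

def signature(sh):
    # MinHash signature: the minimum of each masked CRC32 hash over shingles
    return [min(zlib.crc32(x.encode("utf8")) ^ m for x in sh) for m in MASKS]

records = [                     # made-up records; 0 and 1 are near-duplicates
    "John Smith, Boston MA",
    "John Smith, Boston, MA",
    "Jane Doe, Aleppo",
    "Omar Haddad, Damascus",
]

buckets = defaultdict(set)      # (band, hash value) -> record indices
for idx, rec in enumerate(records):
    for band, h in enumerate(signature(shingles(rec))):
        buckets[(band, h)].add(idx)

candidates = set()              # only within-bucket pairs are compared
for members in buckets.values():
    candidates |= set(combinations(sorted(members), 2))

n_all_pairs = len(records) * (len(records) - 1) // 2
```

Similar records share signature minima with high probability and thus collide, while dissimilar ones rarely do; tuning the band structure trades recall against the number of candidate pairs.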
This is joint work with Rob Hall (Etsy), Steve Fienberg (CMU), and Anshu Shrivastava (Cornell University).
[Note that Rebecca will visit the maths department in Paris-Dauphine for two weeks and give a short course in our data science Master on data confidentiality, privacy and statistical disclosure (syllabus).]
Filed under: Books, Statistics, University life Tagged: Carnegie Mellon University, CEREMADE, course, data science, MASH, privacy, PSL, Rebecca Steorts, seminar, Syria, Université Paris Dauphine
“But is the existence of God just a philosophical question, like, say, the definition of knowledge or the existence of Plato’s forms?” Gary Gutting, NYT
Although I stopped following The Stone‘s interviews of philosophers about their views on religion, six more took place and Gary Gutting has now closed the series he started a while ago with a self-interview. On this occasion, I went quickly through the last interviews, which had the same variability in depth and appeal as the earlier ones. A lot of them were somewhat misplaced in trying to understand or justify the reasons for believing in a god (a.k.a., God), which sounds more appropriate for a psychology or sociology perspective. I presume that what I was expecting from the series was more a “science vs. religion” debate, rather than entries into the metaphysics of various religions…
“Marin Mersenne, Descartes’s close friend and sponsor, and an important mathematician and scientific thinker in his own right, claimed that there were over 50,000 atheists living in Paris in 1623. In his massive commentary on Genesis, where he advanced that claim, he offered 35 arguments for the existence of God. For Mersenne and his contemporaries, the idea of the atheist was terrifying.” Daniel Garber, NYT.
For instance, Daniel Garber, quoted above, discusses why he remains an atheist while being “convinced that [he] should want to believe” but can’t. That is, he cannot find convincing reasons to believe. And states that following Pascal’s wager argument would be self-deception. The whole thing sounds more like psychology than philosophy. [Incidentally correcting my long-standing mistake of writing Mersenne as Meresme!]
“The existence of God is not just any philosophical issue. It’s intimately tied up with what very many in our society feel gives meaning to their lives (…) many are subject to often subtle, but also often powerful, pressure from their religious groups to feel and to act and even to try to be certain of their position. This no doubt creates special dangers. But it also seems that a life of religious faith can lead us to special values we can’t find elsewhere. At any rate, this too is a philosophical issue. In light of all that, I would not want to make any blanket pronouncement (…) that the most reasonable stance on the existence of God is to stay on the sidelines.” Keith DeRose, NYT
Another argument outside philosophy, in that it invokes psychology at the individual level and sociology at the societal level. So this contradicts the statement that it is a philosophical issue, doesn’t it? This interview of Keith DeRose mentions a quote from Kant: “I had to deny knowledge in order to make room for faith”, which I find particularly relevant, as the existence or non-existence of God cannot be considered as knowledge in the usual sense but rather as a belief in the former case (with all sorts of causes, as discussed throughout most interviews) and a disbelief in the latter case, albeit supported by rational scepticism that there is not the slightest hint that a deity could exist. (This seems to be the whole point of the interview, which mostly conveys uncertainty and goes round in circles.)
“…karma is more like what Kant called a postulate of practical reason, something one does well to believe in and act according to (for Kant, belief in God was a practical postulate of this sort).” Jonardon Ganeri, NYT
Two more entries from the series are by religious philosophers, illustrating the difficulty of the exercise by engaging in a non-philosophical entry about their religion (Islam and Hinduism) and/or a comparison with other religions. Just as earlier for Buddhism and the Jewish religion (if not Christianity, which anyway appears as a reference religion in most other interviews). A third interview adopts an unconvincing relativist approach to religion versus science, arguing that science cannot explain everything and that the fact that religion itself is a by-product of evolution does not give a reason for its dismissal. Michael Ruse however dismisses Dawkins’ arguments as “first-year undergraduate philosophy”, which I find a bit short of an argument. Just like mentioning the Nazis as a supposedly definitive “argument from evil”.
Filed under: Books, Kids Tagged: atheism, Blaise Pascal, evolution, Mersenne, Philosophy of religions, psychology, religions, Richard Dawkins, sociology, The New York Times, The Stone
Filed under: Kids, pictures, Travel Tagged: Halloween, Massachusset, Roger Conant, Salem, statue, USA, vacations, witchery
A referee of our paper on approximating evidence for mixture model with Jeong Eun Lee pointed out the recent paper by Carlos Rodríguez and Stephen Walker on label switching in Bayesian mixture models: deterministic relabelling strategies. Which appeared this year in JCGS and went beyond, below or above my radar.
Label switching is an issue with mixture estimation (and other latent variable models) because mixture models are ill-posed models where part of the parameter is not identifiable. Indeed, the density of a mixture being a sum of terms

ω₁ f(x|θ₁) + … + ω_k f(x|θ_k),

the parameter (vector) of the ω’s and of the θ’s is at best identifiable up to an arbitrary permutation of the components of the above sum. In other words, “component #1 of the mixture” is not a meaningful concept. And hence cannot be estimated.
This problem has been known for quite a while, long before EM and MCMC algorithms for mixtures, but it is only since mixtures have become truly estimable by Bayesian approaches that the debate has grown on this issue. In the very early days, Jean Diebolt and I proposed ordering the components in a unique way to give them a meaning. For instance, “component #1″ would then be the component with the smallest mean or the smallest weight, and so on… Later, in one of my favourite X papers, with Gilles Celeux and Merrilee Hurn, we exposed the convergence issues related to the non-identifiability of mixture models, namely that the posterior distributions are almost always multimodal, with a multiple of k! symmetric modes in the case of exchangeable priors, and that therefore Markov chains have trouble visiting all those modes in a symmetric manner, despite the symmetry being guaranteed by the shape of the posterior. And we concluded with the slightly provocative statement that hardly any Markov chain inferring about mixture models had ever converged! In parallel, time-wise, Matthew Stephens had completed a thesis at Oxford on the same topic and proposed solutions for relabelling MCMC simulations in order to identify a single mode and hence produce meaningful estimators. Giving another meaning to the notion of “component #1″.
And then the topic began to attract more and more researchers, being both simple to describe and frustrating in its lack of definitive answer, both from simulation and inference perspectives. Rodríguez and Walker’s paper provides a survey of the label switching strategies in the Bayesian processing of mixtures, but its innovative part lies in deriving a relabelling strategy. Which consists in finding the optimal permutation (at each iteration of the Markov chain) by minimising a loss function inspired by k-means clustering. Which is connected with both Stephens’ and our [JASA, 2000] loss functions. The performances of this new version are shown to be roughly comparable with those of other relabelling strategies, in the case of Gaussian mixtures. (Making me wonder if the choice of the loss function is not favourable to Gaussian mixtures.) And somehow faster than Stephens’ Kullback-Leibler loss approach.
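The generic mechanics of such a deterministic relabelling can be sketched as follows (a simplified pivot-based variant with simulated, randomly permuted draws, rather than the authors' k-means-inspired loss):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(4)

k, T = 3, 2_000
true_means = np.array([-2.0, 0.0, 2.0])

# simulated MCMC draws of the k component means, with label switching:
# a random permutation of the labels at each iteration
draws = np.empty((T, k))
for t in range(T):
    draws[t] = (true_means + 0.1 * rng.normal(size=k))[rng.permutation(k)]

pivot = draws[0]                    # reference draw fixing one labelling
perms = [list(p) for p in permutations(range(k))]

relabelled = np.empty_like(draws)
for t in range(T):
    # optimal permutation at iteration t: minimise the distance to the pivot
    best = min(perms, key=lambda p: np.sum((draws[t, p] - pivot) ** 2))
    relabelled[t] = draws[t, best]

raw_means = draws.mean(axis=0)          # useless: all shrunk towards zero
fixed_means = relabelled.mean(axis=0)   # recovers the component means
```

Scanning all k! permutations is only viable for small k; in practice the assignment can be solved in polynomial time (e.g., via the Hungarian algorithm).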
“Hence, in an MCMC algorithm, the indices of the parameters can permute multiple times between iterations. As a result, we cannot identify the hidden groups that make [all] ergodic averages to estimate characteristics of the components useless.”
One section of the paper puzzles me, albeit it does not impact the methodology or the conclusions. In Section 2.1 (p.27), the authors consider the marginal probability of allocating observation i to cluster or component j. Under an exchangeable prior, this quantity is uniformly equal to 1/k for all observations i and all components j, by virtue of the invariance under permutation of the indices… So at best it can serve as a control variate. Later, in Section 2.2 (p.28), the above sentence does signal a problem with those averages, but it seems to attribute it to MCMC behaviour rather than to the invariance of the posterior (or to the non-identifiability of the components per se). At last, the paper mentions that “given the allocations, the likelihood is invariant under permutations of the parameters and the allocations” (p.28), which is not correct, since their eqn. (8) does not hold when the two permutations σ and τ give different images of zi…
Filed under: Books, Statistics, University life Tagged: component of a mixture, convergence, finite mixtures, identifiability, ill-posed problem, invariance, label switching, loss function, MCMC algorithms, missing data, multimodality, relabelling
Our paper about evaluating statistics used for ABC model choice has just appeared in Series B! It is somewhat paradoxical that it comes out just a few days after we submitted our paper on using random forests for Bayesian model choice, which bypasses the need for selecting those summary statistics by incorporating all available statistics and letting the trees automatically rank them in terms of their discriminating power. Nonetheless, this paper remains an exciting piece of work (!), as it addresses the more general and pressing question of the validity of running a Bayesian analysis with only part of the information contained in the data. Quite useful in my (biased) opinion when considering the emergence of approximate inference, already discussed on this ‘Og…
[As a trivial aside, I had first used fresh from the press(es) as the bracketed comment, before I realised the meaning was not necessarily the same in English and in French.]
Filed under: Books, Statistics, University life Tagged: ABC model choice, Approximate Bayesian computation, JRSSB, Royal Statistical Society, Series B, statistical methodology, summary statistics
An email from one of my Master students who sent his problem sheet (taken from Monte Carlo Statistical Methods) late:
Je « suis » votre cours du mercredi dont le formalisme mathématique me fait froid partout
Avec beaucoup de difficulté je vous envoie mes exercices du premier chapitre de votre livre.
which translates as
Good evening Professor,
I “follow” your Wednesday class, whose mathematical formalism makes me cold all over. With much hardship, I send you the first batch of problems from your book.
I know that winter is coming, but, still, making students shudder from mathematical cold is not my primary goal when teaching Monte Carlo methods!
Filed under: Books, Kids, Statistics, University life Tagged: computational statistics, ENSAE, Master program, MCMC algorithms, Monte Carlo Statistical Methods, statistical computing, Université Paris Dauphine, Winter is coming
After a somewhat prolonged labour (!), we have at last completed our paper on ABC model choice with random forests and submitted it to PNAS for possible publication. While the paper is entirely methodological, the primary domain of application of ABC model choice methods remains population genetics, and the diffusion of this new methodology to its users is thus more likely via a medium like PNAS than via a machine learning or statistics journal.
When compared with our recent update of the arXived paper, there is not much difference in content, as it is mostly an issue of fitting the PNAS publication canons. (Which makes the paper less readable in the posted version [in my opinion!], as fitting the main document within the compulsory six pages relegates part of the experiments and of the explanations to the Supplementary Information section.)
Filed under: pictures, R, Statistics, University life Tagged: 1000 Genomes Project, ABC, ABC model choice, machine learning, model posterior probabilities, posterior predictive, random forests, summary statistics
While I was in Warwick, Dan Simpson [newly arrived from Norway on a postdoc position] mentioned to me that he had attended a talk by Aki Vehtari in Norway where my early work with Jérôme Dupuis on projective priors was used. He gave me the link to this paper by Peltola, Havulinna, Salomaa and Vehtari, which indeed refers to the idea that a prior on a given Euclidean space defines priors by projection on all subspaces, despite the zero measure of all those subspaces. (This notion first appeared in a joint paper with my friend Costas Goutis, who alas died in a diving accident a few months later.) The projection further allowed for a simple expression of the Kullback-Leibler deviance between the corresponding models and for a Pythagorean theorem on the additivity of the deviances between embedded models. The weakest spot of this approach of ours was, in my opinion and unsurprisingly, deciding when a submodel is too far from the full model. The loss of explanatory power introduced therein had no absolute scale, and later discussions led me to think that the bound should depend on the sample size to ensure consistency. (The recent paper by Nott and Leng expanding on this projection has now appeared in CSDA.)
“Specifically, the models with subsets of covariates are found by maximizing the similarity of their predictions to this reference, as proposed by Dupuis and Robert (2003). Notably, this approach does not require specifying priors for the submodels and one can instead focus on building a good reference model. Dupuis and Robert (2003) suggest choosing the size of the covariate subset based on an acceptable loss of explanatory power compared to the reference model. We examine using cross-validation based estimates of predictive performance as an alternative.” T. Peltola et al.
The paper also connects with the Bayesian Lasso literature, concluding that the horseshoe prior is more informative than the Laplace prior. It applies the selection approach to identify biomarkers with predictive performance in a study of diabetic patients. The authors rank models according to their (log) predictive density at the observed data, using cross-validation to avoid exploiting the data twice. On the MCMC front, the paper implements the NUTS version of HMC with STAN.
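In the Gaussian linear case, the projection has a simple least-squares rendering that I can sketch as a toy (mine, not the paper's code): project the reference model's fitted values onto each covariate subset and measure the loss of explanatory power by the resulting discrepancy.

```python
import numpy as np

rng = np.random.default_rng(5)

n, p = 200, 4
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.0, 0.0])  # only the first two covariates matter
y = X @ beta + rng.normal(size=n)

# reference (full-model) fit, standing in for a posterior mean fit
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
mu_full = X @ beta_full

def projection_loss(subset):
    # Gaussian case: the KL projection onto the submodel amounts to
    # regressing the reference fit mu_full on the covariate subset
    Xs = X[:, subset]
    b = np.linalg.lstsq(Xs, mu_full, rcond=None)[0]
    return np.mean((mu_full - Xs @ b) ** 2)  # explanatory power lost

loss_good = projection_loss([0, 1])   # keeps the relevant covariates
loss_bad = projection_loss([2, 3])    # drops them
```

The open question raised above remains visible even on this toy: the loss for the good subset is small and the loss for the bad one is large, but where exactly to put the threshold between them has no absolute scale.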
Filed under: Mountains, pictures, Statistics, Travel, University life Tagged: Aki Vehtari, Bayesian lasso, Dan Simpson, embedded models, Hamiltonian Monte Carlo, horseshoe prior, Kullback-Leibler divergence, MCMC, Norway, NUTS, predictive power, prior projection, STAN, variable selection, zero measure set
After several clones of our SAME algorithm appeared in the literature, it is rather fun to see another paper acknowledging the connection. SAME but different was arXived today by Zhao, Jiang and Canny. The point of this short paper is to show that a parallel implementation of SAME leads to efficient performance compared with existing standards. Since the duplicated latent variables are independent [given θ], they can be simulated in parallel. The authors further assume independence between the components of those latent variables. And finite support. As in document analysis. So they can sample the replicated latent variables all at once. Parallelism is thus used solely for the components of the latent variable(s). SAME is normally associated with an annealing schedule, but the authors could not detect an improvement over a fixed and large number of replications. They report gains comparable to state-of-the-art variational Bayes on two large datasets. Quite fun to see SAME getting a new life thanks to computer scientists!
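For readers unfamiliar with SAME, a hypothetical toy version (mine, on a two-class Gaussian latent model rather than document analysis) shows where the parallelism enters: the R replicated latent vectors are conditionally independent given θ.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(6)

# toy latent-class model: y_i ~ N(mu_{z_i}, 1) with z_i in {0, 1}
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])
n, R = len(y), 8                   # R = number of replicated latent vectors

mu = np.array([y.min(), y.max()])  # crude initialisation pinning the labels
for it in range(200):
    # the R replicas are independent given mu: this loop (and the
    # per-observation draws inside) is embarrassingly parallel
    z = np.empty((R, n), dtype=int)
    for r in range(R):
        logp = -0.5 * (y[None, :] - mu[:, None]) ** 2   # 2 x n
        z[r] = (rng.uniform(size=n) < expit(logp[1] - logp[0])).astype(int)
    # mu given ALL replicas sees R times the information, so its
    # conditional concentrates: this is what pushes SAME towards the MAP
    for c in (0, 1):
        sel = np.broadcast_to(y, (R, n))[z == c]
        if sel.size:
            mu[c] = rng.normal(sel.mean(), 1.0 / np.sqrt(sel.size))
```

Annealing would increase R along iterations; keeping R fixed and large, as the authors report, already does the job on this kind of toy.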
Filed under: Statistics, University life Tagged: data cloning, document analysis, map, Monte Carlo Statistical Methods, parallel MCMC, SAME, simulated annealing, simulation, stochastic optimisation, variational Bayes methods