Xi'an's Og

an attempt at bloggin, nothing more...

repulsive mixtures

Sun, 2017-04-09 18:17

Fangzheng Xie and Yanxun Xu arXived today a paper on Bayesian repulsive modelling for mixtures. Not that Bayesian modelling is repulsive in any psychological sense, but rather that the components of the mixture are repulsive towards one another. The device towards this repulsiveness is to add a penalty term to the original prior such that close means are penalised. (In the spirit of the sugar loaf with water drops represented on the cover of Bayesian Choice that we used in our pinball sampler, repulsiveness being there on the particles of a simulated sample and not on components.) Which means a prior assumption that close covariance matrices are of lesser importance. A question I have is why empty components are not excluded as well, but this does not make too much sense in the Dirichlet process formulation of the current paper. And in the finite mixture version the Dirichlet prior on the weights has coefficients less than one.
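To make the penalty idea concrete, here is a minimal sketch of a log-prior on univariate component means with a pairwise repulsion factor. Everything specific in it is an assumption made for illustration and not taken from the paper: the repulsion function g(d) = d / (g0 + d), the product over pairs, the tuning constant `g0`, the zero-mean Gaussian base prior with scale `prior_sd`, and the function names themselves.

```python
import math
from itertools import combinations


def log_repulsion(means, g0=1.0):
    """Log of an illustrative repulsion factor: sum over pairs of log g(|mu_i - mu_j|).

    g(d) = d / (g0 + d) is an assumed choice: it vanishes when two means
    coincide and tends to one when they are far apart.
    """
    total = 0.0
    for a, b in combinations(means, 2):
        d = abs(a - b)
        if d == 0.0:
            return -math.inf  # coinciding means get zero prior mass
        total += math.log(d / (g0 + d))
    return total


def log_repulsive_prior(means, prior_sd=10.0, g0=1.0):
    """Unnormalised log-prior: independent Gaussian base prior plus the penalty."""
    base = sum(-0.5 * (m / prior_sd) ** 2 for m in means)
    return base + log_repulsion(means, g0)


if __name__ == "__main__":
    print(log_repulsive_prior([-1.5, 1.5]))    # well-separated means
    print(log_repulsive_prior([-0.05, 0.05]))  # nearly identical means, penalised
```

With these assumed choices, two nearly identical means receive a much lower prior density than two well-separated ones, and the penalty fades away as the means move apart, so the base prior takes over for distant components.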

The paper establishes consistency results for such repulsive priors, both for estimating the distribution itself and the number of components, K, under a collection of assumptions on the distribution, prior, and repulsiveness factors. While I have no mathematical issue with such results, I always wonder at their relevance for a given finite sample from a finite mixture in that they give an impression that the number of components is a perfectly estimable quantity, which it is not (in my opinion!) because of the fluid nature of mixture components and therefore the inevitable impact of prior modelling. (As Larry Wasserman would pound in, mixtures like tequila are evil and should likewise be avoided!)

The implementation of this modelling goes through a “block-collapsed” Gibbs sampler that exploits the latent variable representation (as in our early mixture paper with Jean Diebolt). The paper includes the Old Faithful data as an illustration (for which a submission of ours was recently rejected for using too old datasets). And it uses the logarithm of the conditional predictive ordinate as an assessment tool, which is a posterior predictive estimated by MCMC, using the data a second time for the fit.
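For readers unfamiliar with the conditional predictive ordinate, here is a minimal sketch of its usual MCMC estimate, namely the harmonic mean of the likelihood values of one observation across posterior draws. This is the standard textbook estimator, not necessarily the exact version used in the paper, and the input layout `loglik_draws[t][i]` (log-likelihood of observation i under draw t) as well as the function names are hypothetical.

```python
import math


def log_cpo(loglik_draws):
    """Log CPO of each observation from T posterior draws.

    loglik_draws[t][i] holds log f(y_i | theta_t). The estimate is
    log CPO_i = log T - logsumexp_t(-loglik_draws[t][i]),
    i.e. the log of the harmonic mean of the likelihood values.
    """
    n_draws = len(loglik_draws)
    n_obs = len(loglik_draws[0])
    result = []
    for i in range(n_obs):
        neg = [-row[i] for row in loglik_draws]
        m = max(neg)  # subtract the maximum for numerical stability
        lse = m + math.log(sum(math.exp(v - m) for v in neg))
        result.append(math.log(n_draws) - lse)
    return result


def lpml(loglik_draws):
    """Sum of the log CPOs, often used as a single model assessment score."""
    return sum(log_cpo(loglik_draws))


if __name__ == "__main__":
    # Two draws, one observation with likelihood values 0.5 and 0.25.
    draws = [[math.log(0.5)], [math.log(0.25)]]
    print(log_cpo(draws))
```

The harmonic mean form is what makes the data enter a second time: each draw is simulated from the posterior given all observations, and then reweighted by the inverse likelihood of the observation being assessed.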

Filed under: Books, Statistics Tagged: consistency, Dirichlet mixture priors, finite mixtures, Gibbs sampling, Larry Wasserman, repulsiveness, reversible jump MCMC, tequila, unknown number of components
Categories: Bayesian Bloggers

challenged books

Sat, 2017-04-08 18:17

After reading that Margaret Atwood’s The Handmaid’s Tale was one of the most challenged books in the USA, where challenged means “documented requests to remove materials from schools or libraries”, I went to check on the website of the American Library Association for other titles, and found that The Curious Incident of the Dog in the Night-Time and the Bible made it to the top 10 in 2015, with Of Mice and Men, Harry Potter, The Adventures of Huckleberry Finn, Brave New World, Hunger Games, Slaughterhouse Five, Cal, several of Roald Dahl’s and of Toni Morrison’s books, Persepolis, and Tintin in America [and numerous others] appearing in the list… (As read in several comments, it is quite a surprise Shakespeare is not part of it!)

What is most frightening about those challenges and calls for censorship is that a growing portion of the reasons given against the books is “diversity”, namely that they propose a different viewpoint, be it religious (or atheist), gender-related, ethnic, political, or disability-related.

Filed under: Books, Kids Tagged: American Library Association, banned books, Brave New World, censorship, Harry Potter, John Steinbeck, Persepolis, The Handmaid's Tale, Tintin, USA

Shadows of Self [book review]

Fri, 2017-04-07 18:17

“He’d always found it odd that so many died when they were old, as logic said that was the point in their lives when they’d the most practice not dying.”

Now this is steampunk fantasy, definitely! With little novelty in the setting of the universe. If mixed with a Wild West feeling, though, just like The Half-Made World.

“Mirabell had been a statistician and psychologist in the third century who had studied why some people worked harder than others.”

Actually, this is the same universe as The Mistborn trilogy, but 300 years later, which allows for some self-referential jokes and satire. Including the notion that the current ruling class could be exactly what the heroes of The Mistborn had fought against!

“Not guns,” Wayne said with a grin. “A different kind of weapon. Math.”

More precisely, this is the (a?) sequel to The Alloy of Law, which I had almost completely forgotten, unlike The Mistborn trilogy. This does not help with the reading as the book refers rather insistently to this Alloy of Law!

“Sir, you said you hired me in part because of my ability to read statistics.”

Nonetheless, it is an interesting plot, with a very nice ambiguity of the main characters, who (again) often feel they may be closer to the dictatorship that set off The Mistborn revolution than to the revolutionaries themselves! And one of the heroes is a statistician (as obvious from the many quotes around!).

“Wayne felt a disturbance stir within him, like his stomach discovering he’d just fed it a bunch of rotten apples. Religion worried him. It could ask men to do things they’d otherwise never do.”

In short, good story, nice style, entertaining dialogues: perfect [mind-candy] travel novel!

Filed under: Books, Kids, Statistics, Travel Tagged: alloy, Brandon Sanderson, candy, Mistborn, Shadows of Self, Statistics, steampunk, Wild West

and it only gets worse…

Fri, 2017-04-07 08:18

“The State Department said on Monday it was ending U.S. funding for the United Nations Population Fund, the international body’s agency focused on family planning as well as maternal and child health in more than 150 countries.” Reuters, April 3, 2017

“When it comes to science, there are few winners in US President Donald Trump’s first budget proposal. The plan, released on 16 March, calls for double-digit cuts for the Environmental Protection Agency (EPA) and the National Institutes of Health (NIH). It also lays the foundation for a broad shift in the United States’ research priorities, including a retreat from environmental and climate programmes.” Nature, March 16, 2017

“In light of the recent executive order on visas and immigration, we are compelled to speak out in support of our international members. Science benefits from the free expression and exchange of ideas. As the oldest scientific society in the United States, and the world’s largest professional society for statisticians, the ASA has an overarching responsibility to support rigorous and robust science. Our world relies on data and statistical thinking to drive discovery, which thrives from the contributions of a global community of scientists, researchers, and students. A flourishing scientific culture, in turn, benefits our nation’s economic prosperity and security.” ASA, March, 2017

Filed under: Kids, pictures, Travel Tagged: American Statistical Association, climate change, Donald Trump, Environmental Protection Agency, global warming, Human Rights, National Institutes of Health, The New York Times, trumpism, United Nations Population Fund, US politics


Bayesian program synthesis

Thu, 2017-04-06 18:17

Last week, I got (along with Jean-Michel Marin) an email from a journalist working for Science & Vie, a French science magazine that published a few years ago a special issue on Bayes’ theorem. (With the insane title of “the formula that deciphers the World!”) The reason for this email was the preparation of an article on Gamalon, a new AI company that relies on (Bayesian) probabilistic programming to devise predictive tools. And we spent an hour skyping with him about Bayesian inference, probabilistic programming and machine-learning, at the general level since we had not heard previously of this company or of its central tool.

“the Gamalon BPS system learns from only a few examples, not millions. It can learn using a tablet processor, not hundreds of servers. It learns right away while we play with it, not over weeks or months. And it learns from just one person, not from thousands.”

Gamalon claims to do much better than deep learning at those tasks. Not that I have reasons to doubt that claim, quite the opposite, an obvious reason being that incorporating rules and probabilistic models in the predictor is going to help if these rules and models are even moderately realistic, another major one being that handling uncertainty and learning by Bayesian tools is usually a good idea (!), and yet another significant one being that David Blei is a member of their advisory committee. But it is hard to get a feeling for such claims when the only element in the open is the use of probabilistic programming, which is an advanced and efficient manner of conducting model building and updating and handling (posterior) distributions as objects, but which does not enjoy higher predictive abilities by default. Unless I live with a restricted definition of what probabilistic programming stands for! In any case, the video provided by Gamalon and the presentation given by its CEO do not help in my understanding of the principles behind this massive gain in efficiency. Which makes sense given that the company would not want to give up its edge on the competition.

Incidentally, the video in this presentation comparing the predictive abilities of the four major astronomical explanations of the solar system is great. If not particularly connected with the difference between deep learning and Bayesian probabilistic programming.

Filed under: Books, pictures, Statistics, University life Tagged: David Blei, deep learning, Gamelon, machine learning, neural network, principles of uncertainty, probabilistic programming, Science & Vie, solar system


Statlearn17, Lyon

Thu, 2017-04-06 08:18

Today and tomorrow, I am attending the Statlearn17 conference in Lyon, France. Which is a workshop with one-hour talks on statistics and machine learning. And which makes for the second workshop on machine learning in two weeks! Yesterday there were two tutorials in R, but I only took the train to Lyon this morning: it will be a pleasant opportunity to run tomorrow through a city I have never truly visited, if crossed so many times when driving to the Alps. Interestingly, the trip started in Paris with me sitting in the train next to another speaker at the conference, despite having switched seat and carriage with another passenger! A speaker whom I did not know beforehand and could only identify by his running R code at 300km/h.

Filed under: Kids, pictures, R, Statistics, Travel, University life Tagged: Berlin, conference, France, French Alps, Lyon, machine learning, R, SFDS, Statlearn 2017, train, Université Lumière Lyon 2

the incomprehensible challenge of poker

Wed, 2017-04-05 18:17

When reading in Nature about two deep learning algorithms winning at a version of poker within a few weeks of each other, I came back to my “usual” wonder about poker, as I cannot understand it as a game. (Although I can see the point, albeit dubious, in playing to win money.) And [definitely] correlatively I do not understand the difficulty in building an AI that plays the game. [I know, I know nothing!]

Filed under: Statistics Tagged: artificial intelligence, bills, deep learning, Denmark, game theory, Nature, poker, statistics and sports

objective and subjective RSS Read Paper next week

Wed, 2017-04-05 08:18

Andrew Gelman and Christian Hennig will give a Read Paper presentation next Wednesday, April 12, 5pm, at the Royal Statistical Society, London, on their paper “Beyond subjective and objective in statistics”. Which I hope to attend, or else to write a discussion. Since the discussion (to be published in Series A) is open to everyone, I strongly encourage ‘Og’s readers to take a look at the paper and the “radical” views therein to hopefully contribute to this discussion. Either as a written discussion or as comments on this very post.

Filed under: Books, pictures, Statistics, Travel, University life, Wines Tagged: Andrew Gelman, Christian Hennig, discussion paper, England, frequentist inference, London, objective Bayes, objectivism, Philosophy of Science, Read paper, Royal Statistical Society, RSS, Series A, subjective versus objective Bayes, subjectivity