## Bayesian News Feeds

### comments on reflections

**I** just arXived my comments about A. Ronald Gallant’s “Reflections on the Probability Space Induced by Moment Conditions with Implications for Bayesian Inference”, capitalising on the three posts I wrote around the discussion talk I gave at the 6th French Econometrics conference last year. Nothing new there, except that I may get a response from Ron Gallant as this is submitted as a discussion of his related paper in the Journal of Financial Econometrics. While my conclusion is rather negative, I find the issue of setting a prior and a model based on a limited amount of information of much interest, with obvious links with ABC, empirical likelihood and other approximation methods.

Filed under: pictures, Statistics, University life Tagged: 6th French Econometrics conference, ABC, empirical likelihood, limited information inference, measure theory, moment prior, Ron Gallant

### Le Monde puzzle [#899]

**A**n arithmetic Le Monde mathematical puzzle:

*For which n’s are the averages of the first n squared integers integers? Among those, which ones are perfect squares?*

**A**n easy R code, for instance

which produces 333 values

[1] 1 5 7 11 13 17 19 23 25 29 31 35 37 41 43 47 49 53
[19] 55 59 61 65 67 71 73 77 79 83 85 89 91 95 97 101 103 107
[37] 109 113 115 119 121 125 127 131 133 137 139 143 145 149 151 155 157 161
[55] 163 167 169 173 175 179 181 185 187 191 193 197 199 203 205 209 211 215
[73] 217 221 223 227 229 233 235 239 241 245 247 251 253 257 259 263 265 269
[91] 271 275 277 281 283 287 289 293 295 299 301 305 307 311 313 317 319 323
[109] 325 329 331 335 337 341 343 347 349 353 355 359 361 365 367 371 373 377
[127] 379 383 385 389 391 395 397 401 403 407 409 413 415 419 421 425 427 431
[145] 433 437 439 443 445 449 451 455 457 461 463 467 469 473 475 479 481 485
[163] 487 491 493 497 499 503 505 509 511 515 517 521 523 527 529 533 535 539
[181] 541 545 547 551 553 557 559 563 565 569 571 575 577 581 583 587 589 593
[199] 595 599 601 605 607 611 613 617 619 623 625 629 631 635 637 641 643 647
[217] 649 653 655 659 661 665 667 671 673 677 679 683 685 689 691 695 697 701
[235] 703 707 709 713 715 719 721 725 727 731 733 737 739 743 745 749 751 755
[253] 757 761 763 767 769 773 775 779 781 785 787 791 793 797 799 803 805 809
[271] 811 815 817 821 823 827 829 833 835 839 841 845 847 851 853 857 859 863
[289] 865 869 871 875 877 881 883 887 889 893 895 899 901 905 907 911 913 917
[307] 919 923 925 929 931 935 937 941 943 947 949 953 955 959 961 965 967 971
[325] 973 977 979 983 985 989 991 995 997

which are made of all odd integers that are not multiples of 3. (I could have guessed the exclusion of even numbers since the numerator of the average is then always odd. Why are the triplets excluded, now?! Jean-Louis Fouley gave me the answer: the sum of squares is such that

$$\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$$

so that the average is $(n+1)(2n+1)/6$, and hence $n$ must be odd and $(n+1)(2n+1)$ a multiple of 3, which excludes multiples of 3, for which both factors are equal to 1 modulo 3.)

sole=sumcar[(1:n)[diff==0]]
scar=as.integer(as.integer(sqrt(sole))^2)-sole
sum(scar==0)

with the final result

> sum(scar==0)
[1] 2
> ((1:n)[diff==0])[scar==0]
[1] 1 337

since 38025=195² is a perfect square. (I wonder if there is a plain explanation for that result!)
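For readers who want to rerun the whole search, here is a minimal self-contained sketch. It is not the code of the post: the range `n <- 1000` is an assumption (the post does not state it, although the output above stops at 997), and the names `sumsq`, `integer_avg` and `square_avg` are mine.

```r
n <- 1000                           # assumed search range, not stated in the post
k <- 1:n
sumsq <- cumsum(k^2)                # sum of the first k squared integers

# k's for which the average of the first k squares is an integer
integer_avg <- k[sumsq %% k == 0]

# among those, keep the k's whose average is a perfect square
avg <- (sumsq / k)[integer_avg]
square_avg <- integer_avg[round(sqrt(avg))^2 == avg]

length(integer_avg)
square_avg
```

Under this assumed range, the two printed results should agree with the 333 values and the pair 1, 337 reported above.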

Filed under: Books, Kids, Statistics, University life Tagged: arithmetics, Jean-Louis Fouley, Le Monde, mathematical puzzle, perfect square, R

### 41ièmes Foulées de Malakoff [5k, 7⁰C, 18:40, 40th & 2nd V2]

*[ Warning: post of limited interest to most, about a local race I ran for another year!]*

**O**nce more, I managed to run my annual 5k in Malakoff. And once again I was (barely) there on the day of the race, having landed a few hours earlier from Birmingham. Due to traffic and road closures, I arrived very late in Malakoff and could not warm up as usual, or even squeeze into the first rows on the starting line. Given those handicaps, I still managed to get close to my best time of last year (18:40 vs. 18:36). I alas finished second in my V2 category, just a few meters behind the first V2 and definitely catching up on him! My INSEE Paris Club team won the company challenge for yet another year. Repeating a pattern of now many years.

Filed under: Running Tagged: 5K, groundhog day, Insee Paris Club, Malakoff, veteran (V2)

### the ultimate argument

**I**n a tribune published on February 4 in Le Monde *[under the vote-fishing argument that the National Front is not a threat for democracy]*, the former minister [and convicted member of fascist groups in the 1960’s] Gérard Longuet wrote this unforgettable sentence about the former and current heads of the National Front:

*“Sa fille, elle, a compris, et d’ailleurs pourquoi serait-elle son père, alors que deux ou trois générations les séparent.”*

*[*Translation*: His daughter has for her part well understood and in any case why should she be her father when there are two or three generations between them.]*

Filed under: Statistics Tagged: French politics, Human Genetics, Le Monde, National Front

### Bayesian computation: fore and aft

**W**ith my friends Peter Green (Bristol), Krzysztof Łatuszyński (Warwick) and Marcelo Pereyra (Bristol), we just arXived the first version of “Bayesian computation: a perspective on the current state, and sampling backwards and forwards”, whose first title was the title of this post. This is a survey of our own perspective on Bayesian computation, from what occurred in the last 25 years [a lot!] to what could occur in the near future [a lot as well!]. Submitted to Statistics and Computing towards the special 25th anniversary issue, as announced in an earlier post. Pulling strength and breadth from each other’s opinion, we have certainly attained more than the sum of our initial respective contributions, but we welcome comments about bits and pieces of importance that we missed and even more about promising new directions that are not covered in this survey. (A warning that should go with most of my surveys is that my input in this paper will not differ by a large margin from ideas expressed here or in previous surveys.)

Filed under: Books, Statistics, University life Tagged: ABC, adaptive MCMC methods, Bayesian Analysis, Bayesian computation, Bayesian optimisation, expectation-propagation, MCMC algorithms, pseudo-marginal MCMC, Statistics and Computing, survey, University of Bristol, University of Warwick, variational Bayes methods

### relabelling mixtures (#2)

**F**ollowing the previous post, I went and had a (long) look at Puolamäki and Kaski’s paper. I must acknowledge that, despite several runs through the paper, I still have trouble with the approach… From what I understand, the authors use a Bernoulli mixture pseudo-model to reallocate the observations to components. That is, given an MCMC output with simulated allocation variables (a.k.a. hidden or latent variables), they create a (*T*x*K*)x*n* matrix of component binary indicators, e.g., for a three component mixture,

0 1 0 0 1 0…

1 0 0 0 0 0…

0 0 1 1 0 1…

0 1 0 0 1 1…

and estimate a probability to be in component *j* for each of the *n* observations, according to the (pseudo-)likelihood

It took me a few days, between morning runs and those wee hours when I cannot get back to sleep (!), to make some sense of this Bernoulli modelling. The allocation vectors are used *together* to estimate the probabilities of being “in” component j *together*. However the data, which is the outcome of an MCMC simulation and *de facto* does not originate from that Bernoulli mixture, does not seem appropriate, both because it is produced by an MCMC simulation and because it is made of blocks of highly correlated rows [which sum up to one]. The Bernoulli likelihood above also defines a new model, with many more parameters than in the original mixture model. And I fail to see why perfect, partial or non-existent label switching [in the MCMC sequence] is not going to impact the estimation of the Bernoulli mixture. And why an argument based on a fixed parameter value (Theorem 3) extends to an MCMC outcome where parameters themselves are subjected to some degree of label switching. Bemused, I remain…
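To make the (*T*x*K*)x*n* construction concrete, here is a small sketch of how such an indicator matrix can be stacked from simulated allocations. The allocations are drawn at random purely for illustration (they are not an actual MCMC output), and the names `niter`, `alloc` and `ind` are hypothetical.

```r
set.seed(1)
niter <- 4   # number of MCMC iterations (T in the text)
K <- 3       # number of mixture components
n <- 6       # number of observations

# placeholder allocations: one row per iteration, entries in 1..K
alloc <- matrix(sample(1:K, niter * n, replace = TRUE), nrow = niter)

# for each iteration, a K x n block of 0/1 indicators; blocks stacked by rows
ind <- do.call(rbind, lapply(seq_len(niter), function(t) {
  outer(1:K, alloc[t, ], "==") + 0L
}))

dim(ind)       # (niter * K) rows and n columns
colSums(ind)   # every observation is allocated once per iteration
```

Within each block of `K` rows every column contains a single 1, which is the sum-to-one constraint that makes the rows so strongly dependent.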

Filed under: Statistics, Travel, University life Tagged: allocations, Bernoulli mixture, finite mixtures, label switching, MCMC algorithms, Monte Carlo Statistical Methods, permutations

### the latest Significance: Astrostats, black swans, and pregnant drivers [and zombies]

**R**eading Significance is always an enjoyable moment, when I can find time to skim through the articles (before my wife gets hold of it!). This time, I lost my copy between my office and home, and borrowed one from Tom Nichols at Warwick with four mornings to read it during breakfast. This December issue is definitely interesting, as it contains several introductory articles on astro- and cosmo-statistics! One thing I had not noticed before is how large a fraction of the papers is written by authors of books, giving a quick entry or interview about their book. For instance, I found out that Roberto Trotta had written a general public book called the Edge of the Sky (*All You Need to Know About the All-There-Is*) which exposes the fundamentals of cosmology through the 1000 most common words in the English language. So *Universe* is replaced with *All-There-Is*! I can understand and to some extent applaud the intention, but it nonetheless makes for a painful read, judging from the excerpt, when *researcher* and *telescope* are not part of the accepted vocabulary. Reading the corresponding article in Significance left me a bit bemused at the reason provided for the existence of a multiverse, i.e., of multiple replicas of our universe, all with different conditions: multiplying the universes makes ours more likely, while it sounds almost impossible on its own! This sounds like a very *frequentist* argument… and I am not even certain it would convince a frequentist. The other articles in this special astrostatistics section were of a more statistical nature, from estimating the number of galaxies to the chances of a big asteroid impact. Even though I found the graphical representation of the meteorite impacts in the past century because of the impact drawing in the background. However, when I checked the link to Carlo Zapponi’s website, I found the picture was a still of a neat animation of meteorites falling since the first report.

*“Taleb himself, once described as a philosopher, now self-identifies as a statistician. And, intrinsically, anti-fragility and statistical thinking are interrelated.” T. Bendell*

Two rather superfluous [in my opinion] articles dealt with a regression of zombie Google entries associated with each U.S. state (written by Daniel Zelterman, in connection with his chapter in the book Mathematical Modelling of Zombies, where I discovered the unexpected name of Mark Girolami [as a writer, not as a zombie cyclist!]) and something about X’mas crackers I have not read further than the title. Yet another entry related to a book was Tony Bendell’s discussion of his recent book on Building anti-fragile organisations, written in the wake of Taleb’s book, Antifragile. (Reviewed by Larry Wasserman on the now defunct Normal Deviate.)

And I have not mentioned pregnant drivers yet: one entry was by two Canadian epidemiologists who studied the accident rate of pregnant women and concluded that the risk increases during pregnancy. I did not read the original paper so cannot make an informed comment, but still wonder at the possible impact of a higher tendency for pregnant women to be sent to hospital in case of a minor car accident. There could also be other confounding factors, like an increased mileage during pregnancy (certainly when compared with immediately after). And, since the study covers only women who completed their pregnancy and were still alive one year later, it excludes those who had severe or fatal crashes before starting a pregnancy or during their pregnancy. Another possible caveat is that, due to the rather limited length of the study, there may be an impact of the years of observation on the observed rise. This data is taken from Ontario, where winter may be rather fierce (!), and corrections for both seasonality and general number of crashes should have been considered.

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: antifragile, astrostatistics, bolides, cosmology, multiverse, Nassim Taleb, pregnancy, Roberto Trotta, Significance, University of Warwick, zombies

### my week in a Tudor farmhouse

**A**s I could not book my “usual” maths house on the campus of the University of Warwick, I searched for another accommodation and discovered a nice shared house in the countryside (next to my standard running route), run by the Warwick Institute of Advanced Study, and called Cryfield Grange. As seen from the pictures, the building itself is impressive, even though there is not much left inside of its Tudor foundations, except some unexpected steps in the middle of some rooms and a few remaining black beams; it is also quite enjoyable for a week-long visit, with a large kitchen where I made rice pudding and pissaladière for the whole week, and a bike path to the University. I will definitely try to get there in the summer, as it must be even more enjoyable!

Filed under: pictures, Running, Travel, University life Tagged: cooking, Cryfield Grange, England, pissaladière, Tudor, University of Warwick, visiting accomodation

### Moonset near Montpellier

Filed under: pictures, Travel Tagged: astronomy, Autour du Ciel, Guillaume Cannat, La Grande Motte, Montpellier, moonset

### minimaxity of a Bayes estimator

**T**oday, while in Warwick, I spotted on Cross Validated a question involving “minimax” in the title and hence could not help but look at it! The way I first understood the question (and immediately replied to it) was to check whether or not the standard Normal average—reduced to the single Normal observation by sufficiency considerations—is a minimax estimator of the normal mean under an interval zero-one loss defined by

where L is a positive tolerance bound. I had not seen this problem before, even though it sounds quite standard. In this setting, the identity estimator, i.e., the normal observation x, is indeed minimax as (a) it is a generalised Bayes estimator—Bayes estimators under this loss are given by the centre of an equal posterior interval—for this loss function under the constant prior and (b) it can be shown to be a limit of proper Bayes estimators and its Bayes risk is also the limit of the corresponding Bayes risks. (This is a most traditional way of establishing minimaxity for a generalised Bayes estimator.) However, this was not the question asked on the forum, as the book by Zacks it referred to stated that the standard Normal average maximised the minimal coverage, which amounts to the maximal risk under the above loss. With the strange inversion of parameter and estimator in the minimax risk:

which makes the first bound equal to 0 by equating estimator and mean μ. Note however that I cannot access the whole book and hence may miss some restriction or other subtlety that would explain this unusual definition. (As an aside, note that Cross Validated has a protection against serial upvoting, so voting up or down at once a large chunk of my answers on that site does not impact my “reputation”!)
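As a quick numerical illustration, assume the interval zero-one loss is the indicator that the estimate misses μ by more than L (my reading of the setting above, with a unit-variance observation). The risk of the identity estimator is then 2Φ(−L) whatever the value of μ, and this constant risk is what the limiting Bayes argument relies on. The values of `L` and `mu` below are arbitrary choices.

```r
L <- 1                        # tolerance bound (arbitrary illustrative value)
risk_exact <- 2 * pnorm(-L)   # P(|X - mu| > L) when X ~ N(mu, 1)

set.seed(42)
mu <- c(-3, 0, 5)             # a few arbitrary values of the normal mean
risk_mc <- sapply(mu, function(m) {
  x <- rnorm(1e5, mean = m)   # simulated single observations
  mean(abs(x - m) > L)        # Monte Carlo risk of the identity estimator
})

risk_exact
risk_mc
```

The three Monte Carlo values should agree with `risk_exact` up to simulation noise, whichever μ is used.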

Filed under: Books, Kids, Statistics, University life Tagged: Bayes estimators, cross validated, generalised Bayes estimators, mathematical statistics, minimaxity, serial upvoting

### foxglove summer

**H**ere is the fifth instalment in the Peter Grant (or Rivers of London) series by Ben Aaronovitch. Thus entitled Foxglove Summer, whose meaning only became clear (to me) by the end of the book. I found it in my mailbox upon arrival in Warwick last Sunday. And rushed through the book during evenings, insomnia breaks and even a few breakfasts!

*“It’s observable but not reliably observable. It can have a quantifiable effects, but resists any attempt to apply mathematical principles to it – no wonder Newton kept magic under wraps. It must have driven him mental. Or maybe not.”* (p.297)

Either because the author has run out of ideas to centre a fifth novel on a part or aspect of London (even though the parks, including the London Zoo, were not particularly used in the previous novels), or because he could not set this new type of supernatural in a city (no spoilers!), this sequel takes place in the Western Counties, close to the Welsh border (and not so far from Brother Cadfael’s Shrewsbury!). It is also an opportunity to introduce brand new (local) characters who are enjoyable if a wee bit of a caricature! However, the inhabitants of the small village where the kidnapping investigation takes place are almost too sophisticated for Peter Grant who has to handle the enquiry all by himself, as his mentor is immobilised in London by the defection of Peter’s close colleague, Lindsey.

*“We trooped off (…) down something that was not so much a path as a statistical variation in the density of the overgrowth.”* (p.61)

As usual, the dialogues and monologues of Grant are the most enjoyable part of the story, along with a development of the long-in-the-coming love affair with the river goddess Beverley Brooks. And a much appreciated ambiguity in the attitude of Peter about the runaway Lindsey… The story itself reflects the limitations of a small village where one quickly repeats over and over the same trips and the same relations. Which gives a sensation of slow motion, even in the most exciting moments. The resolution of the enigma borrows too heavily from the fae and elves folklore, even though the final pages bring a few surprises. Nonetheless, the whole book was a page-turner for me, meaning I spent more time reading it this week than I intended or than was reasonable. No wonder for a series taking place in The Folly!

Filed under: Books, Kids, Travel Tagged: Ben Aaronnovitch, book review, England, foxglove, PC Peter Grant, Rivers of London, The Folly, unicorn, University of Warwick, Wales, Western Counties, Worcester

### icefalls on Ben Nevis

**T**he seminar invitation to Edinburgh gave me the opportunity and the excuse for a quick dash to Fort William for a day of ice-climbing on Ben Nevis. The ice conditions were perfect but there was alas too much snowdrift to attempt Point Five Gully, one of the mythical routes on the Ben. (Last time, the ice was not in good condition.) Instead, we did three pitches on three different routes, one iced rock-face near the CIC hut, the first pitch of Waterfall Gully on Carn Dearg Buttress, and the first pitch of The Curtain, again on Carn Dearg Buttress.

The most difficult climb was the first one, grading about V.5 in Scottish grade, maybe above that as the ice was rather rotten, forcing my guide Ali to place many screws. And forcing me to unscrew them! Then the difficulty got much lower, except for the V.5 start of the Waterfall, where I had to climb an ice pillar with my hands as the ice-picks would not get a good grip. Breaking another large pillar in the process, fortunately mostly avoiding being hit. The final climb was quite easy, more of a steep snow slope than a true ice-climb. Too bad the second part of the route was blocked by two fellows who could not move! Anyway, it was another of those rare days on the ice, with enough choice not to worry about sharing with other teams, and a terrific guide! And a reasonable day for Scotland with little snow, no rain, plenty of wind and not that cold (except when belaying!).

Filed under: Mountains, pictures, Travel Tagged: Ben Nevis, Carn Dearg Buttress, Highlands, ice climbing, point five gully, Scotland, Scottish climbing grade, waterfall

### relabelling mixtures

**A**nother short paper about relabelling in mixtures was arXived last week by Pauli and Torelli. They refer rather extensively to a previous paper by Puolamäki and Kaski (2009) of which I was not aware, a paper attempting to get an unswitching sampler that does *not* exhibit any label switching, a concept I find most curious as I see no rigorous way to state that a sampler is *not* switching! This would imply spotting low posterior probability regions that the chain would cross. But I should check the paper nonetheless.

Because the G component mixture posterior is invariant under the G! possible permutations, I am somewhat undecided as to what the authors of the current paper mean by estimating the difference between two means, like μ1-μ2. Since they object to using the output of a perfectly mixing MCMC algorithm and seem to prefer the one associated with a non-switching chain. Or by estimating the probability that a given observation is from a given component, since this is exactly 1/G by the permutation invariance property. In order to identify a partition of the data, they introduce a loss function on the joint allocations of pairs of observations, a loss function that sounds quite similar to the one we used in our 2000 JASA paper on the label switching deficiencies of MCMC algorithms. (And makes me wonder why this work of ours is not deemed relevant for the approach advocated in the paper!) Still, having read this paper, which I find rather poorly written, I have no clear understanding of how the authors give a precise meaning to a *specific* component of the mixture distribution. Or how the relabelling has to be conducted to *avoid* switching. That is, how the authors define their parameter space. Or their loss function. Unless one falls back onto the ordering of the means or the weights, which has the drawback of not connecting with the level sets of a particular mode of the posterior distribution, meaning that imposing the constraints results in a region that contains bits of several modes.
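The reason why a loss on the joint allocations of pairs of observations escapes the labelling issue can be checked numerically: the frequency with which two observations share a component does not change when the labels are permuted at each iteration. The sketch below uses randomly drawn allocations as a stand-in for an MCMC output, and the helper `coalloc` is a hypothetical name of mine, not something taken from either paper.

```r
set.seed(7)
G <- 3; n <- 5; niter <- 200

# placeholder allocations: one row per iteration, entries in 1..G
alloc <- matrix(sample(1:G, niter * n, replace = TRUE), nrow = niter)

# n x n matrix of frequencies with which pairs of observations share a component
coalloc <- function(z) {
  blocks <- lapply(seq_len(nrow(z)), function(t) outer(z[t, ], z[t, ], "==") * 1)
  Reduce(`+`, blocks) / nrow(z)
}

# apply a fresh random permutation of the labels at every iteration
switched <- t(apply(alloc, 1, function(z) sample(1:G)[z]))

p_original <- coalloc(alloc)
p_switched <- coalloc(switched)
all.equal(p_original, p_switched)
```

The two matrices coincide, while the marginal frequency of any single label is scrambled by the random relabelling.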

At some point the authors assume the data can be partitioned into K≤G groups such that there is a representative observation within each group never sharing a component (across MCMC iterations) with any of the other representatives. While this notion is label invariant, I wonder whether (a) this is possible on any MCMC outcome; (b) it indicates a positive or negative feature of the MCMC sampler; and (c) what prevents the representatives from switching in harmony from one component to the next while preserving their perfect mutual exclusion… This however constitutes the advance in the paper, namely that component dependent quantities are estimated as those associated with a particular representative. Note that the paper contains no illustration, hence that the method may prove hard to impossible to implement!

Filed under: Books, Statistics Tagged: arXiv, Bayesian estimation, finite mixtures, label switching, Matthew Stephens, pivot, University of Warwick

### snow!

Filed under: Kids, pictures, Travel, University life Tagged: England, snow, United Kingdom, University of Warwick, winter, Zeeman building

### Bayesian optimization for likelihood-free inference of simulator-based statistical models

**M**ichael Gutmann and Jukka Corander arXived this paper two weeks ago. I read part of it (mostly the extended introduction part) on the flight from Edinburgh to Birmingham this morning. I find the reflection it contains on the nature of the ABC approximation quite deep and thought-provoking. Indeed, the major theme of the paper is to visualise ABC (which is admittedly shorter than “likelihood-free inference of simulator-based statistical models”!) as a regular computational method based on an approximation of the likelihood function at the observed value, yobs. This includes for example Simon Wood’s synthetic likelihood (who incidentally gave a talk on his method while I was in Oxford). As well as non-parametric versions. In both cases, the approximations are based on repeated simulations of pseudo-datasets for a given value of the parameter θ, either to produce an estimation of the mean and covariance of the sampling model as a function of θ or to construct genuine estimates of the likelihood function. As assumed by the authors, this calls for a small dimension θ. This approach actually allows for the inclusion of the synthetic approach as a lower bound on a non-parametric version.

In the case of Wood’s synthetic likelihood, two questions came to me:

- the estimation of the mean and covariance functions is usually not smooth because new simulations are required for each new value of θ. I wonder how frequent the case is where we can always use the same basic random variates for all values of θ. Because it would then give a smooth version of the above. In the other cases, provided the dimension is manageable, a Gaussian process could be first fitted before using the approximation. Or any other form of regularization.
- no mention is made [in the current paper] of the impact of the parametrization of the summary statistics. Once again, a Box-Cox transform could be applied to each component of the summary for a better proximity to the normal distribution.
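To fix ideas about the first point, here is a toy sketch of a synthetic log-likelihood evaluated with the same random variates for every value of θ. Everything in it is an assumption made for illustration: the simulator (iid N(θ,1) draws), the two summaries, the number `M` of pseudo-datasets and the seed-resetting trick. It is not the implementation of Wood or of the paper.

```r
# hypothetical simulator: n iid N(theta, 1) draws summarised by mean and log-variance
simulate_summary <- function(theta, n = 50) {
  y <- rnorm(n, mean = theta)
  c(mean(y), log(var(y)))
}

# Gaussian (synthetic) log-likelihood of the observed summary at theta
synth_loglik <- function(theta, s_obs, M = 200, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)               # same basic variates for every theta
  S <- t(replicate(M, simulate_summary(theta)))    # M x 2 matrix of simulated summaries
  m <- colMeans(S)                                 # estimated mean of the summaries
  V <- cov(S)                                      # estimated covariance of the summaries
  d <- s_obs - m
  logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
  -0.5 * (length(d) * log(2 * pi) + logdet + sum(d * solve(V, d)))
}

set.seed(1)
s_obs <- simulate_summary(2)                       # pseudo-observed summary, theta = 2
thetas <- seq(1, 3, by = 0.1)
ll <- sapply(thetas, synth_loglik, s_obs = s_obs, seed = 123)
thetas[which.max(ll)]
```

Dropping the `seed` argument brings back fresh simulations at each θ and hence a noisy curve, which is when a Gaussian process or another regularisation becomes relevant.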

When reading about a non-parametric approximation to the likelihood (based on the summaries), the questions I scribbled on the paper were:

- estimating a complete density when using this estimate at the single point yobs could possibly be superseded by a more efficient approach.
- the authors study a kernel that is a function of the difference or distance between the summaries and which is maximal at zero. This is indeed rather frequent in the ABC literature, but does it impact the convergence properties of the kernel estimator?
- the estimation of the tolerance, which happens to be a bandwidth in that case, does not appear to be processed in this paper, which could explain the very low probabilities of acceptance mentioned in the paper.
- I am lost as to why lower bounds on likelihoods are relevant here. Unless this is intended for ABC maximum likelihood estimation.
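The link between tolerance and bandwidth in the list above can be seen on a toy kernel estimate of the likelihood at the observed summary. This is only a sketch under assumptions of mine (a one-dimensional summary, a hypothetical simulator `sim_summary`, arbitrary values of `h` and `M`), not the estimator studied by the authors.

```r
# hypothetical simulator: the summary is the mean of n iid N(theta, 1) draws
sim_summary <- function(theta, n = 20) mean(rnorm(n, mean = theta))

# kernel estimate of the density of the summary at s_obs, with bandwidth h
kernel_lik <- function(theta, s_obs, h, M = 2000,
                       kernel = c("gaussian", "uniform")) {
  kernel <- match.arg(kernel)
  u <- (replicate(M, sim_summary(theta)) - s_obs) / h
  w <- if (kernel == "gaussian") dnorm(u) else 0.5 * (abs(u) <= 1)
  mean(w) / h
}

set.seed(3)
s_obs <- 0
h <- 0.1
lik_near <- kernel_lik(0, s_obs, h)   # theta compatible with the observed summary
lik_far  <- kernel_lik(3, s_obs, h)   # theta far from it
# with a uniform kernel, 2 * h * estimate is the ABC acceptance rate at tolerance h
acc_rate <- 2 * h * kernel_lik(0, s_obs, h, kernel = "uniform")
c(lik_near, lik_far, acc_rate)
```

Shrinking `h` reduces the smoothing bias but also the acceptance rate, so a tolerance chosen too small leaves very few simulations to estimate the likelihood from.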

Gutmann and Corander also comment on the first point, through the cost of producing a likelihood estimator. They therefore suggest resorting to regression and avoiding regions of low estimated likelihood. And relying on Bayesian optimisation. (Hopefully to be commented later.)

Filed under: Books, Statistics, University life Tagged: ABC, ABC validation, Bayesian optimisation, non-parametrics, synthetic likelihood

### Bayesian Analysis, Volume 10, Number 1 (2015)

Contents:

**Trevelyan J. McKinley**, **Michelle Morters**, **James L. N. Wood**. Bayesian Model Choice in Cumulative Link Ordinal Regression Models. 1--30.

**Fumiyasu Komaki**. Asymptotic Properties of Bayesian Predictive Densities When the Distributions of Data and Target Variables are Different. 31--51.

**Harold Bae**, **Thomas Perls**, **Martin Steinberg**, **Paola Sebastiani**. Bayesian Polynomial Regression Models to Fit Multiple Genetic Models for Quantitative Traits. 53--74.

**Dimitris Fouskakis**, **Ioannis Ntzoufras**, **David Draper**. Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models. 75--107.

**A. Mohammadi**, **E. C. Wit**. Bayesian Structure Learning in Sparse Gaussian Graphical Models. 109--138.

**Cyr Emile M’lan**, **Ming-Hui Chen**. Objective Bayesian Inference for Bilateral Data. 139--170.

**Fernando V. Bonassi**, **Mike West**. Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation. 171--187.

### Bayesian Model Choice in Cumulative Link Ordinal Regression Models

**Trevelyan J. McKinley**, **Michelle Morters**, **James L. N. Wood**.

**Source:** Bayesian Analysis, Volume 10, Number 1, 1--30.

**Abstract:**

The use of the proportional odds (PO) model for ordinal regression is ubiquitous in the literature. If the assumption of parallel lines does not hold for the data, then an alternative is to specify a non-proportional odds (NPO) model, where the regression parameters are allowed to vary depending on the level of the response. However, it is often difficult to fit these models, and challenges regarding model choice and fitting are further compounded if there are a large number of explanatory variables. We make two contributions towards tackling these issues: firstly, we develop a Bayesian method for fitting these models, that ensures the stochastic ordering conditions hold for an arbitrary finite range of the explanatory variables, allowing NPO models to be fitted to any observed data set. Secondly, we use reversible-jump Markov chain Monte Carlo to allow the model to choose between PO and NPO structures for each explanatory variable, and show how variable selection can be incorporated. These methods can be adapted for any monotonic increasing link functions. We illustrate the utility of these approaches on novel data from a longitudinal study of individual-level risk factors affecting body condition score in a dog population in Zenzele, South Africa.

### Asymptotic Properties of Bayesian Predictive Densities When the Distributions of Data and Target Variables are Different

**Fumiyasu Komaki**.

**Source:** Bayesian Analysis, Volume 10, Number 1, 31--51.

**Abstract:**

Bayesian predictive densities when the observed data $x$ and the target variable $y$ to be predicted have different distributions are investigated by using the framework of information geometry. The performance of predictive densities is evaluated by the Kullback–Leibler divergence. The parametric models are formulated as Riemannian manifolds. In the conventional setting in which $x$ and $y$ have the same distribution, the Fisher–Rao metric and the Jeffreys prior play essential roles. In the present setting in which $x$ and $y$ have different distributions, a new metric, which we call the predictive metric, constructed by using the Fisher information matrices of $x$ and $y$ , and the volume element based on the predictive metric play the corresponding roles. It is shown that Bayesian predictive densities based on priors constructed by using non-constant positive superharmonic functions with respect to the predictive metric asymptotically dominate those based on the volume element prior of the predictive metric.

### Bayesian Polynomial Regression Models to Fit Multiple Genetic Models for Quantitative Traits

**Harold Bae**, **Thomas Perls**, **Martin Steinberg**, **Paola Sebastiani**.

**Source:** Bayesian Analysis, Volume 10, Number 1, 53--74.

**Abstract:**

We present a coherent Bayesian framework for selection of the most likely model from the five genetic models (genotypic, additive, dominant, co-dominant, and recessive) commonly used in genetic association studies. The approach uses a polynomial parameterization of genetic data to simultaneously fit the five models and save computations. We provide a closed-form expression of the marginal likelihood for normally distributed data, and evaluate the performance of the proposed method and existing method through simulated and real genome-wide data sets.

### Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

**Dimitris Fouskakis**, **Ioannis Ntzoufras**, **David Draper**.

**Source:** Bayesian Analysis, Volume 10, Number 1, 75--107.

**Abstract:**

In the context of the expected-posterior prior (EPP) approach to Bayesian variable selection in linear models, we combine ideas from power-prior and unit-information-prior methodologies to simultaneously (a) produce a minimally-informative prior and (b) diminish the effect of training samples. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size $n^{*}$ of the training sample, due to PEP’s unit-information construction, that one may take $n^{*}$ equal to the full-data sample size $n$ and dispense with training samples altogether. This promotes stability of the resulting Bayes factors, removes the arbitrariness arising from individual training-sample selections, and greatly increases computational speed, allowing many more models to be compared within a fixed CPU budget. We find that, under an independence Jeffreys (reference) baseline prior, the asymptotics of PEP Bayes factors are equivalent to those of Schwartz’s Bayesian Information Criterion (BIC), ensuring consistency of the PEP approach to model selection. Our PEP prior, due to its unit-information structure, leads to a variable-selection procedure that — in our empirical studies — (1) is systematically more parsimonious than the basic EPP with minimal training sample, while sacrificing no desirable performance characteristics to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the advantages described above arising from the avoidance of training samples altogether; and (3) identifies maximum-a-posteriori models that achieve better out-of-sample predictive performance than that provided by standard EPPs, the $g$ -prior, the hyper- $g$ prior, non-local priors, the Least Absolute Shrinkage and Selection Operator (LASSO) and Smoothly-Clipped Absolute Deviation (SCAD) methods.