What is Bayesian Analysis?

by Kate Cowles, Rob Kass, and Tony O’Hagan

What we now know as Bayesian statistics has not had a clear run since 1763. Although Bayes’s method was enthusiastically taken up by Laplace and other leading probabilists of the day, it fell into disrepute in the 19th century because statisticians of the time did not yet know how to handle prior probabilities properly. The first half of the 20th century saw the development of a completely different theory, now called frequentist statistics. But the flame of Bayesian thinking was kept alive by a few thinkers such as Bruno de Finetti in Italy and Harold Jeffreys in England. The modern Bayesian movement began in the second half of the 20th century, spearheaded by Jimmy Savage in the USA and Dennis Lindley in Britain, but Bayesian inference remained extremely difficult to implement until the late 1980s and early 1990s when powerful computers became widely accessible and new computational methods were developed. The subsequent explosion of interest in Bayesian statistics has led not only to extensive research in Bayesian methodology but also to the use of Bayesian methods to address pressing questions in diverse application areas such as astrophysics, weather forecasting, health care policy, and criminal justice.

Scientific hypotheses typically are expressed through probability distributions for observable scientific data. These probability distributions depend on unknown quantities called parameters. In the Bayesian paradigm, current knowledge about the model parameters is expressed by placing a probability distribution on the parameters, called the “prior distribution,” often written as

$p(\theta).$

When new data $\mathbf{y}$ become available, the information they contain regarding the model parameters is expressed in the “likelihood,” which is proportional to the distribution of the observed data given the model parameters, written as

$p(\mathbf{y} \vert \theta).$

This information is then combined with the prior to produce an updated probability distribution called the “posterior distribution,” on which all Bayesian inference is based. Bayes’ theorem, an elementary identity in probability theory, states how the update is done mathematically: the posterior is proportional to the prior times the likelihood, or more precisely,

$p(\theta \vert \mathbf{y}) = \frac{p(\theta) \, p(\mathbf{y} \vert \theta)}{\int_{\Theta} p(\theta) \, p(\mathbf{y} \vert \theta) \, d\theta}.$
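As a minimal sketch of this update, consider a hypothetical example that is not part of the original text: the parameter is a success probability, the prior is uniform on (0, 1), and the data are 7 successes in 10 trials. The code below evaluates prior times likelihood on a grid and divides by a numerical approximation of the integral in the denominator. The function names, the grid size, and the data are all illustrative assumptions.

```python
def grid_posterior(prior, likelihood, grid):
    """Approximate the posterior density on an evenly spaced grid of theta values."""
    step = grid[1] - grid[0]
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    # Riemann-sum approximation of the integral in the denominator of Bayes' theorem.
    evidence = sum(unnormalized) * step
    return [u / evidence for u in unnormalized]


# Hypothetical data: 7 successes in 10 independent trials.
N_TRIALS, N_SUCCESS = 10, 7


def prior(theta):
    # Uniform prior density on (0, 1): an assumption made for illustration.
    return 1.0


def likelihood(theta):
    # Binomial likelihood up to a constant factor; the constant cancels on normalizing.
    return theta ** N_SUCCESS * (1.0 - theta) ** (N_TRIALS - N_SUCCESS)


n_points = 2000
grid = [(i + 0.5) / n_points for i in range(n_points)]  # midpoints of equal-width cells
step = grid[1] - grid[0]

posterior = grid_posterior(prior, likelihood, grid)
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * step

# With a uniform prior this posterior is known in closed form, Beta(8, 4),
# whose mean is (7 + 1) / (10 + 2); the grid answer should agree closely.
exact_mean = (N_SUCCESS + 1) / (N_TRIALS + 2)
print(posterior_mean, exact_mean)
```

A grid works here only because there is a single parameter; the cost of evaluating a grid grows rapidly with the number of parameters.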

In theory, the posterior distribution is always available, but in realistically complex models, the required analytic computations often are intractable. Over several years, in the late 1980s and early 1990s, it was realized that methods for drawing samples from the posterior distribution could be very widely applicable.
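To illustrate what drawing samples from a posterior can look like, here is a minimal sketch of a random-walk Metropolis sampler. The text above does not name a specific algorithm, so this choice is an assumption, as are the target (the posterior for a success probability after 7 successes in 10 trials under a uniform prior), the proposal width, and the function names. The sampler needs the posterior only up to its normalizing constant, so the integral in Bayes' theorem is never computed.

```python
import math
import random

# Hypothetical data: 7 successes in 10 trials, uniform prior on theta.
N_TRIALS, N_SUCCESS = 10, 7


def log_unnormalized_posterior(theta):
    """Log of prior times likelihood, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return N_SUCCESS * math.log(theta) + (N_TRIALS - N_SUCCESS) * math.log(1.0 - theta)


def metropolis(log_target, start, n_samples, proposal_sd, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    samples = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, proposal_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = metropolis(log_unnormalized_posterior, start=0.5,
                     n_samples=20000, proposal_sd=0.2, rng=rng)
kept = samples[1000:]  # discard early draws as burn-in
sample_mean = sum(kept) / len(kept)
print(sample_mean)  # should be near the exact posterior mean of 2/3
```

Posterior summaries such as means, quantiles, or interval estimates are then approximated by the corresponding summaries of the retained draws.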

There are many reasons for adopting Bayesian methods, and their applications appear in diverse fields. Many people advocate the Bayesian approach because of its philosophical consistency. Various fundamental theorems show that if a person wants to make consistent and sound decisions in the face of uncertainty, then the only way to do so is to use Bayesian methods. Others point to logical problems with frequentist methods that do not arise in the Bayesian framework. On the other hand, prior probabilities are intrinsically subjective – your prior information is different from mine – and many statisticians see this as a fundamental drawback to Bayesian statistics. Advocates of the Bayesian approach argue that this is inescapable, and that frequentist methods also entail subjective choices, but this has been a basic source of contention between the ‘fundamentalist’ supporters of the two statistical paradigms for at least the last 50 years. In contrast, it is more the pragmatic advantages of the Bayesian approach that have fuelled its strong growth over the last 20 years, and are the reason for its adoption in a rapidly growing variety of fields. Powerful computational tools allow Bayesian methods to tackle large and complex statistical problems with relative ease, where frequentist methods can only approximate or fail altogether. Bayesian modelling methods provide natural ways for people in many disciplines to structure their data and knowledge, and they yield direct and intuitive answers to the practitioner’s questions.

There are many varieties of Bayesian analysis. The fullest version of the Bayesian paradigm casts statistical problems in the framework of decision making. It entails formulating subjective prior probabilities to express pre-existing information, modelling the data structure carefully, checking and allowing for uncertainty in model assumptions, and formulating a set of possible decisions together with a utility function that expresses how the value of each alternative decision is affected by the unknown model parameters. But each of these components can be omitted. Many users of Bayesian methods do not employ genuine prior information, either because it is insubstantial or because they are uncomfortable with subjectivity. The decision-theoretic framework is also widely omitted, with many feeling that statistical inference should not really be formulated as a decision. So there are varieties of Bayesian analysis and varieties of Bayesian analysts. But the common strand that underlies this variation is the basic principle of using Bayes’ theorem and expressing uncertainty about unknown parameters probabilistically.
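The decision-theoretic version can be sketched in a few lines, under assumptions that are purely illustrative: the posterior for a success probability is taken to be Beta(8, 4) (as it would be after 7 successes in 10 trials with a uniform prior), there are two hypothetical decisions, and the utility function is invented for the example. The sketch averages the utility of each decision over posterior draws and picks the decision with the highest expected utility.

```python
import random


def utility(decision, theta):
    """Hypothetical utility: adopting pays off in proportion to theta, minus a fixed cost."""
    if decision == "adopt":
        return 100.0 * theta - 50.0
    return 0.0  # "reject" keeps the status quo


def expected_utility(decision, theta_draws):
    """Monte Carlo estimate of the posterior expected utility of one decision."""
    return sum(utility(decision, t) for t in theta_draws) / len(theta_draws)


rng = random.Random(1)  # fixed seed so the run is repeatable
# Draws from the assumed Beta(8, 4) posterior for theta.
theta_draws = [rng.betavariate(8, 4) for _ in range(10000)]

decisions = ["adopt", "reject"]
scores = {d: expected_utility(d, theta_draws) for d in decisions}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the utility here is linear in the parameter, the expected utility of adopting depends only on the posterior mean; with a nonlinear utility the whole posterior distribution would matter.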