Fisherian vignette

In this post, we’re going to be Bayesians—well, sort of… actually we’re going to be empirical Bayesians. If you’ve never seen empirical Bayes, then get ready for a beautiful way to blend the best of frequentist and Bayesian style thinking. And by the way, it’s super useful in many “modern” problems.

Before going any further down Empirical Bayes Avenue, it’s worth saying that empirical Bayes is not really a Fisherian idea. But Fisher wasn’t always Fisherian, so I think we’re justified in discussing an idea that is not attributed to Fisher; moreover it is spiritually similar, in some sense, to Fisher’s style—more on that in the next post.

The key idea of empirical Bayes is to share statistical strength, which is a very natural thing for Bayesian type analysis, but to do so while respecting the underlying data—this is the empirical part. By respecting the underlying data, I mean that we will use the underlying to estimate our prior distribution.

Teeing things up

We’re going to quickly review a few Bayesian ideas before jumping in to empirical Bayes. Assume we have a parameter $\theta$ drawn from a prior distribution $g$ . We observe data $x$ that is generated by a density $f_\theta$ . Notationally, we have

$\theta \sim g \quad \textsf{and} \quad x \sim f_\theta.$

After having observed the data $x$ , we estimate the posterior distribution of $\theta$ via Bayes rule

$g(\theta \,|\, x) = \frac{g(\theta) f_\theta(x)}{\int g(\theta) f_\theta(x) \, d\theta}.$

To be empirical Bayesian, we perform the sleight of hand

$g(\theta \,|\, x) = \frac{g(\theta) f_\theta(x)} {\int \hat{g}(\theta) f_\theta(x) \, d\theta}$

and replace our prior with an estimate of it that is based on the observed data $x$ . Without getting too tied up in details about how we estimate $g$ , let’s look at an example taken from Efron’s paper.

Efron’s log-odds example

In the prior post, we wanted to estimate the sample log-odds. We calculated it as

$\hat{\theta}_8 = \log \left( \frac{1}{15} \bigg/ \frac{13}{3} \right) = -4.2.$

These data are from a hospital that was testing a new treatment and similar data exist for 40 other hospitals. Using the other data, we can estimate confidence intervals for each hospitals log-odds estimate. Doing so gives a 90% (central a posteriori) confidence interval of

$\hat{\theta}_{8,\textsf{ebay}} \in [-5.1, -1.8].$

The ordinary, frequentist-style 90% interval is

$\hat{\theta}_{8,\textsf{freq}} \in [-6.3, -2.4].$

This interval is estimated using only the data in hospital 8’s table. Notice that the ordinary interval is shifted to the left of the empirical Bayesian interval. This results from the fact that majority of the other data produces log-odds estimates which are greater than -4.2. Indeed, the only estimates smaller than this are $-\infty$ .

Next up

… a recap on Fisher’s style and contributions!

References

This post is related to material from:

“R.A. Fisher in the 21st Century” by Bradley Efron.
Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie. A digital copy lives here: CASI.
Large Scale Inference by Bradley Efron.