Sorry for the delay between posts—I’m in the process of migrating this blog from Jekyll to Hugo. In this post, we’re going to unpack a few more of Fisher’s ideas.

Our playground

We’re using the same setup as in Efron’s paper. In particular, we have the output of a randomized experiment given in the following table.

             yes   no   row sum
treatment      1   15        16
control       13    3        16
column sum    14   18        32

The sample log-odds ratio is

$$\hat\theta = \log\frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} = \log\frac{1 \times 3}{15 \times 13} \approx -4.17.$$
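As a quick numeric check, here’s a short Python sketch (the variable names are mine) that recomputes the sample log-odds ratio from the table:

```python
import numpy as np

# The 2x2 table from the experiment:
#             yes   no
# treatment     1   15
# control      13    3
table = np.array([[1, 15],
                  [13, 3]])

# Sample log-odds ratio: log[(n11 * n22) / (n12 * n21)]
theta_hat = np.log((table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0]))
print(theta_hat)  # about -4.17
```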

If we view the table as a multinomial model on its four cells, as Fisher did, we have three degrees of freedom. The would-be fourth degree of freedom gets absorbed by the constraint that the cell probabilities sum to 1.

Conditional inference and approximate ancillarity

Fisher wanted to make inferences about the log-odds ratio $\theta$, so he developed a clever workaround for the nuisance parameters: condition on the marginals of the table. The marginals bring some information to the estimation problem, but not much—this is why Fisher dubbed them approximate ancillary statistics.1 In other words, Fisher reduced the problem to something that depends only on $\theta$, not the other stuff:

$$f_\theta\bigl(\hat\theta \mid \text{marginals of the table}\bigr).$$

Said another way, Fisher replaced the unconditional density of the MLE with a conditional density, i.e.,

$$f_\theta\bigl(\hat\theta\bigr) \;\longrightarrow\; f_\theta\bigl(\hat\theta \mid A\bigr),$$

where $A$ denotes the approximately ancillary statistics—here, the marginals of the table.

Fisher’s final trick was to move from densities to likelihoods. Likelihoods are easy to calculate, so writing

$$f_\theta\bigl(\hat\theta \mid A\bigr) = c\,\frac{L(\theta)}{L(\hat\theta)},$$

where $c$ is a constant, gives us a compact way to do fully efficient estimation in translation families.

Caveat! The multinomial table is not a translation family. Obtaining the conditional distribution requires a less-than-pleasant calculation.
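To give a flavor of that calculation, here’s a minimal Python sketch using SciPy (variable names are my own): conditional on both margins, and with $\theta = 0$, the count in the top-left cell follows a hypergeometric distribution—exactly the distribution behind Fisher’s exact test. For $\theta \ne 0$ the conditional distribution is the noncentral version, which is where the less-than-pleasant algebra lives.

```python
import numpy as np
from scipy.stats import hypergeom, fisher_exact

table = np.array([[1, 15],    # treatment: yes, no
                  [13, 3]])   # control:   yes, no

n11 = table[0, 0]                    # top-left cell (treatment, yes)
treatment_total = table[0].sum()     # 16
yes_total = table[:, 0].sum()        # 14
grand_total = table.sum()            # 32

# With both margins fixed and theta = 0, n11 is hypergeometric:
# of the 14 "yes" outcomes, how many land in the 16 treatment slots?
cond = hypergeom(M=grand_total, n=treatment_total, N=yes_total)
print(cond.pmf(n11))  # conditional probability of the observed cell

# Fisher's exact test does its inference entirely in this conditional world.
print(fisher_exact(table))  # odds ratio and p-value
```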

The magic formula

The magic formula is a modern variant of Fisher’s ratio of likelihoods that includes a Fisher information term:

$$f_\theta\bigl(\hat\theta \mid A\bigr) \;\approx\; c\,\frac{L(\theta)}{L(\hat\theta)}\left[-\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\bigg|_{\theta = \hat\theta}\right]^{1/2}.$$

The term in square brackets has a Fisher information feel to it (perhaps unsurprisingly): it’s the observed Fisher information evaluated at the MLE. Let’s unpack the formula, first as a bit of code, then with a worked example.
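Here’s the formula as a small computational sketch, assuming you can hand it a log-likelihood in the single parameter of interest (with any nuisance parameters already fixed or profiled out by the caller); the function name and the finite-difference step are my own choices:

```python
import numpy as np

def magic_formula(log_lik, theta, theta_hat, eps=1e-4):
    """Magic-formula approximation (up to the constant c) to the density of
    theta_hat at its observed value, viewed as a function of theta.

    log_lik: callable returning the log-likelihood at a scalar parameter value.
    """
    # Observed Fisher information: -d^2 log L / d theta^2 at theta_hat,
    # estimated here with a central finite difference.
    second_deriv = (log_lik(theta_hat + eps)
                    - 2.0 * log_lik(theta_hat)
                    + log_lik(theta_hat - eps)) / eps**2
    observed_info = -second_deriv

    # c * [L(theta) / L(theta_hat)] * [observed information]^(1/2), with c dropped.
    return np.exp(log_lik(theta) - log_lik(theta_hat)) * np.sqrt(observed_info)
```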

A normal example. Assume we have data $x_1, \dots, x_n$ from a normal distribution with mean $\mu$ and standard deviation $\sigma$. We’re interested in estimating the distribution’s mean when both $\mu$ and $\sigma$ are unknown. In this case $\sigma$ is a nuisance parameter, so let’s use the magic formula.

The density of a normal distribution is

$$f_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

and the log-likelihood of the sample is

$$\ell(\mu, \sigma) = \log L(\mu, \sigma) = -n\log\sigma \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \;+\; \text{constant}.$$

The first and second derivatives of the log-likelihood with respect to $\mu$ are

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)
\qquad\text{and}\qquad
\frac{\partial^2 \ell}{\partial \mu^2} = -\frac{n}{\sigma^2}.$$

Now, back to the magic formula, with the MLE $\hat\sigma$ standing in for the nuisance parameter $\sigma$:

$$f_\mu\bigl(\hat\mu \mid A\bigr) \;\approx\; c\,\frac{L(\mu)}{L(\hat\mu)}\left[\frac{n}{\hat\sigma^2}\right]^{1/2}
= c\,\sqrt{\frac{n}{\hat\sigma^2}}\;\exp\!\left(-\frac{n\,(\hat\mu-\mu)^2}{2\hat\sigma^2}\right).$$

So, conditional on $A$, we now have something that depends only on $\mu$: up to the constant $c$, it’s a $\mathcal{N}(\mu,\, \hat\sigma^2/n)$ density for $\hat\mu$.
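And here’s a quick numerical check of that claim—a sketch with made-up values ($\mu = 2$, $\sigma = 3$, $n = 50$) and $\sigma$ held at its MLE, computing the information term analytically rather than reusing the generic helper above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Made-up truth for the simulation.
mu_true, sigma_true, n = 2.0, 3.0, 50
x = rng.normal(mu_true, sigma_true, size=n)

mu_hat = x.mean()
sigma_hat = x.std()  # MLE of sigma (divides by n)

def log_lik(mu):
    """Normal log-likelihood in mu, with sigma held at its MLE."""
    return norm.logpdf(x, loc=mu, scale=sigma_hat).sum()

def magic_density(mu):
    """Magic-formula approximation (constant c dropped) to the density of mu_hat."""
    observed_info = n / sigma_hat**2  # -d^2 log L / d mu^2 at mu_hat
    return np.exp(log_lik(mu) - log_lik(mu_hat)) * np.sqrt(observed_info)

# Compare against a N(mu, sigma_hat^2 / n) density for mu_hat: the ratio
# should be constant in mu, i.e., the two agree up to the constant c.
grid = np.linspace(mu_hat - 1.5, mu_hat + 1.5, 7)
approx = np.array([magic_density(m) for m in grid])
exact = norm.pdf(mu_hat, loc=grid, scale=sigma_hat / np.sqrt(n))
print(approx / exact)  # roughly [2.5066, 2.5066, ...], i.e., sqrt(2*pi)
```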

Next up

… an introduction to empirical Bayes!

References

This post is related to material from:

  • “R.A. Fisher in the 21st Century” by Bradley Efron.
  • Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie. A digital copy lives here: CASI.
  • An Introduction to the Bootstrap by Bradley Efron and Robert J. Tibshirani.

Footnotes

  1. According to Efron, Neyman put Fisher’s ideas on firmer, frequentist-style footing.