Today we’re talking about confidence intervals, so put on your inference hat and let’s get down to business.

Confidence intervals

Modern computation augmented by Fisher’s ideas gives us a nice way to generate confidence intervals for parameter estimates. That said, even in the absence of modern computation, we can still accomplish a lot with Fisher’s toolbox. Fisher’s approximate confidence intervals for a maximum likelihood estimate $\hat{\theta}$ are

$$\hat{\theta} \pm z^{(\alpha)} \, \widehat{\mathrm{se}}, \qquad \widehat{\mathrm{se}} = \frac{1}{\sqrt{n \, \mathcal{I}(\hat{\theta})}},$$

where $z^{(\alpha)}$ is the value at the $\alpha$-quantile of a standard normal distribution and $\mathcal{I}(\hat{\theta})$ is the per-observation Fisher information evaluated at $\hat{\theta}$. We can use this interval because MLEs are asymptotically normal, i.e., $\hat{\theta} \,\dot{\sim}\, \mathcal{N}\big(\theta, \widehat{\mathrm{se}}^{\,2}\big)$.
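To make the recipe concrete, here’s a minimal Python sketch (an entirely made-up example, assuming NumPy and SciPy) that builds this interval for the rate of an exponential sample, where the per-observation Fisher information works out to $\mathcal{I}(\lambda) = 1/\lambda^2$.

```python
import numpy as np
from scipy import stats

# Hypothetical example: n exponential observations with true rate 2.0.
rng = np.random.default_rng(0)
n = 200
x = rng.exponential(scale=0.5, size=n)

lam_hat = 1.0 / x.mean()                  # MLE of the rate
fisher_info = 1.0 / lam_hat**2            # per-observation Fisher information I(lam_hat)
se_hat = 1.0 / np.sqrt(n * fisher_info)   # = lam_hat / sqrt(n)

# Two-sided 95% interval: theta_hat +/- z * se_hat.
z = stats.norm.ppf(0.975)
print(lam_hat - z * se_hat, lam_hat + z * se_hat)
```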

Fisher-style confidence intervals are slick for a few reasons: they are accurate, correct, and automatic. Let’s unpack each of these notions. To make things concrete, assume we’re looking at a two-sided 95% confidence interval.

Accuracy. The coverage of these confidence intervals converges to its nominal value at a rate of $O(1/\sqrt{n})$. So, after seeing $n$ samples, the noncoverage probability is

$$\Pr\{\theta \notin \text{interval}\} \approx 0.05 + \frac{c}{\sqrt{n}}.$$

(The $c$ term depends on the specific problem, but don’t worry about that little guy.)
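If you want to watch the $O(1/\sqrt{n})$ story play out, here’s a small simulation sketch (same made-up exponential setup as above); the estimated noncoverage should drift toward the nominal 0.05 as $n$ grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = stats.norm.ppf(0.975)
lam_true, reps = 2.0, 20_000

for n in (10, 50, 250):
    # Draw `reps` datasets of size n and build the Fisher interval for each.
    x = rng.exponential(scale=1.0 / lam_true, size=(reps, n))
    lam_hat = 1.0 / x.mean(axis=1)
    se_hat = lam_hat / np.sqrt(n)          # 1 / sqrt(n * I(lam_hat))
    covered = np.abs(lam_hat - lam_true) <= z * se_hat
    print(f"n = {n:3d}: noncoverage ~ {1 - covered.mean():.3f} (nominal 0.05)")
```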

Correctness. By virtue of using the Fisher information to estimate our standard errors, the confidence interval estimates don’t waste any information. Moreover, the standard error estimates are the best (i.e., smallest possible) for asymptotically unbiased estimates of $\theta$.
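To put a formula behind “smallest possible”: this is the Cramér–Rao story. For an unbiased estimator $\tilde{\theta}$ built from $n$ i.i.d. observations,

$$\operatorname{Var}(\tilde{\theta}) \;\ge\; \frac{1}{n \, \mathcal{I}(\theta)},$$

and the MLE’s asymptotic variance attains this lower bound, which is why its Fisher-information standard error is as small as you can hope for.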

Automatic-ness. No matter how nasty or complicated the problem is, we estimate $\hat{\theta}$ and $\widehat{\mathrm{se}}$ using the same algorithm, always!
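To illustrate the “same algorithm, always” point, here is a minimal Python sketch (my own illustration, assuming NumPy and SciPy) for a one-parameter model that enters only through its log-likelihood: maximize it numerically, take a numerical second derivative for the observed information, and read off the interval.

```python
import numpy as np
from scipy import optimize, stats

def fisher_ci(neg_loglik, bounds, level=0.95, eps=1e-5):
    """Generic Fisher-style interval for a one-parameter model.

    The model only enters through `neg_loglik`; the steps are the same
    for any problem: maximize the likelihood, then get the standard error
    from the observed information (numerical second derivative).
    """
    res = optimize.minimize_scalar(neg_loglik, bounds=bounds, method="bounded")
    theta_hat = res.x
    # Observed information: curvature of the negative log-likelihood at the MLE.
    info = (neg_loglik(theta_hat + eps) - 2 * neg_loglik(theta_hat)
            + neg_loglik(theta_hat - eps)) / eps**2
    se_hat = 1.0 / np.sqrt(info)
    z = stats.norm.ppf(0.5 + level / 2)
    return theta_hat - z * se_hat, theta_hat + z * se_hat

# Hypothetical example: the same exponential data as above, rate lambda = 2.0.
rng = np.random.default_rng(0)
x = rng.exponential(scale=0.5, size=200)
neg_loglik = lambda lam: -(len(x) * np.log(lam) - lam * x.sum())
print(fisher_ci(neg_loglik, bounds=(1e-6, 100.0)))
```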

A small sample caveat

Confidence intervals can be untrustworthy for small samples. To deal with this situation, Fisher would transform his estimates so that they become “normal” more quickly. One of his go-to transformations was the inverse hyperbolic tangent (arctanh, the basis of Fisher’s z-transformation), which makes quick work of correlation coefficients for normally distributed data.
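Here’s what that looks like in practice: a small Python sketch (invented data, assuming NumPy and SciPy) of the usual arctanh recipe for a correlation coefficient, with the standard $1/\sqrt{n-3}$ standard error on the transformed scale.

```python
import numpy as np
from scipy import stats

def corr_ci(x, y, level=0.95):
    """CI for a correlation coefficient via Fisher's z-transform (arctanh).
    Assumes (x, y) are roughly bivariate normal."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    z_est = np.arctanh(r)                  # transformed estimate is ~ normal
    se = 1.0 / np.sqrt(n - 3)              # standard error on the arctanh scale
    zq = stats.norm.ppf(0.5 + level / 2)
    lo, hi = z_est - zq * se, z_est + zq * se
    return np.tanh(lo), np.tanh(hi)        # map back to the correlation scale

# Hypothetical small sample with true correlation 0.6.
rng = np.random.default_rng(2)
x, y = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=40).T
print(corr_ci(x, y))
```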

Bootstrap benefits

The bootstrap distribution allows us to correct our confidence interval estimates for bias, skew, and changes in standard error, and we get second-order accuracy, meaning the convergence rate speeds up and the noncoverage probability becomes

$$0.05 + \frac{c}{n}$$

for a two-sided interval. Hot dang, that’s fast.
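For a feel of how this looks in code, here’s a sketch using SciPy’s `stats.bootstrap` with the bias-corrected-and-accelerated (BCa) method on the same made-up correlation example from the previous section; BCa is one standard way to get the second-order behavior described above.

```python
import numpy as np
from scipy import stats

# Same made-up bivariate-normal setup as in the small-sample example above.
rng = np.random.default_rng(3)
x, y = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=40).T

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# BCa bootstrap interval: resample (x, y) pairs, recompute the correlation,
# and correct the percentile interval for bias and skew.
res = stats.bootstrap((x, y), corr, paired=True, vectorized=False,
                      n_resamples=5000, confidence_level=0.95,
                      method="BCa", random_state=rng)
print(res.confidence_interval)
```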

Next up

… conditional inference, ancillarity, and the magic formula!

References

This post is related to material from:

  • “R.A. Fisher in the 21st Century” by Bradley Efron.
  • Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie. A digital copy lives here: CASI.
  • An Introduction to the Bootstrap by Bradley Efron and Robert J. Tibshirani.