Han's inequality
In this post, we’re putting on our information theory hats and digging into a simple but useful inequality gifted to us by Te Sun Han in 1978. Han’s inequality is a generalization of the Efron–Stein inequality with roots in information theory rather than the statistical analysis of the jackknife.1 Before teeing up Han’s inequality, we’ll cover a few ideas from information theory. Andiamo!
A modicum of information theory
We’ll need the idea of (Shannon’s) entropy in Han’s inequality, so here it is for a discrete random variable $X$:
$$H(X) = -\sum_{x} p(x) \log p(x),$$
with the standard convention that $0 \log 0 = 0$.
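To make this concrete, here is a minimal Python sketch of the entropy computation (the helper name `entropy`, the use of natural logarithms, and the example distributions are my own choices for illustration, not from the post):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector p, using 0 log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # drop zero-probability outcomes per the 0 log 0 = 0 convention
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.5]))  # fair coin: log 2 ≈ 0.693
print(entropy([1.0, 0.0]))  # deterministic outcome: 0
```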
We’ll also need conditional entropy, which is defined as
$$H(X \mid Y) = -\sum_{x, y} p(x, y) \log p(x \mid y).$$
We can write the conditional entropy as
$$\begin{aligned}
H(X \mid Y) &= -\sum_{x, y} p(x, y) \log p(x \mid y) \\
&= -\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(y)} \\
&= -\sum_{x, y} p(x, y) \log p(x, y) + \sum_{x, y} p(x, y) \log p(y) \\
&= H(X, Y) - H(Y),
\end{aligned}$$
where we use the relationship between conditional and joint probability in the second line.
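As a sanity check on this identity, the sketch below (reusing the `entropy` helper from above, with a made-up 2×2 joint distribution) computes $H(X \mid Y)$ both from the definition and as $H(X, Y) - H(Y)$:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# A made-up joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_y = p_xy.sum(axis=0)  # marginal of Y

# Definition: H(X | Y) = -sum_{x,y} p(x, y) log p(x | y)
h_cond_def = -np.sum(p_xy * np.log(p_xy / p_y))

# Identity: H(X | Y) = H(X, Y) - H(Y)
h_cond_id = entropy(p_xy) - entropy(p_y)

print(np.isclose(h_cond_def, h_cond_id))  # True
```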
Lastly, we’ll use the intuitive idea that conditioning reduces entropy. This result follows from a relative entropy (i.e., Kullback–Leibler divergence) calculation. For two random variables $X$ and $Y$, with marginal distributions $p_X$ and $p_Y$, and joint distribution $p_{X,Y}$, the relative entropy between $p_{X,Y}$ and $p_X p_Y$ is
$$D(p_{X,Y} \,\|\, p_X p_Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = H(X) - H(X \mid Y).$$
Since relative entropy is nonnegative, we have $H(X \mid Y) \le H(X)$, meaning that conditioning reduces entropy.
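Continuing with the same made-up joint distribution as above, we can check numerically that the relative entropy is nonnegative and equals $H(X) - H(X \mid Y)$, so that $H(X \mid Y) \le H(X)$:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Relative entropy D(p_XY || p_X p_Y), a.k.a. the mutual information I(X; Y).
kl = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

h_x = entropy(p_x)
h_x_given_y = entropy(p_xy) - entropy(p_y)

print(kl >= 0)                            # True: relative entropy is nonnegative
print(np.isclose(kl, h_x - h_x_given_y))  # True: D = H(X) - H(X | Y)
print(h_x_given_y <= h_x)                 # True: conditioning reduces entropy
```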
Han’s inequality
In its simplest form, Han’s inequality works only with discrete random variables. We’re going to look at that case.
Han’s inequality. Assume $X_1, \dots, X_n$ are discrete random variables. Then
$$H(X_1, \dots, X_n) \le \frac{1}{n-1} \sum_{i=1}^n H(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n).$$
Before proving the inequality, let’s interpret it. Roughly, it says that the joint entropy is less than the average entropy with the $i$th observation removed. So, we’re better off (i.e., have less uncertainty) if we know everything instead of accumulating (averaging) partial information.
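Before the proof, here’s a small numerical check of the inequality on a randomly generated joint distribution over three binary variables (the array layout and random seed are my own choices for illustration):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
n = 3
p = rng.random((2,) * n)  # joint p(x_1, ..., x_n) as an n-dimensional table
p /= p.sum()

lhs = entropy(p)
# Entropies with the i-th variable summed out, scaled by 1 / (n - 1).
rhs = sum(entropy(p.sum(axis=i)) for i in range(n)) / (n - 1)

print(lhs <= rhs)  # True: Han's inequality holds
```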
Proof. Han’s inequality relies on two things:
- the definition of conditional entropy and
- the fact that conditioning reduces entropy.
For $i = 1, \dots, n$, we have
$$\begin{aligned}
H(X_1, \dots, X_n) &\overset{(a)}{=} H(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) + H(X_i \mid X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) \\
&\overset{(b)}{\le} H(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) + H(X_i \mid X_1, \dots, X_{i-1}),
\end{aligned}$$
where in (a) we used the definition of conditional entropy and in (b) we used the fact that conditioning reduces entropy. In other words, we have
$$H(X_1, \dots, X_n) \le H(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) + H(X_i \mid X_1, \dots, X_{i-1}).$$
Now let’s sum both sides of the inequality over $i$, which gives
$$n\, H(X_1, \dots, X_n) \le \sum_{i=1}^n H(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) + \sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}).$$
By the chain rule for entropy, the second sum on the right-hand side equals $H(X_1, \dots, X_n)$, so the bound can be rewritten as
$$(n-1)\, H(X_1, \dots, X_n) \le \sum_{i=1}^n H(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n),$$
and dividing by $n-1$ gives Han’s inequality.
And that’s it!
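The summation step leaned on the chain rule for entropy, $\sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}) = H(X_1, \dots, X_n)$; here’s a quick numerical confirmation on the same kind of random joint distribution used above (again, my own illustrative setup):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(1)
n = 3
p = rng.random((2,) * n)
p /= p.sum()

# H(X_i | X_1, ..., X_{i-1}) = H(X_1, ..., X_i) - H(X_1, ..., X_{i-1})
chain = 0.0
for i in range(1, n + 1):
    p_first_i = p.sum(axis=tuple(range(i, n)))          # marginal of (X_1, ..., X_i)
    p_first_prev = p.sum(axis=tuple(range(i - 1, n)))   # marginal of (X_1, ..., X_{i-1})
    chain += entropy(p_first_i) - entropy(p_first_prev)

print(np.isclose(chain, entropy(p)))  # True: the conditional entropies telescope
```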
Next up
… we’ll put Han’s inequality to work and look at the sub-additivity of entropy! Stay tuned.
References
This post is based on material from Chapter 4 of:
- Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart.
That said, we now know and appreciate how closely related information theory and statistics really are. ↩