Entropy and its subadditivity
(Happy St. Patrick’s Day!)
In this post, we’re picking up from the previous post and continuing with some information-theoretic ideas as they arise in concentration inequalities. We’ll quickly revisit the Efron–Stein inequality and then proceed towards the subadditivity of entropy.
Efron–Stein
To encourage the formation of new (or reinforce existing) neuronal connections, let's remind ourselves of the Efron–Stein inequality. If we have independent random variables $X_1, \dots, X_n$ and a square-integrable function $f$ with $Z = f(X_1, \dots, X_n)$, then

$$
\operatorname{Var}(Z) \le \sum_{i=1}^{n} \mathbb{E}\left[\left(Z - \mathbb{E}^{(i)} Z\right)^2\right],
$$

where the operator $\mathbb{E}^{(i)}$ is the conditional expectation with respect to the $i$th variable, that is, $\mathbb{E}^{(i)} Z = \mathbb{E}\left[Z \mid X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n\right]$, so only $X_i$ is integrated out. The square-integrability condition on $f$ ensures that the variance of $Z$ is finite.
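If you'd like to see the inequality in action before moving on, here's a minimal numerical sketch. Everything in it is made up purely for illustration (three coordinates, each uniform on $\{0, 1, 2\}$, and a random table of values for $f$); it just computes both sides of the Efron–Stein inequality exactly on a small product space:

```python
# A sketch only: a made-up finite setup (n = 3 coordinates, each X_i uniform
# on {0, 1, 2}, random f) used to compute both sides of Efron-Stein exactly.
import itertools
import random

random.seed(0)
support = [0, 1, 2]
n = 3
p_i = {x: 1 / len(support) for x in support}   # pmf of each (identically distributed) X_i
f_table = {x: random.random() for x in itertools.product(support, repeat=n)}

def prob(x):
    """P(X = x) under independence."""
    out = 1.0
    for xi in x:
        out *= p_i[xi]
    return out

points = list(itertools.product(support, repeat=n))
mean_Z = sum(prob(x) * f_table[x] for x in points)
var_Z = sum(prob(x) * (f_table[x] - mean_Z) ** 2 for x in points)

def cond_mean(x, i):
    """E^{(i)} Z evaluated at x: integrate out only the i-th coordinate."""
    return sum(p_i[y] * f_table[x[:i] + (y,) + x[i + 1:]] for y in support)

rhs = sum(prob(x) * (f_table[x] - cond_mean(x, i)) ** 2
          for x in points for i in range(n))
print(var_Z, "<=", rhs, var_Z <= rhs + 1e-12)
```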
Entropy
Now, let's generalize the Efron–Stein inequality by observing that we could write it as

$$
\mathbb{E}\left[Z^2\right] - \left(\mathbb{E}[Z]\right)^2 \le \sum_{i=1}^{n} \mathbb{E}\left[\mathbb{E}^{(i)}\left[Z^2\right] - \left(\mathbb{E}^{(i)} Z\right)^2\right],
$$

where we used the fact that $\mathbb{E}\big[(Z - \mathbb{E}^{(i)} Z)^2\big] = \mathbb{E}\big[\mathbb{E}^{(i)}[Z^2] - (\mathbb{E}^{(i)} Z)^2\big]$. If we stare at this representation of the various variances for a moment, we see that each one has the form

$$
\mathbb{E}[\Phi(Z)] - \Phi(\mathbb{E}[Z]),
$$

where $\Phi(x) = x^2$ (with $\mathbb{E}$ replaced by $\mathbb{E}^{(i)}$ inside the sum on the right-hand side). When $\Phi(x) = x \log x$ and $Z$ is nonnegative, the quantity $\mathbb{E}[\Phi(Z)] - \Phi(\mathbb{E}[Z])$ is often called the entropy of $Z$ and is denoted by $\operatorname{Ent}(Z)$.
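To make the $\Phi$-form concrete, here's a tiny sketch (the pmf of $Z$ below is made up for illustration) showing that the same one-liner gives the variance when $\Phi(x) = x^2$ and the entropy $\operatorname{Ent}(Z)$ when $\Phi(x) = x \log x$:

```python
# A sketch only: a made-up pmf for a nonnegative Z, plugged into the
# Phi-form E[Phi(Z)] - Phi(E[Z]) with two choices of Phi.
import math

values = [0.5, 1.0, 2.5]      # possible values of Z
probs = [0.2, 0.5, 0.3]       # P(Z = value)

def phi_form(phi):
    mean = sum(p * z for z, p in zip(values, probs))
    return sum(p * phi(z) for z, p in zip(values, probs)) - phi(mean)

print("Var(Z) =", phi_form(lambda x: x ** 2))            # Phi(x) = x^2
print("Ent(Z) =", phi_form(lambda x: x * math.log(x)))   # Phi(x) = x log x
# Both are nonnegative, as Jensen's inequality promises for convex Phi.
```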
Subadditivity of entropy
Using the Efron–Stein inequality as our source of inspiration, we might guess (or hope) that

$$
\operatorname{Ent}(Z) \le \sum_{i=1}^{n} \mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right],
$$

where $\operatorname{Ent}^{(i)}(Z) = \mathbb{E}^{(i)}[\Phi(Z)] - \Phi\left(\mathbb{E}^{(i)} Z\right)$ is the conditional version of the entropy, with $\Phi(x) = x \log x$ as before. If we did guess this, then we'd be correct! So, let's prove it.
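Before we do, here's a quick numerical sanity check of the guess. The setup is again made up for illustration (three fair bits and a random positive $f$); it compares $\operatorname{Ent}(Z)$ against $\sum_{i} \mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right]$ exactly:

```python
# A sketch only: a made-up setup (three fair bits, a random positive f)
# comparing Ent(Z) with the sum of expected conditional entropies.
import itertools
import math
import random

random.seed(1)
support, n = [0, 1], 3
p_i = {x: 0.5 for x in support}
points = list(itertools.product(support, repeat=n))
f_table = {x: random.uniform(0.1, 2.0) for x in points}

prob = lambda x: math.prod(p_i[xi] for xi in x)
phi = lambda t: t * math.log(t)

mean_Z = sum(prob(x) * f_table[x] for x in points)
ent_Z = sum(prob(x) * phi(f_table[x]) for x in points) - phi(mean_Z)

def cond_ent(x, i):
    """Ent^{(i)}(Z) at x: entropy of Z taken over the i-th coordinate only."""
    rows = [(p_i[y], f_table[x[:i] + (y,) + x[i + 1:]]) for y in support]
    mean = sum(p * z for p, z in rows)
    return sum(p * phi(z) for p, z in rows) - phi(mean)

rhs = sum(prob(x) * cond_ent(x, i) for x in points for i in range(n))
print(ent_Z, "<=", rhs, ent_Z <= rhs + 1e-12)
```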
We’ll assume that are independent random variables taking values in a finite set and that with .
Proof.
The proof relies on an application of Han’s inequality for relative entropies (which is an extension of what we showed in the previous post).
Note that $\mathbb{E}[Z]$ is always positive unless $Z$ is identically zero (in which case both sides are zero and there's nothing to prove), and both sides of the claimed inequality scale linearly when $f$ is multiplied by a positive constant, so the result holds for any admissible $f$ as long as it holds whenever $\mathbb{E}[Z] = 1$. Let's use this to simplify our lives and assume that $\mathbb{E}[Z] = 1$. So, now we can write

$$
\operatorname{Ent}(Z) = \mathbb{E}[Z \log Z] - \mathbb{E}[Z] \log \mathbb{E}[Z] = \mathbb{E}[Z \log Z].
$$
Now, we’re going to introduce the relative entropy component by defining
The relative entropy neurons in your brain are probably going wild because we can now write

$$
Z \log Z = f(X) \log f(X) = \frac{q(X)}{p(X)} \log \frac{q(X)}{p(X)}.
$$

So, let's take an expectation of $Z \log Z$:

$$
\operatorname{Ent}(Z) = \mathbb{E}[Z \log Z] = \sum_{x \in \mathcal{X}^n} p(x) \frac{q(x)}{p(x)} \log \frac{q(x)}{p(x)} = \sum_{x \in \mathcal{X}^n} q(x) \log \frac{q(x)}{p(x)} = D(Q \,\|\, P).
$$
Here’s where Han’s inequality comes to the scene, because the relative entropy form of Han’s inequality says that
In other words, we have a bound on the (joint) relative entropy based on the sum of marginal relative entropies. Lastly, we’ll show that
which will conclude the proof. The tower property of expectation lets us write

$$
D(Q \,\|\, P) = \mathbb{E}[Z \log Z] = \mathbb{E}\left[\mathbb{E}^{(i)}[Z \log Z]\right],
$$

and (with a bit more work, using that the marginal $Q^{(i)}$ has probability mass function $q^{(i)}(x^{(i)}) = p^{(i)}(x^{(i)})\, \mathbb{E}^{(i)}[Z](x^{(i)})$, where $p^{(i)}$ is the probability mass function of $X^{(i)}$)

$$
D\!\left(Q^{(i)} \,\big\|\, P^{(i)}\right) = \mathbb{E}\left[\mathbb{E}^{(i)}[Z] \log \mathbb{E}^{(i)}[Z]\right].
$$

So, chaining everything together we have that

$$
\operatorname{Ent}(Z) = D(Q \,\|\, P) \le \sum_{i=1}^{n} \left[ D(Q \,\|\, P) - D\!\left(Q^{(i)} \,\big\|\, P^{(i)}\right) \right] = \sum_{i=1}^{n} \mathbb{E}\left[\mathbb{E}^{(i)}[Z \log Z] - \mathbb{E}^{(i)}[Z] \log \mathbb{E}^{(i)}[Z]\right] = \sum_{i=1}^{n} \mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right].
$$
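And, for the skeptical, here's one last sketch (same kind of made-up finite setup as before) verifying the identity $D(Q \,\|\, P) - D(Q^{(i)} \,\|\, P^{(i)}) = \mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right]$ coordinate by coordinate, which is exactly the piece Han's inequality needs:

```python
# A sketch only: a made-up setup verifying, coordinate by coordinate, that
# D(Q || P) - D(Q^(i) || P^(i)) equals E[Ent^{(i)}(Z)] when E[Z] = 1.
import itertools
import math
import random

random.seed(3)
support, n = [0, 1, 2], 2
p_i = {0: 0.2, 1: 0.5, 2: 0.3}
points = list(itertools.product(support, repeat=n))
prob = lambda x: math.prod(p_i[xi] for xi in x)
phi = lambda t: t * math.log(t)

raw = {x: random.uniform(0.1, 2.0) for x in points}
scale = sum(prob(x) * raw[x] for x in points)
f_table = {x: raw[x] / scale for x in points}            # E[Z] = 1
q = {x: f_table[x] * prob(x) for x in points}
D_QP = sum(q[x] * math.log(q[x] / prob(x)) for x in points)

others = list(itertools.product(support, repeat=n - 1))
for i in range(n):
    rebuild = lambda rest, y: rest[:i] + (y,) + rest[i:]  # full x from x^{(i)} and y

    # marginals of P and Q over x^{(i)} (everything except the i-th coordinate)
    p_marg = {r: sum(prob(rebuild(r, y)) for y in support) for r in others}
    q_marg = {r: sum(q[rebuild(r, y)] for y in support) for r in others}
    D_marg = sum(q_marg[r] * math.log(q_marg[r] / p_marg[r]) for r in others)

    # E[Ent^{(i)}(Z)], averaging the conditional entropy over x^{(i)}
    def ent_i(rest):
        rows = [(p_i[y], f_table[rebuild(rest, y)]) for y in support]
        mean = sum(p * z for p, z in rows)
        return sum(p * phi(z) for p, z in rows) - phi(mean)

    expected_cond_ent = sum(p_marg[r] * ent_i(r) for r in others)
    print(i, D_QP - D_marg, "==", expected_cond_ent)
```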
That’s it, that’s all folks.
Next up
… logarithmic Sobolev inequalities! Stay tuned—it’s gonna be good.
References
This post is based on material from Chapter 4 of (my new favorite book):
- Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart.