(Happy St. Patrick’s Day!)

In this post, we’re picking up from the previous post and continuing with some information-theoretic ideas as they arise in concentration inequalities. We’ll quickly revisit the Efron–Stein inequality and then proceed towards the subadditivity of entropy.

Efron–Stein

To encourage the formation of new (or reinforce existing) neuronal connections, let’s remind ourselves of the Efron–Stein inequality. If we have independent random variables $X_1, \dots, X_n$ and a square-integrable function $f$ with $Z = f(X_1, \dots, X_n)$, then

$$\operatorname{Var}(Z) \le \sum_{i=1}^{n} \mathbb{E}\left[\left(Z - \mathbb{E}^{(i)}[Z]\right)^2\right],$$

where the operator $\mathbb{E}^{(i)}$ is the conditional expectation with respect to the $i$th variable, i.e., $\mathbb{E}^{(i)}[Z] = \mathbb{E}[Z \mid X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n]$. The square-integrability condition on $f$ ensures that the variance of $Z$ is finite.
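To make this concrete, here’s a quick sanity check (mine, not the book’s): take $X_1, X_2, X_3$ to be independent fair coin flips, let $f$ be the maximum, and compute both sides exactly by enumerating all $2^3$ outcomes.

import itertools

n = 3
outcomes = list(itertools.product([0, 1], repeat=n))
p = 1 / 2 ** n  # every outcome is equally likely

def f(x):
    return max(x)  # Z = max(X_1, X_2, X_3); any square-integrable f would do

# Var(Z), computed exactly
EZ = sum(p * f(x) for x in outcomes)
var_Z = sum(p * (f(x) - EZ) ** 2 for x in outcomes)

def cond_exp(x, i):
    # E^{(i)}[Z]: average over the i-th coordinate, holding the others fixed
    return 0.5 * sum(f(x[:i] + (xi,) + x[i + 1:]) for xi in (0, 1))

# Efron-Stein bound: sum_i E[(Z - E^{(i)}[Z])^2]
bound = sum(p * (f(x) - cond_exp(x, i)) ** 2 for x in outcomes for i in range(n))

print(var_Z, bound)  # 0.109375 0.1875
assert var_Z <= bound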

Entropy

Now, let’s generalize the Efron–Stein inequality by observing that we could write it as

$$\operatorname{Var}(Z) \le \sum_{i=1}^{n} \mathbb{E}\left[\mathbb{E}^{(i)}\left[Z^2\right] - \left(\mathbb{E}^{(i)}[Z]\right)^2\right],$$

where we used the fact that $\mathbb{E}^{(i)}\left[\left(Z - \mathbb{E}^{(i)}[Z]\right)^2\right] = \mathbb{E}^{(i)}\left[Z^2\right] - \left(\mathbb{E}^{(i)}[Z]\right)^2$. If we stare at this representation of the various summands for a moment, we see that each one has the form

$$\mathbb{E}\left[\Phi(Z)\right] - \Phi\left(\mathbb{E}[Z]\right)$$

with $\Phi(x) = x^2$ (and with $\mathbb{E}$ replaced by the conditional operator $\mathbb{E}^{(i)}$). When $\Phi(x) = x \log x$, the quantity $\mathbb{E}[\Phi(Z)] - \Phi(\mathbb{E}[Z])$ is often called the entropy of $Z$ and is denoted by $\operatorname{Ent}(Z)$.
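As a tiny sanity check on the definition (my example, not one from the book): suppose $Z$ takes the value $2$ with probability $1/2$ and the value $0$ with probability $1/2$, with the usual convention $0 \log 0 = 0$. Then

$$\operatorname{Ent}(Z) = \mathbb{E}[Z \log Z] - \mathbb{E}[Z]\log\mathbb{E}[Z] = \tfrac{1}{2}\,(2\log 2) - 1 \cdot \log 1 = \log 2.$$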

Subadditivity of entropy

Using the Efron–Stein inequality as our source of inspiration, we might guess (or hope) that

$$\operatorname{Ent}(Z) \le \sum_{i=1}^{n} \mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right],$$

where $\operatorname{Ent}^{(i)}(Z) = \mathbb{E}^{(i)}[\Phi(Z)] - \Phi\left(\mathbb{E}^{(i)}[Z]\right)$ is the conditional version of the entropy (again with $\Phi(x) = x \log x$).
If we did guess this, then we’d be correct! So, let’s prove it.

We’ll assume that $X_1, \dots, X_n$ are independent random variables taking values in a finite set $\mathcal{X}$ and that $Z = f(X_1, \dots, X_n)$ with $f : \mathcal{X}^n \to [0, \infty)$.

Proof.

The proof relies on an application of Han’s inequality for relative entropies (which is an extension of what we showed in the previous post).

Note that both sides of the claimed inequality are homogeneous in $Z$: replacing $Z$ by $cZ$ for a constant $c > 0$ multiplies $\operatorname{Ent}(Z)$ and each $\operatorname{Ent}^{(i)}(Z)$ by $c$. So, the result holds for any $Z$ with $\mathbb{E}[Z] > 0$ as soon as it holds for $Z$ with $\mathbb{E}[Z] = 1$ (and it holds trivially when $Z \equiv 0$). Let’s use this to simplify our lives and assume that $\mathbb{E}[Z] = 1$. So, now we can write

$$\operatorname{Ent}(Z) = \mathbb{E}[Z \log Z] - \mathbb{E}[Z]\log\mathbb{E}[Z] = \mathbb{E}[Z \log Z].$$
Now, we’re going to introduce the relative entropy component by defining a probability distribution $Q$ on $\mathcal{X}^n$ with probability mass function

$$q(x) = f(x)\, p(x), \qquad x \in \mathcal{X}^n,$$

where $p$ is the probability mass function of $X = (X_1, \dots, X_n)$ and $P$ is the corresponding (product) distribution. Since $\mathbb{E}[Z] = 1$, the values $q(x)$ really do sum to one.
The relative entropy neurons in your brain are probably going wild because we can now write

$$Z \log Z = f(X) \log f(X) = \frac{q(X)}{p(X)} \log \frac{q(X)}{p(X)}.$$
So, let’s take an expectation of $Z \log Z$:

$$\operatorname{Ent}(Z) = \mathbb{E}[Z \log Z] = \sum_{x \in \mathcal{X}^n} p(x)\, \frac{q(x)}{p(x)} \log \frac{q(x)}{p(x)} = \sum_{x \in \mathcal{X}^n} q(x) \log \frac{q(x)}{p(x)} = D(Q \,\|\, P).$$
Here’s where Han’s inequality comes onto the scene, because the relative entropy form of Han’s inequality says that

$$D(Q \,\|\, P) \le \sum_{i=1}^{n} \left[ D(Q \,\|\, P) - D\left(Q^{(i)} \,\|\, P^{(i)}\right) \right],$$

where $Q^{(i)}$ and $P^{(i)}$ denote the marginal distributions of $X^{(i)} = (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$ under $Q$ and $P$ (with probability mass functions $q^{(i)}$ and $p^{(i)}$); this is where the independence of the $X_i$, i.e., the product structure of $P$, gets used.
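As a quick aside (my rearrangement, not a step the proof needs): since the right-hand side equals $n\, D(Q \,\|\, P) - \sum_{i=1}^{n} D(Q^{(i)} \,\|\, P^{(i)})$, the inequality is equivalent to

$$D(Q \,\|\, P) \ge \frac{1}{n-1} \sum_{i=1}^{n} D\left(Q^{(i)} \,\|\, P^{(i)}\right),$$

which mirrors Han’s inequality for the Shannon entropy, $H(X_1, \dots, X_n) \le \frac{1}{n-1} \sum_{i=1}^{n} H\left(X^{(i)}\right)$, with the direction of the inequality flipped.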
In other words, we have a bound on the (joint) relative entropy based on the sum of marginal relative entropies. Lastly, we’ll show that

$$\mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right] = \mathbb{E}\left[\mathbb{E}^{(i)}[Z \log Z]\right] - \mathbb{E}\left[\mathbb{E}^{(i)}[Z] \log \mathbb{E}^{(i)}[Z]\right] = D(Q \,\|\, P) - D\left(Q^{(i)} \,\|\, P^{(i)}\right),$$
which will conclude the proof. The tower property of expectation lets us write

$$\mathbb{E}\left[\mathbb{E}^{(i)}[Z \log Z]\right] = \mathbb{E}[Z \log Z] = D(Q \,\|\, P),$$
and (with a bit more work)

$$\mathbb{E}\left[\mathbb{E}^{(i)}[Z] \log \mathbb{E}^{(i)}[Z]\right] = D\left(Q^{(i)} \,\|\, P^{(i)}\right).$$
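In case you’d like to see that bit of work spelled out, here’s one way to do it (my expansion of the step, not necessarily the book’s exact bookkeeping). Write $p_i$ for the probability mass function of $X_i$, so that $p(x) = p_i(x_i)\, p^{(i)}(x^{(i)})$ by independence, and note that $\mathbb{E}^{(i)}[Z]$ is a function of $x^{(i)}$ alone:

$$q^{(i)}(x^{(i)}) = \sum_{x_i \in \mathcal{X}} q(x) = \sum_{x_i \in \mathcal{X}} f(x)\, p_i(x_i)\, p^{(i)}(x^{(i)}) = p^{(i)}(x^{(i)})\, \mathbb{E}^{(i)}[Z](x^{(i)}),$$

so that

$$\mathbb{E}\left[\mathbb{E}^{(i)}[Z] \log \mathbb{E}^{(i)}[Z]\right] = \sum_{x^{(i)}} p^{(i)}(x^{(i)})\, \frac{q^{(i)}(x^{(i)})}{p^{(i)}(x^{(i)})} \log \frac{q^{(i)}(x^{(i)})}{p^{(i)}(x^{(i)})} = \sum_{x^{(i)}} q^{(i)}(x^{(i)}) \log \frac{q^{(i)}(x^{(i)})}{p^{(i)}(x^{(i)})} = D\left(Q^{(i)} \,\|\, P^{(i)}\right).$$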
So, chaining everything together, we have that

$$\operatorname{Ent}(Z) = D(Q \,\|\, P) \le \sum_{i=1}^{n} \left[ D(Q \,\|\, P) - D\left(Q^{(i)} \,\|\, P^{(i)}\right) \right] = \sum_{i=1}^{n} \mathbb{E}\left[\operatorname{Ent}^{(i)}(Z)\right].$$
That’s it, that’s all folks.
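If you’d like to see the inequality in action, here’s a small numerical check in the same spirit as the Efron–Stein snippet above (again mine, not the book’s): three independent fair coins, an arbitrarily chosen nonnegative $f$, and everything computed by exact enumeration.

import itertools
import math

n = 3
outcomes = list(itertools.product([0, 1], repeat=n))
p = 1 / 2 ** n  # every outcome is equally likely

def f(x):
    return 1.0 + x[0] + 2.0 * x[1] * x[2]  # any nonnegative function will do

def phi(t):
    # Phi(t) = t log t, with the convention 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0

# Ent(Z) = E[Phi(Z)] - Phi(E[Z])
EZ = sum(p * f(x) for x in outcomes)
ent = sum(p * phi(f(x)) for x in outcomes) - phi(EZ)

def ent_i(x, i):
    # Ent^{(i)}(Z): same formula with E replaced by E^{(i)}, i.e. average
    # over the i-th coordinate while holding the others fixed
    vals = [f(x[:i] + (xi,) + x[i + 1:]) for xi in (0, 1)]
    return 0.5 * sum(phi(v) for v in vals) - phi(0.5 * sum(vals))

# Right-hand side: sum_i E[Ent^{(i)}(Z)]
rhs = sum(p * ent_i(x, i) for x in outcomes for i in range(n))

print(ent, rhs)  # roughly 0.2387 0.2884
assert ent <= rhs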

Next up

… logarithmic Sobolev inequalities! Stay tuned—it’s gonna be good.

References

This post is based on material from Chapter 4 of (my new favorite book):

  • Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart.