As promised, this post will be about logarithmic Sobolev inequalities, which lead to (exponential) concentration inequalities for functions defined on the binary hypercube. In particular, we’ll review a specific case where we get speedy concentration thanks to a variance-like bound on our function of interest. Let’s concentrate some measure!

Starting simple

To get things going, we’ll look at a logarithmic Sobolev inequality for a function $f \colon \{-1, +1\}^n \to \mathbb{R}$ defined on the $n$-dimensional binary hypercube.

Our random variables $X_1, \dots, X_n$ are independent Rademacher random variables, i.e.,

$$\Pr\{X_i = +1\} = \Pr\{X_i = -1\} = \tfrac{1}{2}, \qquad i = 1, \dots, n.$$

We define $Z = f(X)$, where $X = (X_1, \dots, X_n)$, as our real-valued random variable of interest. Logarithmic Sobolev inequalities compare two quantities related to $Z$ and $f$:

  • the entropy functional $\mathrm{Ent}(g) = \mathbb{E}\left[g(X) \log g(X)\right] - \mathbb{E}\left[g(X)\right] \log \mathbb{E}\left[g(X)\right]$, defined for nonnegative $g$, and
  • the Efron–Stein-like functional $\mathcal{E}(f) = \frac{1}{2}\,\mathbb{E}\left[\sum_{i=1}^{n} \left(f(X) - f\big(X^{(i)}\big)\right)^{2}\right]$.

The quantity $X^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ is the vector $X$ with its $i$th component drawn from an independent copy $X_i'$—this is the same idea as in the Efron–Stein inequality. For our purposes, this implies that the only way $X^{(i)}$ can differ from $X$ is if $X_i$ changes sign.

The logarithmic Sobolev inequality for this setup is

$$\mathrm{Ent}\left(f^2\right) \le 2\,\mathcal{E}(f).$$
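
If you want to see this inequality earn its keep, here’s a quick brute-force sanity check (a little sketch of mine, not from the book): enumerate the hypercube for a small $n$, draw a random $f$, and compare $\mathrm{Ent}(f^2)$ with $2\,\mathcal{E}(f)$.

```python
import itertools
import math
import random

n = 4
vertices = list(itertools.product([-1, 1], repeat=n))

# A random function f on the hypercube, stored as a lookup table.
random.seed(0)
f = {x: random.gauss(0.0, 1.0) for x in vertices}

def flip(x, i):
    """Return x with the sign of its i-th coordinate flipped."""
    return x[:i] + (-x[i],) + x[i + 1:]

# Ent(f^2) = E[f^2 log f^2] - E[f^2] log E[f^2], with X uniform on the hypercube.
mean_f2 = sum(f[x] ** 2 for x in vertices) / len(vertices)
ent = (
    sum(f[x] ** 2 * math.log(f[x] ** 2) for x in vertices) / len(vertices)
    - mean_f2 * math.log(mean_f2)
)

# Efron-Stein-like functional: redrawing coordinate i only matters when the
# sign actually flips (probability 1/2), hence the extra factor 1/2 inside.
efron_stein = 0.5 * sum(
    0.5 * (f[x] - f[flip(x, i)]) ** 2 for x in vertices for i in range(n)
) / len(vertices)

print(f"Ent(f^2) = {ent:.4f}")
print(f"2 E(f)   = {2 * efron_stein:.4f}")  # Ent(f^2) <= 2 E(f) should hold
```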

Concentration inequality

With our logarithmic Sobolev inequality pinned down, we’ll now prove a concentration inequality for a function $f$ satisfying

$$\sum_{i=1}^{n} \left(f(x) - f\big(\bar{x}^{(i)}\big)\right)_{+}^{2} \le v$$

for some $v > 0$ and all $x \in \{-1, +1\}^n$, where $\bar{x}^{(i)}$ denotes $x$ with the sign of its $i$th coordinate flipped and $a_+ = \max\{a, 0\}$. In particular, we’ll show that for all $t > 0$,

$$\Pr\{Z \ge \mathbb{E}[Z] + t\} \le e^{-t^2 / v}.$$

Technical detail—hint. The proof hinges on the following fact (which we’ll show in a later post because it uses some nice ideas):

$$2\,\mathcal{E}\!\left(e^{\lambda f / 2}\right) \le \frac{\lambda^2}{4}\,\mathbb{E}\left[e^{\lambda Z} \sum_{i=1}^{n} \left(f(X) - f\big(\bar{X}^{(i)}\big)\right)_{+}^{2}\right]$$

for $\lambda > 0$, where $\bar{X}^{(i)}$ is $X$ with the sign of its $i$th coordinate flipped.
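
To make the statement concrete, here’s a toy example of mine (not from the book): take $f(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i$. Flipping the sign of coordinate $i$ changes $f$ by $2 x_i / \sqrt{n}$, so the condition holds with $v = 4$, and we can compare the exact tail with $e^{-t^2/4}$ by enumerating a small hypercube.

```python
import itertools
import math

n = 10
vertices = list(itertools.product([-1, 1], repeat=n))

def f(x):
    return sum(x) / math.sqrt(n)

def flip(x, i):
    return x[:i] + (-x[i],) + x[i + 1:]

# v bounds sum_i (f(x) - f(xbar^(i)))_+^2 over all vertices x (here v = 4).
v = max(
    sum(max(f(x) - f(flip(x, i)), 0.0) ** 2 for i in range(n))
    for x in vertices
)

mean_z = sum(f(x) for x in vertices) / len(vertices)  # = 0 by symmetry

print(f"v = {v:.1f}")
for t in [0.5, 1.0, 1.5, 2.0]:
    tail = sum(1 for x in vertices if f(x) >= mean_z + t) / len(vertices)
    bound = math.exp(-t ** 2 / v)
    print(f"t = {t:.1f}: exact tail = {tail:.4f}, bound = {bound:.4f}")
```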

Proof

We’re going to follow the so-called Herbst argument to prove the concentration inequality. The key ingredient of Herbst’s argument is the use of a differential inequality on a function of the parameter $\lambda > 0$.

Part 1: defining a differential inequality.

The entropy of $e^{\lambda Z}$ is

$$\mathrm{Ent}\left(e^{\lambda Z}\right) = \mathbb{E}\left[\lambda Z e^{\lambda Z}\right] - \mathbb{E}\left[e^{\lambda Z}\right] \log \mathbb{E}\left[e^{\lambda Z}\right].$$

Now, if we stare at this for a moment and let our cumulant generating function neurons fire, then we see that if we let $F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right]$, so that $F'(\lambda) = \mathbb{E}\left[Z e^{\lambda Z}\right]$, then we get

$$\mathrm{Ent}\left(e^{\lambda Z}\right) = \lambda F'(\lambda) - F(\lambda) \log F(\lambda),$$

which becomes a differential inequality once we upper-bound the entropy.
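
As a quick numerical sanity check (same toy function as before, again just a sketch of mine), we can compute $\mathrm{Ent}(e^{\lambda Z})$ straight from the definition and compare it with $\lambda F'(\lambda) - F(\lambda) \log F(\lambda)$, approximating $F'$ by a finite difference.

```python
import itertools
import math

n = 6
vertices = list(itertools.product([-1, 1], repeat=n))

def f(x):
    return sum(x) / math.sqrt(n)

def F(lam):
    """Moment generating function F(lambda) = E[exp(lambda Z)], X uniform."""
    return sum(math.exp(lam * f(x)) for x in vertices) / len(vertices)

lam = 0.7

# Ent(e^{lam Z}) straight from the definition E[W log W] - E[W] log E[W].
w = [math.exp(lam * f(x)) for x in vertices]
ent = sum(wi * math.log(wi) for wi in w) / len(w) - F(lam) * math.log(F(lam))

# lam F'(lam) - F(lam) log F(lam), with F' approximated by a central difference.
h = 1e-6
f_prime = (F(lam + h) - F(lam - h)) / (2 * h)
rhs = lam * f_prime - F(lam) * math.log(F(lam))

print(f"Ent(e^(lam Z))                  = {ent:.8f}")
print(f"lam F'(lam) - F(lam) log F(lam) = {rhs:.8f}")
```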

Part 2: putting the logarithmic Sobolev inequality to work.

The logarithmic Sobolev inequality, applied to the function $e^{\lambda f / 2}$, states

$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le 2\,\mathcal{E}\!\left(e^{\lambda f / 2}\right).$$

Combining this with the technical fact from the hint and our assumption on $f$, we get

$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le \frac{\lambda^2}{4}\,\mathbb{E}\left[e^{\lambda Z} \sum_{i=1}^{n} \left(f(X) - f\big(\bar{X}^{(i)}\big)\right)_{+}^{2}\right] \le \frac{v \lambda^2}{4}\,\mathbb{E}\left[e^{\lambda Z}\right].$$
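
For the toy function $f(x) = \frac{1}{\sqrt{n}}\sum_i x_i$ (with $v = 4$, as computed earlier), we can watch this bound hold across a range of $\lambda$, again by brute force:

```python
import itertools
import math

n = 8
vertices = list(itertools.product([-1, 1], repeat=n))
v = 4.0  # sum_i (f(x) - f(xbar^(i)))_+^2 <= 4 for this choice of f

def f(x):
    return sum(x) / math.sqrt(n)

for lam in [0.1, 0.5, 1.0, 2.0, 4.0]:
    w = [math.exp(lam * f(x)) for x in vertices]
    mean_w = sum(w) / len(w)  # E[e^{lam Z}]
    ent = sum(wi * math.log(wi) for wi in w) / len(w) - mean_w * math.log(mean_w)
    bound = (v * lam ** 2 / 4.0) * mean_w
    print(f"lam = {lam:.1f}: Ent(e^(lam Z)) = {ent:.4f} <= {bound:.4f}")
```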

Part 3: connecting the dots with some calculus.

Okay, cool; we’ve shown that

$$\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4}\,F(\lambda),$$

which we can rewrite (after dividing both sides by $\lambda^2 F(\lambda)$) as

$$\frac{F'(\lambda)}{\lambda F(\lambda)} - \frac{\log F(\lambda)}{\lambda^2} \le \frac{v}{4}.$$

Now, let’s write this guy in a slightly different form in the spirit of the Herbst argument, using the cumulant generating function $G(\lambda) = \log F(\lambda)$:

$$\frac{G'(\lambda)}{\lambda} - \frac{G(\lambda)}{\lambda^2} \le \frac{v}{4}.$$

The calculus part of your brain might be screaming something like

$$\frac{d}{d\lambda}\left(\frac{G(\lambda)}{\lambda}\right) = \frac{G'(\lambda)}{\lambda} - \frac{G(\lambda)}{\lambda^2},$$

which we translate to

$$\frac{d}{d\lambda}\left(\frac{G(\lambda)}{\lambda}\right) \le \frac{v}{4}.$$

Invoking more calculus ideas, l’Hospital’s rule says

$$\lim_{\lambda \downarrow 0} \frac{G(\lambda)}{\lambda} = \lim_{\lambda \downarrow 0} G'(\lambda) = \mathbb{E}[Z].$$

Part 4: bounding $\Pr\{Z \ge \mathbb{E}[Z] + t\}$ via Cramér–Chernoff and Markov.

We’re almost there! Let’s assume that $\lambda > 0$ and then integrate the inequality between $0$ and $\lambda$:

$$\frac{G(\lambda)}{\lambda} - \lim_{s \downarrow 0} \frac{G(s)}{s} \le \frac{v \lambda}{4}.$$

From l’Hospital’s rule, we get that $\lim_{s \downarrow 0} G(s)/s = \mathbb{E}[Z]$. Now, let’s undo our $\lambda$-in-the-denominator and our logarithmic term and write

$$\mathbb{E}\left[e^{\lambda (Z - \mathbb{E}[Z])}\right] \le e^{v \lambda^2 / 4}.$$

Finally, we can use the Cramér–Chernoff technique and Markov’s inequality to say that

$$\Pr\{Z \ge \mathbb{E}[Z] + t\} \le e^{-t^2 / v}.$$

(Note that $\lambda = 2t/v$ minimizes the upper bound.) Pat yourself on the back and give Mr. Herbst a (pretend) pat on the back too.
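
If you want that last step spelled out: for any $\lambda > 0$, Markov’s inequality applied to $e^{\lambda (Z - \mathbb{E}[Z])}$ together with the moment bound above gives

$$\Pr\{Z \ge \mathbb{E}[Z] + t\} = \Pr\left\{e^{\lambda (Z - \mathbb{E}[Z])} \ge e^{\lambda t}\right\} \le e^{-\lambda t}\,\mathbb{E}\left[e^{\lambda (Z - \mathbb{E}[Z])}\right] \le e^{-\lambda t + v \lambda^2 / 4},$$

and setting the derivative of the exponent to zero,

$$-t + \frac{v \lambda}{2} = 0 \quad \Longrightarrow \quad \lambda = \frac{2t}{v}, \qquad -\lambda t + \frac{v \lambda^2}{4} = -\frac{2t^2}{v} + \frac{t^2}{v} = -\frac{t^2}{v},$$

which is exactly the exponent in the bound above.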

Next up

…bandits!

References

This post is based on material from Chapter 5 of (my new favorite book):

  • Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart.