Logarithmic Sobolev inequalities
As promised, this post will be about logarithmic Sobolev inequalities, which give us (exponential) concentration inequalities for functions defined on the binary hypercube. In particular, we’ll review a specific case where we get speedy concentration thanks to a variance-like bound on our function of interest. Let’s concentrate some measure!
Starting simple
To get things going, we’ll look at a logarithmic Sobolev inequality for a function $f : \{-1, +1\}^n \to \mathbb{R}$ defined on the $n$-dimensional binary hypercube.
Our random variables $X_1, \dots, X_n$ are independent Rademacher random variables, i.e., $\mathbb{P}(X_i = +1) = \mathbb{P}(X_i = -1) = 1/2$ for each $i$.
We define $Z = f(X_1, \dots, X_n)$ as our real-valued random variable of interest. Logarithmic Sobolev inequalities compare the following two quantities related to $f$:
- the entropy functional $\mathrm{Ent}(f^2) = \mathbb{E}\left[f^2(X) \log f^2(X)\right] - \mathbb{E}\left[f^2(X)\right] \log \mathbb{E}\left[f^2(X)\right]$,
- the Efron–Stein-like functional $\mathcal{E}(f) = \frac{1}{2}\, \mathbb{E}\left[\sum_{i=1}^n \left(f(X) - f(\widetilde{X}^{(i)})\right)^2\right]$.
The quantity $\widetilde{X}^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ is the vector $X$ with its $i$th component drawn from an independent copy $X_i'$; this is the same idea as in the Efron–Stein inequality. For our purposes, since each coordinate only takes the values $\pm 1$, this implies that $X_i$ either stays put or changes sign.
The logarithmic Sobolev inequality for this setup is
$$\mathrm{Ent}(f^2) \le 2\,\mathcal{E}(f).$$
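If you’d like to see this inequality in action before trusting it, here’s a quick numerical sanity check: enumerate a small hypercube exactly, draw an arbitrary function $f$, and compare $\mathrm{Ent}(f^2)$ with $2\,\mathcal{E}(f)$. The dimension, the seed, and the choice of $f$ below are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4  # small enough to enumerate {-1, +1}^n exactly

# All 2^n points of the hypercube, each carrying probability 2^{-n}.
cube = list(itertools.product([-1, 1], repeat=n))
index = {x: k for k, x in enumerate(cube)}
f = rng.normal(size=len(cube))  # an arbitrary function f: {-1, +1}^n -> R

# Entropy functional: Ent(f^2) = E[f^2 log f^2] - E[f^2] log E[f^2].
f2 = f ** 2
ent = np.mean(f2 * np.log(f2)) - np.mean(f2) * np.log(np.mean(f2))

# Efron-Stein-like functional: E(f) = (1/2) E[sum_i (f(X) - f(Xtilde^(i)))^2].
# Redrawing coordinate i flips it with probability 1/2 and keeps it otherwise,
# so this equals (1/4) E[sum_i (f(X) - f(X with coordinate i flipped))^2].
es = 0.0
for x in cube:
    for i in range(n):
        y = x[:i] + (-x[i],) + x[i + 1:]
        es += 0.25 * (f[index[x]] - f[index[y]]) ** 2 / len(cube)

print(f"Ent(f^2) = {ent:.4f}  <=  2 E(f) = {2 * es:.4f}")
```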
Concentration inequality
With our logarithmic Sobolev inequality pinned down, we’ll now prove a concentration inequality for a function $f$ satisfying
$$\sum_{i=1}^n \left(f(x) - f(\overline{x}^{(i)})\right)_+^2 \le v$$
for some $v > 0$ and all $x \in \{-1, +1\}^n$, where $\overline{x}^{(i)}$ denotes $x$ with the sign of its $i$th coordinate flipped and $(\cdot)_+ = \max\{\cdot, 0\}$. In particular, we’ll show that for all $t > 0$,
$$\mathbb{P}\left(Z \ge \mathbb{E}[Z] + t\right) \le e^{-t^2 / v}.$$
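To make the statement concrete, here’s a quick simulation for one example function, $f(x) = \frac{1}{\sqrt{n}}\sum_i x_i$: flipping a coordinate moves $f$ by $2/\sqrt{n}$, so $v = 4$ works, and the empirical tail should sit below $e^{-t^2/v}$. The dimension, sample size, and thresholds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 100, 200_000

# Example: f(x) = sum_i x_i / sqrt(n).  Flipping coordinate i changes f by
# 2/sqrt(n), and only coordinates with x_i = +1 contribute a positive part,
# so sum_i (f(x) - f(xbar^(i)))_+^2 <= 4 =: v for every x.
v = 4.0
X = rng.choice([-1, 1], size=(trials, n))
Z = X.sum(axis=1) / np.sqrt(n)  # E[Z] = 0

for t in [1.0, 2.0, 3.0]:
    empirical = np.mean(Z >= t)
    bound = np.exp(-t ** 2 / v)
    print(f"t = {t}: empirical tail {empirical:.5f}  <=  bound {bound:.5f}")
```

The bound is loose for this particular $f$ (Hoeffding does better), but it holds, which is all we’re after here.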
Technical detail (hint). The proof hinges on the following fact (which we’ll show in a later post because it uses some nice ideas):
$$2\,\mathcal{E}\!\left(e^{\lambda f / 2}\right) \le \frac{\lambda^2}{4}\,\mathbb{E}\left[e^{\lambda Z} \sum_{i=1}^n \left(f(X) - f(\overline{X}^{(i)})\right)_+^2\right]$$
for $\lambda > 0$.
Proof
We’re going to follow the so-called Herbst argument to prove the concentration inequality. The key ingredient of Herbst’s argument is the use of a differential inequality on the function $F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right]$ with parameter $\lambda > 0$.
Part 1: defining a differential inequality.
The entropy of $e^{\lambda Z}$ is
$$\mathrm{Ent}\left(e^{\lambda Z}\right) = \mathbb{E}\left[\lambda Z e^{\lambda Z}\right] - \mathbb{E}\left[e^{\lambda Z}\right] \log \mathbb{E}\left[e^{\lambda Z}\right].$$
Now, if we stare at this for a moment and let our cumulant generating function neurons fire, then we see that if we let $F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right]$, then we get
$$\mathrm{Ent}\left(e^{\lambda Z}\right) = \lambda F'(\lambda) - F(\lambda) \log F(\lambda),$$
which will become a differential inequality once we bound the entropy from above.
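If you’d rather not expand this by hand, a symbolic check on a toy random variable does the trick; the support $\{-1, 0, 2\}$ below is an arbitrary choice.

```python
import sympy as sp

lam = sp.symbols("lambda", positive=True)

# A toy Z, uniform on a few values, is enough to check
# Ent(e^{lambda Z}) = lambda F'(lambda) - F(lambda) log F(lambda).
values = [-1, 0, 2]
p = sp.Rational(1, len(values))

F = sum(p * sp.exp(lam * z) for z in values)  # F(lambda) = E[e^{lambda Z}]
ent = sum(p * lam * z * sp.exp(lam * z) for z in values) - F * sp.log(F)
rhs = lam * sp.diff(F, lam) - F * sp.log(F)

print(sp.simplify(ent - rhs))  # prints 0
```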
Part 2: putting the logarithmic Sobolev inequality to work.
The logarithmic Sobolev inequality, applied to the function $e^{\lambda f / 2}$, states
$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le 2\,\mathcal{E}\!\left(e^{\lambda f / 2}\right).$$
Combining this with the technical fact above and our assumption on $f$ gives
$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le \frac{\lambda^2 v}{4}\,\mathbb{E}\left[e^{\lambda Z}\right].$$
Part 3: connecting the dots with some calculus.
Okay, cool; we’ve shown that
$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le \frac{\lambda^2 v}{4}\,\mathbb{E}\left[e^{\lambda Z}\right],$$
which we can rewrite, using the identity from Part 1, as
$$\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{\lambda^2 v}{4}\, F(\lambda).$$
Now, let’s write this guy in a slightly different form in the spirit of the Herbst argument by dividing through by $\lambda^2 F(\lambda)$:
$$\frac{F'(\lambda)}{\lambda F(\lambda)} - \frac{\log F(\lambda)}{\lambda^2} \le \frac{v}{4}.$$
The calculus part of our brain might be screaming something like
$$\frac{d}{d\lambda}\left[\frac{\log F(\lambda)}{\lambda}\right] = \frac{F'(\lambda)}{\lambda F(\lambda)} - \frac{\log F(\lambda)}{\lambda^2},$$
which we translate to
$$\frac{d}{d\lambda}\left[\frac{\log F(\lambda)}{\lambda}\right] \le \frac{v}{4}.$$
Invoking more calculus ideas, l’Hospital’s rule says
$$\lim_{\lambda \downarrow 0} \frac{\log F(\lambda)}{\lambda} = \frac{F'(0)}{F(0)} = \mathbb{E}[Z].$$
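Both calculus facts are easy to confirm symbolically: a generic $F$ checks the derivative identity, and a toy moment generating function (same arbitrary support as before) checks the limit.

```python
import sympy as sp

lam = sp.symbols("lambda", positive=True)
F = sp.Function("F")

# d/dlambda [log F(lambda) / lambda] equals F'/(lambda F) - log F / lambda^2.
lhs = sp.diff(sp.log(F(lam)) / lam, lam)
rhs = F(lam).diff(lam) / (lam * F(lam)) - sp.log(F(lam)) / lam ** 2
print(sp.simplify(lhs - rhs))  # prints 0

# l'Hospital at 0 for a toy F(lambda) = E[e^{lambda Z}], Z uniform on {-1, 0, 2}:
# the limit of log F / lambda as lambda -> 0 is E[Z] = 1/3.
Fc = sum(sp.exp(lam * z) for z in [-1, 0, 2]) / 3
print(sp.limit(sp.log(Fc) / lam, lam, 0, dir="+"))  # prints 1/3
```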
Part 4: bounding $\mathbb{P}\left(Z \ge \mathbb{E}[Z] + t\right)$ via Cramér–Chernoff and Markov.
We’re almost there! Let’s assume that $\lambda > 0$ and then integrate the inequality between $0$ and $\lambda$. From l’Hospital’s rule, we get that
$$\frac{\log F(\lambda)}{\lambda} \le \mathbb{E}[Z] + \frac{\lambda v}{4}.$$
Now, let’s undo our $\lambda$-in-the-denominator and our logarithmic term and write
$$F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right] \le e^{\lambda \mathbb{E}[Z] + \lambda^2 v / 4}.$$
Finally, we can use the Cramér–Chernoff technique and Markov’s inequality to say that
$$\mathbb{P}\left(Z \ge \mathbb{E}[Z] + t\right) \le e^{-\lambda t}\,\mathbb{E}\left[e^{\lambda (Z - \mathbb{E}[Z])}\right] \le e^{-\lambda t + \lambda^2 v / 4} \le e^{-t^2 / v}.$$
(Note that $\lambda = 2t / v$ minimizes the upper bound.) Pat yourself on the back and give Mr. Herbst a (pretend) pat on the back too.
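If you want to double-check that choice of $\lambda$, a short symbolic computation confirms both the minimizer and the resulting exponent.

```python
import sympy as sp

lam, t, v = sp.symbols("lambda t v", positive=True)

# Chernoff exponent from the moment generating function bound above.
exponent = -lam * t + lam ** 2 * v / 4
lam_star = sp.solve(sp.diff(exponent, lam), lam)[0]

print(lam_star)                                   # 2*t/v
print(sp.simplify(exponent.subs(lam, lam_star)))  # -t**2/v
```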
Next up
…bandits!
References
This post is based on material from Chapter 5 of (my new favorite book):
- Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart.