Logarithmic Sobolev inequalities
As promised, this post will be about logarithmic Sobolev inequalities, which give us (exponential) concentration inequalities for functions defined on the binary hypercube. In particular, we’ll review a specific case where we get speedy concentration thanks to a variance-like bound on our function of interest. Let’s concentrate some measure!
Starting simple
To get things going, we’ll look at a logarithmic Sobolev inequality for a function $f : \{-1, +1\}^n \to \mathbb{R}$ defined on the $n$-dimensional binary hypercube.
Our random variables $X_1, \dots, X_n$ are independent Rademacher random variables, i.e., $\mathbb{P}(X_i = +1) = \mathbb{P}(X_i = -1) = 1/2$ for each $i$.
We define $Z = f(X_1, \dots, X_n)$ as our real-valued random variable of interest. Logarithmic Sobolev inequalities compare the following two quantities related to $f$:
- the entropy functional $\mathrm{Ent}(f^2) = \mathbb{E}\left[f^2(X) \log f^2(X)\right] - \mathbb{E}\left[f^2(X)\right] \log \mathbb{E}\left[f^2(X)\right]$,
- the Efron–Stein-like functional $\mathcal{E}(f) = \frac{1}{2}\, \mathbb{E}\left[\sum_{i=1}^n \left(f(X) - f(\widetilde{X}^{(i)})\right)^2\right]$.
The quantity $\widetilde{X}^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ is the vector $X$ with its $i$th component drawn from an independent copy $X_i'$; this is the same idea as in the Efron–Stein inequality. For our purposes, since each coordinate only takes the values $\pm 1$, this implies that $X_i$ either stays put or changes sign.
The logarithmic Sobolev inequality for this setup is
$$\mathrm{Ent}(f^2) \le 2\,\mathcal{E}(f).$$
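If you’d like to see this inequality in action before trusting it, here’s a quick numerical sanity check: enumerate a small hypercube exactly, draw an arbitrary function $f$, and compare $\mathrm{Ent}(f^2)$ with $2\,\mathcal{E}(f)$. The dimension, the seed, and the choice of $f$ below are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4  # small enough to enumerate {-1, +1}^n exactly

# All 2^n points of the hypercube, each carrying probability 2^{-n}.
cube = list(itertools.product([-1, 1], repeat=n))
index = {x: k for k, x in enumerate(cube)}
f = rng.normal(size=len(cube))  # an arbitrary function f: {-1, +1}^n -> R

# Entropy functional: Ent(f^2) = E[f^2 log f^2] - E[f^2] log E[f^2].
f2 = f ** 2
ent = np.mean(f2 * np.log(f2)) - np.mean(f2) * np.log(np.mean(f2))

# Efron-Stein-like functional: E(f) = (1/2) E[sum_i (f(X) - f(Xtilde^(i)))^2].
# Redrawing coordinate i flips it with probability 1/2 and keeps it otherwise,
# so this equals (1/4) E[sum_i (f(X) - f(X with coordinate i flipped))^2].
es = 0.0
for x in cube:
    for i in range(n):
        y = x[:i] + (-x[i],) + x[i + 1:]
        es += 0.25 * (f[index[x]] - f[index[y]]) ** 2 / len(cube)

print(f"Ent(f^2) = {ent:.4f}  <=  2 E(f) = {2 * es:.4f}")
```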
Concentration inequality
With our logarithmic Sobolev inequality pinned down, we’ll now prove a concentration inequality for a function $f$ satisfying
$$\sum_{i=1}^n \left(f(x) - f(\overline{x}^{(i)})\right)_+^2 \le v$$
for some $v > 0$ and all $x \in \{-1, +1\}^n$, where $\overline{x}^{(i)}$ denotes $x$ with the sign of its $i$th coordinate flipped and $(\cdot)_+ = \max\{\cdot, 0\}$. In particular, we’ll show that for all $t > 0$,
$$\mathbb{P}\left(Z \ge \mathbb{E}[Z] + t\right) \le e^{-t^2 / v}.$$
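To make the statement concrete, here’s a quick simulation for one example function, $f(x) = \frac{1}{\sqrt{n}}\sum_i x_i$: flipping a coordinate moves $f$ by $2/\sqrt{n}$, so $v = 4$ works, and the empirical tail should sit below $e^{-t^2/v}$. The dimension, sample size, and thresholds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 100, 200_000

# Example: f(x) = sum_i x_i / sqrt(n).  Flipping coordinate i changes f by
# 2/sqrt(n), and only coordinates with x_i = +1 contribute a positive part,
# so sum_i (f(x) - f(xbar^(i)))_+^2 <= 4 =: v for every x.
v = 4.0
X = rng.choice([-1, 1], size=(trials, n))
Z = X.sum(axis=1) / np.sqrt(n)  # E[Z] = 0

for t in [1.0, 2.0, 3.0]:
    empirical = np.mean(Z >= t)
    bound = np.exp(-t ** 2 / v)
    print(f"t = {t}: empirical tail {empirical:.5f}  <=  bound {bound:.5f}")
```

The bound is loose for this particular $f$ (Hoeffding does better), but it holds, which is all we’re after here.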
Technical detail (hint). The proof hinges on the following fact (which we’ll show in a later post because it uses some nice ideas):
$$2\,\mathcal{E}\!\left(e^{\lambda f / 2}\right) \le \frac{\lambda^2}{4}\,\mathbb{E}\left[e^{\lambda Z} \sum_{i=1}^n \left(f(X) - f(\overline{X}^{(i)})\right)_+^2\right]$$
for $\lambda > 0$.
Proof
We’re going to follow the so-called Herbst argument to prove the concentration inequality. The key ingredient of Herbst’s argument is the use of a differential inequality on the function $F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right]$ with parameter $\lambda > 0$.
Part 1: defining a differential inequality.
The entropy of $e^{\lambda Z}$ is
$$\mathrm{Ent}\left(e^{\lambda Z}\right) = \mathbb{E}\left[\lambda Z e^{\lambda Z}\right] - \mathbb{E}\left[e^{\lambda Z}\right] \log \mathbb{E}\left[e^{\lambda Z}\right].$$
Now, if we stare at this for a moment and let our cumulant generating function neurons fire, then we see that if we let $F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right]$, then we get
$$\mathrm{Ent}\left(e^{\lambda Z}\right) = \lambda F'(\lambda) - F(\lambda) \log F(\lambda),$$
which will become a differential inequality once we bound the entropy from above.
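If you’d rather not expand this by hand, a symbolic check on a toy random variable does the trick; the support $\{-1, 0, 2\}$ below is an arbitrary choice.

```python
import sympy as sp

lam = sp.symbols("lambda", positive=True)

# A toy Z, uniform on a few values, is enough to check
# Ent(e^{lambda Z}) = lambda F'(lambda) - F(lambda) log F(lambda).
values = [-1, 0, 2]
p = sp.Rational(1, len(values))

F = sum(p * sp.exp(lam * z) for z in values)  # F(lambda) = E[e^{lambda Z}]
ent = sum(p * lam * z * sp.exp(lam * z) for z in values) - F * sp.log(F)
rhs = lam * sp.diff(F, lam) - F * sp.log(F)

print(sp.simplify(ent - rhs))  # prints 0
```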
Part 2: putting the logarithmic Sobolev inequality to work.
The logarithmic Sobolev inequality, applied to the function $e^{\lambda f / 2}$, states
$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le 2\,\mathcal{E}\!\left(e^{\lambda f / 2}\right).$$
Combining this with the technical fact above and our assumption on $f$ gives
$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le \frac{\lambda^2 v}{4}\,\mathbb{E}\left[e^{\lambda Z}\right].$$
Part 3: connecting the dots with some calculus.
Okay, cool; we’ve shown that
$$\mathrm{Ent}\left(e^{\lambda Z}\right) \le \frac{\lambda^2 v}{4}\,\mathbb{E}\left[e^{\lambda Z}\right],$$
which we can rewrite, using the identity from Part 1, as
$$\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{\lambda^2 v}{4}\, F(\lambda).$$
Now, let’s write this guy in a slightly different form in the spirit of the Herbst argument by dividing through by $\lambda^2 F(\lambda)$:
$$\frac{F'(\lambda)}{\lambda F(\lambda)} - \frac{\log F(\lambda)}{\lambda^2} \le \frac{v}{4}.$$
The calculus part of our brain might be screaming something like
$$\frac{d}{d\lambda}\left[\frac{\log F(\lambda)}{\lambda}\right] = \frac{F'(\lambda)}{\lambda F(\lambda)} - \frac{\log F(\lambda)}{\lambda^2},$$
which we translate to
$$\frac{d}{d\lambda}\left[\frac{\log F(\lambda)}{\lambda}\right] \le \frac{v}{4}.$$
Invoking more calculus ideas, l’Hospital’s rule says
$$\lim_{\lambda \downarrow 0} \frac{\log F(\lambda)}{\lambda} = \frac{F'(0)}{F(0)} = \mathbb{E}[Z].$$
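Both calculus facts are easy to confirm symbolically: a generic $F$ checks the derivative identity, and a toy moment generating function (same arbitrary support as before) checks the limit.

```python
import sympy as sp

lam = sp.symbols("lambda", positive=True)
F = sp.Function("F")

# d/dlambda [log F(lambda) / lambda] equals F'/(lambda F) - log F / lambda^2.
lhs = sp.diff(sp.log(F(lam)) / lam, lam)
rhs = F(lam).diff(lam) / (lam * F(lam)) - sp.log(F(lam)) / lam ** 2
print(sp.simplify(lhs - rhs))  # prints 0

# l'Hospital at 0 for a toy F(lambda) = E[e^{lambda Z}], Z uniform on {-1, 0, 2}:
# the limit of log F / lambda as lambda -> 0 is E[Z] = 1/3.
Fc = sum(sp.exp(lam * z) for z in [-1, 0, 2]) / 3
print(sp.limit(sp.log(Fc) / lam, lam, 0, dir="+"))  # prints 1/3
```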
Part 4: bounding $\mathbb{P}\left(Z \ge \mathbb{E}[Z] + t\right)$ via Cramér–Chernoff and Markov.
We’re almost there! Let’s assume that $\lambda > 0$ and then integrate the inequality between $0$ and $\lambda$. From l’Hospital’s rule, we get that
$$\frac{\log F(\lambda)}{\lambda} \le \mathbb{E}[Z] + \frac{\lambda v}{4}.$$
Now, let’s undo our $\lambda$-in-the-denominator and our logarithmic term and write
$$F(\lambda) = \mathbb{E}\left[e^{\lambda Z}\right] \le e^{\lambda \mathbb{E}[Z] + \lambda^2 v / 4}.$$
Finally, we can use the Cramér–Chernoff technique and Markov’s inequality to say that
$$\mathbb{P}\left(Z \ge \mathbb{E}[Z] + t\right) \le e^{-\lambda t}\,\mathbb{E}\left[e^{\lambda (Z - \mathbb{E}[Z])}\right] \le e^{-\lambda t + \lambda^2 v / 4} \le e^{-t^2 / v}.$$
(Note that $\lambda = 2t / v$ minimizes the upper bound.) Pat yourself on the back and give Mr. Herbst a (pretend) pat on the back too.
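If you want to double-check that choice of $\lambda$, a short symbolic computation confirms both the minimizer and the resulting exponent.

```python
import sympy as sp

lam, t, v = sp.symbols("lambda t v", positive=True)

# Chernoff exponent from the moment generating function bound above.
exponent = -lam * t + lam ** 2 * v / 4
lam_star = sp.solve(sp.diff(exponent, lam), lam)[0]

print(lam_star)                                   # 2*t/v
print(sp.simplify(exponent.subs(lam, lam_star)))  # -t**2/v
```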
Next up
…bandits!
References
This post is based on material from Chapter 5 of (my new favorite book):
- Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart.