In this post, we’re going to work through another Rademacher-related problem. As in the previous post, the problem highlights the usefulness of the Efron–Stein inequality.

Conditional Rademacher averages

In this section, we’re going to continue with a close cousin of the Rademacher average called the conditional Rademacher average, which is used in high-dimensional statistics to measure the complexity of model classes. It’s defined as

$$Z = \mathbb{E}_{\varepsilon}\left[\max_{1 \le j \le d} \sum_{i=1}^{n} \varepsilon_i X_{i,j} \,\middle|\, X_1, \dots, X_n\right],$$

where $\varepsilon_1, \dots, \varepsilon_n$ are Rademacher random variables (independent signs that take the values $+1$ and $-1$ with probability $1/2$ each) and the $X_i$ are independent random variables that live in $[-1, 1]^d$. Because the expectation averages out the signs, the random variable $Z$ is a function of the $X_i$ alone.
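To make the definition concrete, here’s a minimal R sketch (the sizes n = 5 and d = 20 are illustrative, and exact_Z is a helper name I’m introducing): with n this small, we can average over all 2^n sign vectors and compute Z exactly for a fixed draw of X.

set.seed(1)
n <- 5
d <- 20

# rows are X_1, ..., X_n, each living in [-1, 1]^d
X <- matrix(runif(n*d, -1, 1), nrow = n)

exact_Z <- function(X) {
      n <- nrow(X)
      # enumerate all 2^n Rademacher sign vectors
      signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
      # for each sign vector, take the max over the d coordinates, then average
      mean(apply(signs %*% X, 1, max))
}

exact_Z(X)  # the conditional Rademacher average Z for this draw of X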

Without lifting a finger, we can call on our bounded differences inequality friend and bound the variance of $Z$ as

$$\operatorname{Var}(Z) \le \frac{1}{4} \sum_{i=1}^{n} c_i^2 = n,$$

since swapping out a single $X_i$ can change $Z$ by at most $c_i = 2$. But that’s less than satisfying at this point. Instead, we can improve on this bound by showing that $Z$ satisfies the self-bounding property via the Efron–Stein inequality.
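As a quick sanity check on that constant, here’s a sketch (continuing with the exact_Z helper and X from above) that resamples a single $X_i$ and measures how much $Z$ moves; the change can never exceed $c_i = 2$.

set.seed(2)
X_prime <- X
X_prime[1, ] <- runif(d, -1, 1)  # swap out only X_1

abs(exact_Z(X) - exact_Z(X_prime))  # always at most c_i = 2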

First, let’s define the leave-one-out version of $Z$,

$$Z_i = \mathbb{E}_{\varepsilon}\left[\max_{1 \le j \le d} \sum_{k \ne i} \varepsilon_k X_{k,j} \,\middle|\, X_1, \dots, X_n\right],$$

and observe that $0 \le Z - Z_i \le 1$, because the maximum absolute value contributed by $\varepsilon_i X_{i,j}$ to the sum is 1 (and dropping a zero-mean term from inside the maximum can only decrease the conditional expectation, by Jensen’s inequality). Summing this inequality over $i$ and using a convexity argument (see below), we get that

$$\sum_{i=1}^{n} (Z - Z_i) \le Z.$$
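Both facts are easy to check numerically. Continuing with exact_Z and X from the sketches above, the following computes each leave-one-out average by dropping the $i$-th row of X:

Z_full <- exact_Z(X)
Z_loo <- sapply(1:n, function(i) exact_Z(X[-i, , drop = FALSE]))

range(Z_full - Z_loo)          # each difference lies in [0, 1]
sum(Z_full - Z_loo) <= Z_full  # the summed bound holds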

Now, we can use the Efron–Stein inequality to say

$$\operatorname{Var}(Z) \le \mathbb{E}\left[\sum_{i=1}^{n} (Z - Z_i)^2\right] \le \mathbb{E}\left[\sum_{i=1}^{n} (Z - Z_i)\right] \le \mathbb{E}[Z],$$

where the second inequality holds because each difference lies in $[0, 1]$, so its square is no larger than itself. This shows that $Z$ is a self-bounding function: its variance is bounded by its mean, which improves on the bounded differences bound whenever $\mathbb{E}[Z] < n$. The R simulation at the end of this post checks the bound empirically.

References

This post is based on material from Chapter 3 of:

  • Concentration Inequalities: A Nonasymptotic Theory of Independence by S. Boucheron, G. Lugosi, and P. Massart. Oxford University Press, 2013.

Self-bounding property

Loosely speaking, the self-bounding property states that the difference between a function and a similar function that omits one variable is bounded between 0 and 1, and that these differences, summed over all of the variables, add up to no more than the function itself.

Formally, a nonnegative function $f : \mathcal{X}^n \to [0, \infty)$ satisfies the self-bounding property when there exist functions $f_i : \mathcal{X}^{n-1} \to \mathbb{R}$ and the following two conditions hold:

$$0 \le f(x_1, \dots, x_n) - f_i(x^{(i)}) \le 1$$

and

$$\sum_{i=1}^{n} \left( f(x_1, \dots, x_n) - f_i(x^{(i)}) \right) \le f(x_1, \dots, x_n)$$

for all $x_1, \dots, x_n \in \mathcal{X}$ and all $i = 1, \dots, n$, where $x^{(i)} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$ denotes the vector with its $i$-th coordinate removed.

If we combine the first and second conditions, we can say that

$$\sum_{i=1}^{n} \left( f(x_1, \dots, x_n) - f_i(x^{(i)}) \right)^2 \le \sum_{i=1}^{n} \left( f(x_1, \dots, x_n) - f_i(x^{(i)}) \right) \le f(x_1, \dots, x_n),$$

since each difference lies in $[0, 1]$ and is therefore at least as large as its square. Plugging this into the Efron–Stein inequality bounds the variance by the expected value: $\operatorname{Var}(f(X_1, \dots, X_n)) \le \mathbb{E}[f(X_1, \dots, X_n)]$.

Convexity argument

The convexity argument used in the conditional Rademacher averages example goes as follows:

$$\sum_{i=1}^{n} (Z - Z_i) \le \mathbb{E}_{\varepsilon}\left[\sum_{i=1}^{n} \varepsilon_i X_{i, j^*}\right] = Z,$$

where $j^* = j^*(\varepsilon)$ is an index at which the maximum

$$\max_{1 \le j \le d} \sum_{k=1}^{n} \varepsilon_k X_{k,j}$$

is attained for a given draw of the signs, and the inequality follows because

$$Z_i \ge \mathbb{E}_{\varepsilon}\left[\sum_{k \ne i} \varepsilon_k X_{k, j^*}\right] = Z - \mathbb{E}_{\varepsilon}\left[\varepsilon_i X_{i, j^*}\right],$$

that is, the leave-one-out maximum is at least the leave-one-out sum evaluated at $j^*$.

R simulation

If you want to play around with conditional Rademacher averages, here’s some R code to do it. Each trial draws a fresh X, estimates the conditional Rademacher average by averaging over a batch of sign draws, and the mean and variance of those estimates let you eyeball the self-bounding bound $\operatorname{Var}(Z) \le \mathbb{E}[Z]$.

n_trials <- 1000  # number of trials, each with a fresh draw of X
n_eps <- 100      # sign draws used to estimate each conditional expectation
n <- 5
d <- 20
Z <- rep(0, n_eps)
EZ <- rep(0, n_trials)

for(j in 1:n_trials){
      # simulate X variables; entries live in [-1, 1]
      X <- matrix(runif(n*d, -1, 1), nrow = n)
      for(k in 1:n_eps){
            # draw Rademacher signs
            y <- sample(c(-1, 1), n, replace = TRUE)
            # max over the d coordinates of the signed sums
            Z[k] <- max(y %*% X)
      }
      # average over the sign draws to estimate the conditional Rademacher average
      EZ[j] <- mean(Z)
}

hist(EZ)
mean(EZ)  # estimates E[Z]
var(EZ)   # estimates Var(Z); self-bounding says Var(Z) <= E[Z]