Rademachers are rad - part I
In this post, we’re going to play with Rademacher random variables, which are quite useful in many problems that arise in “modern” statistics and machine learning. They also pop up in empirical process theory and geometry. So, it’s safe to say that we should at least be acquainted with them.
In case Rademacher random variables are new to you, please meet the Rademacher random variable $X$, which takes the values $+1$ and $-1$, each with probability $1/2$. For those of you who prefer a more formal introduction, please meet the probability mass function of $X$:

$$\mathbb{P}(X = x) = \frac{1}{2}, \qquad x \in \{-1, +1\}.$$
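If you’d rather meet them in code, here’s a minimal sketch (assuming NumPy; the sample size `n` is made up for illustration) that draws some Rademacher samples and checks that the empirical frequencies look right:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n Rademacher samples: +1 or -1, each with probability 1/2.
n = 10_000
x = rng.choice([-1, 1], size=n)

print(x[:10])           # a few random signs
print((x == 1).mean())  # ~0.5
print(x.mean())         # ~0 (the true mean is 0)
```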
Sometimes Rademacher random variables introduce themselves as random sign variables. Now that we’ve been properly introduced, let’s move on to the fun stuff.
Rademacher averages
To build a Rademacher average, we’ll need a collection of real numbers $a_{i,t}$ that are indexed by the number of variables we have, $i = 1, \dots, n$, and by an index set $t \in \mathcal{T}$. We’ll also need some Rademacher variables $X_1, \dots, X_n$. The Rademacher average is

$$Z = \sup_{t \in \mathcal{T}} \sum_{i=1}^{n} a_{i,t} X_i.$$
The random variable $Z$ depends on the $X_i$ in a somewhat complicated way, but we can see that $Z$ cannot change by much if we changed the value of a single $X_i$. In particular, when we flip the value of $X_i$, $Z$ can only change by $2 \sup_{t \in \mathcal{T}} |a_{i,t}|$. (The 2 comes from the Rademacher variable and the supremum-absolute value combo reflects the “maximum” change.)
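We can sanity-check this claim numerically. Below is a quick sketch (the coefficient matrix `a` and the sizes `n` and `T` are made up for illustration; the index set is finite, so the supremum is just a max) that flips one coordinate at a time and compares the change in $Z$ to $2 \sup_{t} |a_{i,t}|$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 8, 50                     # made-up sizes: n variables, |T| indices
a = rng.normal(size=(n, T))      # coefficients a[i, t]

def Z(x):
    # sup over the finite index set of sum_i a[i, t] * x[i]
    return (x @ a).max()

x = rng.choice([-1, 1], size=n)
for i in range(n):
    x_flipped = x.copy()
    x_flipped[i] *= -1                 # flip the i-th sign
    change = abs(Z(x) - Z(x_flipped))
    bound = 2 * np.abs(a[i]).max()     # 2 * sup_t |a[i, t]|
    assert change <= bound + 1e-12, (i, change, bound)
```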
We can use this bounded differences property and the Efron–Stein inequality to bound the variance of $Z$ even though we can’t say much about the distribution of $Z$ itself, which is pretty rad.
In particular, we can say that

$$\mathrm{Var}(Z) \le \frac{1}{4} \sum_{i=1}^{n} \left( 2 \sup_{t \in \mathcal{T}} |a_{i,t}| \right)^2 = \sum_{i=1}^{n} \sup_{t \in \mathcal{T}} a_{i,t}^2.$$
The inequality results from the bounded differences inequality (which we can prove using the Efron–Stein inequality).
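To see the bound in action, here’s a rough Monte Carlo sketch (reusing the same kind of made-up coefficients as above) comparing an empirical estimate of $\mathrm{Var}(Z)$ to the bound:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, reps = 8, 50, 20_000
a = rng.normal(size=(n, T))

# Monte Carlo estimate of Var(Z), with Z = max_t sum_i a[i, t] * X[i].
X = rng.choice([-1, 1], size=(reps, n))
Zs = (X @ a).max(axis=1)
print("Var(Z) estimate:", Zs.var())

# Bounded differences bound: sum_i sup_t a[i, t]^2.
print("bound:", (a ** 2).max(axis=1).sum())
```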
But… we can do better.
Sneaking the supremum past the sum
Now, we define $X_i'$ as an independent copy of $X_i$. We’ll also set $\hat{t}$ to be the (possibly random) index that achieves the supremum, i.e.,

$$Z = \sup_{t \in \mathcal{T}} \sum_{i=1}^{n} a_{i,t} X_i = \sum_{i=1}^{n} a_{i,\hat{t}} X_i.$$
So, we can bound the difference between $Z$ and $Z_i'$, the value of $Z$ with the $i$th coordinate changed from $X_i$ to $X_i'$. That is,

$$Z - Z_i' \le \sum_{j=1}^{n} a_{j,\hat{t}} X_j - \left( \sum_{j \ne i} a_{j,\hat{t}} X_j + a_{i,\hat{t}} X_i' \right) = a_{i,\hat{t}} (X_i - X_i') \implies (Z - Z_i')_+^2 \le a_{i,\hat{t}}^2 (X_i - X_i')^2,$$

where in the implied inequality we square only the positive portion of $Z - Z_i'$. (The first inequality holds because $\hat{t}$ achieves the supremum in $Z$, while in $Z_i'$ it is just one candidate index, so plugging it in can only undershoot $Z_i'$.)
Now, we’ll use an expectation trick and the independence of $X_i'$ and $X_1, \dots, X_n$ to say

$$\mathbb{E}\left[ (Z - Z_i')_+^2 \right] \le \mathbb{E}\left[ a_{i,\hat{t}}^2 (X_i - X_i')^2 \right] = \mathbb{E}\left[ a_{i,\hat{t}}^2\, \mathbb{E}\left[ (X_i - X_i')^2 \mid X_1, \dots, X_n \right] \right] = 2\, \mathbb{E}\left[ a_{i,\hat{t}}^2 \right].$$

We do the expectation-conditioning combo on $X_1, \dots, X_n$ to get rid of all randomness except for that of $X_i'$: conditionally, $a_{i,\hat{t}}$ and $X_i$ are fixed, so $\mathbb{E}[(X_i - X_i')^2 \mid X_1, \dots, X_n] = X_i^2 - 2 X_i\, \mathbb{E}[X_i'] + \mathbb{E}[(X_i')^2] = 1 - 0 + 1 = 2$.
Finally, we use the Efron–Stein inequality to say

$$\mathrm{Var}(Z) \le \sum_{i=1}^{n} \mathbb{E}\left[ (Z - Z_i')_+^2 \right] \le 2 \sum_{i=1}^{n} \mathbb{E}\left[ a_{i,\hat{t}}^2 \right] = 2\, \mathbb{E}\left[ \sum_{i=1}^{n} a_{i,\hat{t}}^2 \right] \le 2 \sup_{t \in \mathcal{T}} \sum_{i=1}^{n} a_{i,t}^2.$$
So, through some Efron–Stein-inspired analysis, we moved the supremum outside the sum at the expense of a factor of 2.
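To see why this is a better bound, here’s one more sketch, with a made-up coefficient matrix chosen so that each index $t$ puts all of its weight on a different coordinate $i$. The old sum-of-sups bound grows with $n$, while the new sup-of-sums bound doesn’t:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# a[i, t] = 1 if i == t, else 0: each index t "spends" its weight
# on a different coordinate, so Z = max_i X_i.
a = np.eye(n)

old_bound = (a ** 2).max(axis=1).sum()      # sum_i sup_t a[i, t]^2 = n
new_bound = 2 * (a ** 2).sum(axis=0).max()  # 2 sup_t sum_i a[i, t]^2 = 2

X = rng.choice([-1, 1], size=(20_000, n))
Zs = (X @ a).max(axis=1)                    # here Z = 1 almost surely
print(Zs.var(), new_bound, old_bound)       # ~0.0, 2.0, 100.0
```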
References
This post is based on material from Chapter 3 of:
- *Concentration Inequalities: A Nonasymptotic Theory of Independence* by S. Boucheron, G. Lugosi, and P. Massart.
Bounded differences property
The bounded differences property states that the value of the function changes by a bounded amount if the value of the $i$th variable is changed. A function $f : \mathcal{X}^n \to \mathbb{R}$ satisfies the bounded differences property if

$$\sup_{x_1, \dots, x_n,\, x_i' \in \mathcal{X}} \left| f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n) \right| \le c_i$$

for $i = 1, \dots, n$ and constants $c_1, \dots, c_n$. For such functions and independent $X_1, \dots, X_n$, the Efron–Stein inequality delivers the bounded differences inequality $\mathrm{Var}(f(X_1, \dots, X_n)) \le \frac{1}{4} \sum_{i=1}^{n} c_i^2$, which is exactly what we used above with $c_i = 2 \sup_{t \in \mathcal{T}} |a_{i,t}|$.