Rademachers are rad - part III
After some ambivalence, I decided that we’ll pay homage to our Rademacher friends once again. We’re going to prove a minimax lower bound for an expert prediction problem. As we’ll see, Rademacher random variables will come to our rescue (twice, in fact).
Prediction with experts
Let’s say that we disagree with statistical learning theory on philosophical grounds, i.e., we don’t believe that our data are generated via a stochastic model that assumes independence (or exchangeability). Can we predict in the absence of statistical assumptions? If so, how would we do it?
- Can we predict in the absence of statistical assumptions? Yes!
- If so, how could we do it? Using the advice of experts!
The prediction with expert advice framework has the following setup:
- a decision space $\mathcal{D}$;
- an outcome space $\mathcal{Y}$;
- a loss function $\ell : \mathcal{D} \times \mathcal{Y} \to \mathbb{R}$; and
- a set of expert indices $\mathcal{E} = \{1, \dots, N\}$.
At each time $t = 1, 2, \dots, n$ (there’s a small code sketch of this loop right after the list):
- the environment chooses the next outcome $y_t \in \mathcal{Y}$ and the experts choose their advice $f_{i,t} \in \mathcal{D}$, $i \in \mathcal{E}$;
- the expert advice $\{f_{i,t} : i \in \mathcal{E}\}$ is provided to the forecaster;
- the forecaster chooses a prediction $\hat{p}_t \in \mathcal{D}$;
- the environment reveals the next outcome $y_t$;
- the forecaster incurs a loss $\ell(\hat{p}_t, y_t)$ and each expert $i$ incurs a loss $\ell(f_{i,t}, y_t)$.
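To make the protocol concrete, here’s a minimal Python sketch of one run. Everything specific in it is illustrative rather than part of the framework: the environment just flips coins (in general it can be adversarial), the experts emit arbitrary numbers in $[0, 1]$, and the forecaster simply averages the advice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 100, 5                       # rounds and number of experts (arbitrary choices)

def loss(p, y):
    """Absolute loss on D = [0, 1], Y = {0, 1}."""
    return abs(p - y)

forecaster_loss = 0.0
expert_loss = np.zeros(N)

for t in range(n):
    # the experts choose their advice (here: arbitrary numbers in [0, 1])
    advice = rng.random(N)
    # the forecaster sees the advice and picks a prediction
    # (placeholder strategy: just average the advice)
    p_hat = advice.mean()
    # the environment reveals the outcome (here a coin flip; in general, adversarial)
    y = rng.integers(0, 2)
    # everyone incurs their loss
    forecaster_loss += loss(p_hat, y)
    expert_loss += np.abs(advice - y)

print(f"forecaster's regret vs. the best expert: {forecaster_loss - expert_loss.min():.2f}")
```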
Minimax lower bound
We want to show that if $\ell$ is the absolute loss, $\ell(\hat{p}, y) = |\hat{p} - y|$ with $\mathcal{D} = [0, 1]$ and $\mathcal{Y} = \{0, 1\}$, then
$$
\sup_{n, N} \frac{V_n^{(N)}}{\sqrt{(n/2)\ln N}} \geq 1,
$$
where the minimax regret for $n$ rounds is defined as
$$
V_n^{(N)} = \inf_{P} \sup_{\mathcal{F}} \sup_{y^n \in \mathcal{Y}^n} \left( \sum_{t=1}^n \ell(\hat{p}_t, y_t) - \min_{1 \le i \le N} \sum_{t=1}^n \ell(f_{i,t}, y_t) \right),
$$
with $\mathcal{F} = \{f_1, \dots, f_N\}$ a set of $N$ static experts, $P$ a forecasting strategy (which produces the predictions $\hat{p}_t$), $\ell$ a $[0,1]$-valued convex loss function, and $n$ the number of rounds. (A static expert is one whose predictions are fixed in advance, i.e., $f_{i,t}$ depends only on the current round $t$ and not on the outcomes observed so far.)
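Just to see the pieces of this definition move, here’s a brute-force illustration (with made-up numbers and a deliberately naive forecaster, so purely an illustration): for one fixed class of static experts and one fixed strategy, we enumerate every binary outcome sequence of a small length and record the worst-case regret, i.e., the sup-over-outcomes term sitting inside the definition.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, N = 8, 3
f = rng.random((N, n))          # a fixed class of N static experts (rows) over n rounds

# a fixed, naive forecasting strategy: average the advice at each round
# (a general strategy could also use past outcomes; this one doesn't)
p_hat = f.mean(axis=0)

# sup over all binary outcome sequences y^n of the regret against the best expert
worst = max(
    np.abs(p_hat - y).sum() - np.abs(f - y).sum(axis=1).min()
    for y in map(np.array, itertools.product([0, 1], repeat=n))
)
print(f"worst-case regret of this naive strategy against this class: {worst:.2f}")
```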
Proof
We’re going to lower bound a lower bound on $V_n^{(N)}$, which may seem odd, but is a fairly common technique. (So keep it in your back pocket the next time you have to bound something.) In particular, we’re going to use the fact that any fixed class $\mathcal{F}$ of static experts lower bounds the minimax regret as
$$
V_n^{(N)} \geq V_n(\mathcal{F}),
$$
where
$$
V_n(\mathcal{F}) = \inf_{P} \sup_{y^n \in \mathcal{Y}^n} \left( \sum_{t=1}^n \ell(\hat{p}_t, y_t) - \min_{1 \le i \le N} \sum_{t=1}^n \ell(f_{i,t}, y_t) \right).
$$
Then we’ll lower bound $V_n(\mathcal{F})$, which implies the minimax regret bound.
First, we use the average-is-less-than-the-max trick to get
$$
V_n(\mathcal{F}) \geq \inf_{P} \, \mathbb{E}\left[ \sum_{t=1}^n |\hat{p}_t - Y_t| - \min_{1 \le i \le N} \sum_{t=1}^n |f_{i,t} - Y_t| \right],
$$
where the expectation is over the randomness in $Y_1, \dots, Y_n$, which we take to be i.i.d. Bernoulli$(1/2)$ outcomes. By the definitions of the infimum and supremum, we can rewrite this guy as
$$
V_n(\mathcal{F}) \geq \inf_{P} \, \mathbb{E}\left[ \sum_{t=1}^n |\hat{p}_t - Y_t| \right] - \mathbb{E}\left[ \min_{1 \le i \le N} \sum_{t=1}^n |f_{i,t} - Y_t| \right],
$$
since the experts’ term doesn’t involve the forecasting strategy.
Okay, cool. Now we’re going to use the fact that $\mathbb{E}\left[ \sum_{t=1}^n |\hat{p}_t - Y_t| \right] = n/2$ for all forecasting strategies because of the randomness of the $Y_t$. (To see this, split the absolute value into two parts, enjoy some nice cancellations, and sum it all up.) We’ll use this fact and then introduce some Rademacher random variables defined as $\sigma_t := 2 Y_t - 1$ to write
$$
V_n(\mathcal{F}) \geq \mathbb{E}\left[ \max_{1 \le i \le N} \sum_{t=1}^n \left( \frac{1}{2} - |f_{i,t} - Y_t| \right) \right] = \mathbb{E}\left[ \max_{1 \le i \le N} \sum_{t=1}^n \frac{(2 f_{i,t} - 1)\, \sigma_t}{2} \right],
$$
where the equality holds once we restrict attention to classes whose predictions are $\{0,1\}$-valued (which is fine, since we’re after a lower bound).
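In case you want the “split and cancel” step from a moment ago spelled out: conditionally on everything the forecaster has seen before round $t$ (which pins down $\hat{p}_t \in [0, 1]$), the outcome $Y_t$ is still a fair coin, so
$$
\mathbb{E}\left[ |\hat{p}_t - Y_t| \,\middle|\, Y_1, \dots, Y_{t-1} \right] = \frac{1}{2} |\hat{p}_t - 0| + \frac{1}{2} |\hat{p}_t - 1| = \frac{1}{2} \hat{p}_t + \frac{1}{2} (1 - \hat{p}_t) = \frac{1}{2},
$$
and summing over $t$ gives $n/2$, no matter how clever the forecasting strategy is.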
Now, let’s take supremums of both sides of our bound on $V_n(\mathcal{F})$, do another Rademacher trick, and then exploit the symmetry of Rademachers:
$$
V_n^{(N)} \geq \sup_{\mathcal{F}} V_n(\mathcal{F}) \geq \mathbb{E}\left[ \max_{1 \le i \le N} \sum_{t=1}^n \frac{\epsilon_{i,t}\, \sigma_t}{2} \right] = \frac{1}{2}\, \mathbb{E}\left[ \max_{1 \le i \le N} \sum_{t=1}^n \epsilon_{i,t} \right],
$$
where we used that the supremum over classes dominates the average over classes whose predictions are drawn i.i.d. Bernoulli$(1/2)$ (with $\epsilon_{i,t} := 2 f_{i,t} - 1$ the resulting Rademacher signs), and that $\epsilon_{i,t}\, \sigma_t$ has the same distribution as $\epsilon_{i,t}$ (a product of independent Rademachers is again a Rademacher).
The final step is an appeal to ye olde central limit theorem and a maximal inequality for (sub-)Gaussian random variables. The CLT says that, for each $i$,
$$
\frac{1}{\sqrt{n}} \sum_{t=1}^n \epsilon_{i,t} \xrightarrow{d} G_i \quad \text{as } n \to \infty,
$$
for independent standard normals $G_1, \dots, G_N$. The maximal inequality gives
$$
\mathbb{E}\left[ \max_{1 \le i \le N} G_i \right] \geq (1 - o(1)) \sqrt{2 \ln N} \quad \text{as } N \to \infty.
$$
Now, we stitch everything back together and conclude the proof with
$$
\lim_{N \to \infty} \liminf_{n \to \infty} \frac{V_n^{(N)}}{\sqrt{(n/2)\ln N}} \geq \lim_{N \to \infty} \frac{\tfrac{1}{2}\, \mathbb{E}\left[ \max_{1 \le i \le N} G_i \right]}{\sqrt{(\ln N)/2}} = \lim_{N \to \infty} \frac{\mathbb{E}\left[ \max_{1 \le i \le N} G_i \right]}{\sqrt{2 \ln N}} \geq 1,
$$
which gives the claimed bound on the supremum.
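And a quick numerical sanity check (a Monte Carlo illustration, not part of the proof): estimate the symmetrized quantity $\tfrac{1}{2}\,\mathbb{E}\big[\max_i \sum_t \epsilon_{i,t}\big]$ and compare it to $\sqrt{(n/2)\ln N}$. The ratio creeps toward $1$, slowly, which is exactly what the iterated limits above would have you expect.

```python
import numpy as np

rng = np.random.default_rng(42)

def symmetrized_lower_bound(n, N, reps=5000):
    """Monte Carlo estimate of (1/2) * E[ max_i sum_t eps_{i,t} ] for i.i.d. Rademachers."""
    # a sum of n i.i.d. Rademacher signs has the same law as 2 * Binomial(n, 1/2) - n
    sums = 2.0 * rng.binomial(n, 0.5, size=(reps, N)) - n
    return 0.5 * sums.max(axis=1).mean()

for n, N in [(100, 10), (1000, 100), (10000, 1000)]:
    est = symmetrized_lower_bound(n, N)
    target = np.sqrt(n / 2 * np.log(N))
    print(f"n={n:6d}, N={N:5d}:  estimate={est:8.2f},  "
          f"sqrt((n/2) ln N)={target:8.2f},  ratio={est / target:.3f}")
```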
Rademachers to the rescue once again.
Next up
… modified logarithmic Sobolev inequalities or bandits!
References
This post is related to material from Chapters 2 and 3 of:
- Prediction, Learning, and Games by N. Cesa-Bianchi and G. Lugosi.