Inequalities and Identities#

import sympy
from sympy import symbols
sympy.init_printing(use_unicode=True)

General-purpose Inequalities#

The Arithmetic-Geometric Mean Inequality#

Let \(\{ a_j \}_{j=1}^n\) and \(\{q_j\}_{j=1}^n\) be sets of nonnegative numbers with \(\sum_j q_j = 1\). Then

\[\begin{equation*} \prod_{j=1}^n a_j^{q_j} \le \sum_{j=1}^n q_j a_j. \end{equation*}\]
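
A quick numerical sanity check (a sketch assuming numpy is available; the values and weights are arbitrary):

import numpy as np

rng = np.random.default_rng(12345)
a = rng.uniform(0, 10, size=5)   # nonnegative values a_j
q = rng.uniform(size=5)
q = q / q.sum()                  # weights q_j summing to 1

assert np.prod(a ** q) <= np.sum(q * a)  # weighted GM <= weighted AM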

The Cauchy-Schwarz Inequality#

Let \(\mathcal{X}\) be a real inner product space, and let \(x, y \in \mathcal{X}\). Then

\[\begin{equation*} | \langle x, y \rangle|^2 \le \langle x, x \rangle \langle y, y \rangle. \end{equation*}\]
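
For the Euclidean inner product on \(\mathbb{R}^n\), the inequality is easy to check numerically (again a sketch assuming numpy):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)
y = rng.normal(size=10)
assert np.dot(x, y) ** 2 <= np.dot(x, x) * np.dot(y, y)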

Rearrangement Inequalities#

Rearrangements of sequences#

Let \(x \equiv (x_j)_{j=1}^n \in {\mathbb R}^n\). The decreasing rearrangement of \(x\), denoted \(x^*\), has the same multiset of component values as \(x\), but ordered so that

\[\begin{equation*} x_1^* \ge x_2^* \ge \cdots \ge x_n^*. \end{equation*}\]

The increasing rearrangement of \(x\), denoted \(x_*\), has the same multiset of component values as \(x\), but ordered so that

\[\begin{equation*} x_{*1} \le x_{*2} \le \cdots \le x_{*n}. \end{equation*}\]
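
In code, both rearrangements are just sorts (a sketch using numpy):

import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
x_decreasing = np.sort(x)[::-1]  # x^*: largest component first
x_increasing = np.sort(x)        # x_*: smallest component first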

The Hardy-Littlewood Rearrangement Inequality for two vectors#

Suppose \(x \in \mathbb{R}^n\) and \(y \in \mathbb{R}^n\). Let \(x \cdot y \equiv \sum_{j=1}^n x_j y_j\). Then

\[\begin{equation*} x^* \cdot y_* \le x \cdot y \le x^* \cdot y^*. \end{equation*}\]
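
A numerical check of both inequalities on random vectors (a sketch assuming numpy):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
y = rng.normal(size=8)
x_dec = np.sort(x)[::-1]  # x^*
y_dec = np.sort(y)[::-1]  # y^*
y_inc = np.sort(y)        # y_*
assert x_dec @ y_inc <= x @ y <= x_dec @ y_dec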

The Symmetric Decreasing Rearrangement of a function#

Suppose \(f: \mathbb{R}^n \rightarrow \mathbb{R}^+\) has a finite integral. The symmetric decreasing rearrangement \(f^*\) of \(f\) is a function such that:

  • \(f^*(x) = f^*(y)\) whenever \(\|x\| = \|y\|\) (symmetry)

  • \(f^*(x) \ge f^*(y)\) whenever \(\|x\| \le \|y\|\) (decreasing)

  • \(\int_{\mathbb{R}^n} 1_{f(x) \ge t} dx = \int_{\mathbb{R}^n} 1_{f^*(x) \ge t} dx\) for all \(t \ge 0\) (the measure of the level sets is the same)

The Hardy-Littlewood Rearrangement Inequality for two functions#

Suppose \(f: \mathbb{R}^{n} \to \mathbb{R}^{+}\) and \(g: \mathbb{R}^{n} \to \mathbb{R}^{+}\). Then

\[\begin{equation*} \int_{\mathbb{R}^n} f(x) g(x) dx \le \int_{\mathbb{R}^n} f^*(x) g^*(x) dx. \end{equation*}\]

The Riesz Rearrangement Inequality for three functions#

Suppose \(f: \mathbb{R}^{n} \to \mathbb{R}^{+}\), \(g: \mathbb{R}^{n} \to \mathbb{R}^{+}\), and \(h: \mathbb{R}^{n} \to \mathbb{R}^{+}\).

Then

\[\begin{equation*} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x)g(x-y)h(y)\,dx\,dy \le \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f^{*}(x)g^{*}(x-y)h^{*}(y)\,dx\,dy. \end{equation*}\]

Stirling’s inequalities#

\[\begin{equation} e n^{n+1/2} e^{-n} \ge n! \ge \sqrt{2 \pi} n^{n+1/2} e^{-n}. \end{equation}\]

For \(\ell \ge 1\) and \(m \ge 2\),

\[\begin{equation} { {\ell m } \choose { \ell }} \ge \frac{m^{m(\ell-1)+1}}{\sqrt{\ell} (m-1)^{(m-1)(\ell-1)}}. \end{equation}\]
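
Both bounds are easy to spot-check for small arguments using only the Python standard library:

import math

for n in range(2, 20):  # n >= 2: the upper bound holds with equality at n = 1,
                        # so float round-off could trip the assert there
    upper = math.e * n ** (n + 0.5) * math.exp(-n)
    lower = math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    assert lower <= math.factorial(n) <= upper

for ell in range(1, 6):
    for m in range(2, 6):
        lhs = math.comb(ell * m, ell)
        rhs = m ** (m * (ell - 1) + 1) / (math.sqrt(ell) * (m - 1) ** ((m - 1) * (ell - 1)))
        assert lhs >= rhs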

Entropy bounds on combinations#

\[\begin{equation} \frac{2^{nH(k/n)}}{n+1} \le {n \choose k} \le 2^{nH(k/n)}, \end{equation}\]

where \(H(q) \equiv -q \log_2(q) - (1-q) \log_2 (1-q)\).
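
A numerical check of both bounds (standard library only; \(H(0) = H(1) = 0\) by convention):

import math

def H(q):
    # binary entropy in bits, with H(0) = H(1) = 0
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

n = 30
for k in range(n + 1):
    bound = 2 ** (n * H(k / n))
    assert bound / (n + 1) <= math.comb(n, k) <= bound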

Identities involving conditional expectation#

Law of Iterated Expectation (Adam’s Law, Law of Total Expectation)#

If \(X\) and \(Y\) are random variables with a joint distribution,

\[\begin{equation*} \mathbb{E} X = \mathbb{E}_Y \mathbb{E}(X|Y). \end{equation*}\]

Law of Total Variance (Eve’s Law)#

If \(X\) and \(Y\) are random variables with a joint distribution,

\[\begin{equation*} \operatorname{Var}(X) = \mathbb{E}_Y \operatorname{Var}(X|Y) + \operatorname{Var}_Y \mathbb{E} (X|Y). \end{equation*}\]
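
Both identities are easy to check by simulation. In the sketch below (assuming numpy; the hierarchical model is chosen only for illustration), \(Y\) is exponential with mean 2 and \(X \mid Y \sim N(Y, 1)\), so Adam’s law gives \(\mathbb{E} X = 2\) and Eve’s law gives \(\operatorname{Var}(X) = 1 + \operatorname{Var}(Y) = 5\):

import numpy as np

rng = np.random.default_rng(1)
n = 10 ** 6
y = rng.exponential(scale=2.0, size=n)  # E Y = 2, Var Y = 4
x = rng.normal(loc=y, scale=1.0)        # X | Y ~ N(Y, 1)

print(x.mean())  # ~ 2 (Adam's law)
print(x.var())   # ~ 5 (Eve's law)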

Probability Inequalities#

This section follows Lugosi (2006) closely.

The tail-sum formula#

If \(X\) is a nonnegative real-valued random variable,

\[\begin{equation*} {\mathbb E} X = \int_0^\infty {\mathbb{P}} \{X \ge x\} dx. \end{equation*}\]
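
For example, for \(X\) exponential with rate \(\lambda\), \({\mathbb{P}}\{X \ge x\} = e^{-\lambda x}\), and the formula can be verified symbolically with the sympy setup above:

from sympy import symbols, exp, integrate, oo

x, lam = symbols('x lambda', positive=True)
integrate(exp(-lam * x), (x, 0, oo))  # 1/lambda, which is E X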

Jensen’s Inequality#

If \(\phi\) is a convex function from a convex set \({\mathcal X}\) to \(\mathbb{R}\) and \(X\) is a random variable taking values in \({\mathcal X}\), then \(\phi({\mathbb E} X) \le {\mathbb E} \phi(X)\).
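
For instance, with the convex function \(\phi(x) = e^x\) and \(X\) standard normal, \({\mathbb E} e^X = e^{1/2} > e^0 = \phi({\mathbb E} X)\). A Monte Carlo sketch (assuming numpy):

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10 ** 6)
assert np.exp(x.mean()) <= np.exp(x).mean()  # phi(E X) <= E phi(X)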

Chernoff bounds#

If \(X\) has a moment generating function (i.e., \({\mathbb{E}} e^{sX}\) exists for \(s \in [-a, a]\) for some \(a > 0\)), we can apply the generalized Markov Inequality with \(f(x) = e^{sx}\) for positive \(s\):

\[\begin{equation*} {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ e^{sX} \ge e^{st} \} \le \frac{{\mathbb E} e^{sX}}{e^{st}} \end{equation*}\]

for all \(s \in (0, a]\). For any particular \(X\), one can optimize this bound over \(s\):

\[\begin{equation*} {\mathbb{P}} \{ X \ge t \} \le \inf_{s \in (0, a]} \frac{{\mathbb E} e^{sX}}{e^{st}}. \end{equation*}\]
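
For a standard normal \(X\), \({\mathbb E} e^{sX} = e^{s^2/2}\) for all \(s\), and the optimization can be carried out symbolically with the sympy setup above:

from sympy import symbols, exp, diff, solve, simplify

s, t = symbols('s t', positive=True)
bound = exp(s ** 2 / 2) / exp(s * t)  # E e^{sX} / e^{st} for X ~ N(0, 1)
s_opt = solve(diff(bound, s), s)[0]   # optimal s equals t
simplify(bound.subs(s, s_opt))        # exp(-t**2/2), the Gaussian tail bound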

Hoeffding’s Inequality#

Let \(\{ X_j \}_{j=1}^n\) be independent, and define \(S_n \equiv \sum_{j=1}^n X_j\). Then, applying the Chernoff bound with any \(s > 0\) gives

\[\begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-st} {\mathbb E} e^{s(S_n - {\mathbb E} S_n)} = e^{-st} \prod_{j=1}^n {\mathbb E} e^{s(X_j - {\mathbb E} X_j)}. \end{equation*}\]

Hoeffding bounds the moment generating function of a bounded, zero-mean random variable \(X\): if \(\mathbb{E} X = 0\) and \(\mathbb{P} (X \in [a, b]) = 1\), then

\[\begin{equation*} {\mathbb E} e^{sX} \le e^{s^2 (b-a)^2/8}, \end{equation*}\]

from which follows Hoeffding’s tail bound:

If \(\{X_j\}_{j=1}^n\) are independent and \({\mathbb{P}} \{a_j \le X_j \le b_j\} = 1\), then

\[\begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} \end{equation*}\]

and

\[\begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \le -t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2}. \end{equation*}\]
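
A simulation comparing the empirical upper tail with the Hoeffding bound for uniform summands (a sketch assuming numpy; the parameters are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
n, t, reps = 100, 10.0, 10 ** 5
x = rng.uniform(size=(reps, n))       # X_j in [0, 1], so sum (b_j - a_j)^2 = n
deviations = x.sum(axis=1) - n * 0.5  # S_n - E S_n
empirical = (deviations >= t).mean()
bound = np.exp(-2 * t ** 2 / n)       # e^{-2 t^2 / n}
print(empirical, bound)               # the empirical tail sits below the bound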

Hoeffding’s Other Inequality#

Suppose \(f\) is a convex, continuous, real function and \({\mathcal X}\) is a finite set. Let \(\{X_j \}_{j=1}^n\) be a simple random sample from \({\mathcal X}\) (drawn without replacement) and let \(\{Y_j \}_{j=1}^n\) be an iid uniform random sample (drawn with replacement) from \({\mathcal X}\). Then

\[\begin{equation*} {\mathbb E} f \left ( \sum_{j=1}^n X_j \right ) \le {\mathbb E} f \left ( \sum_{j=1}^n Y_j \right ). \end{equation*}\]
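
A simulation with \(f(x) = x^2\) on a small finite population (a sketch assuming numpy): sampling without replacement concentrates the sum more, so the convex function has smaller expectation.

import numpy as np

rng = np.random.default_rng(4)
population = np.arange(10.0)  # the finite set X
n, reps = 5, 10 ** 4

srs = np.array([rng.choice(population, size=n, replace=False).sum()
                for _ in range(reps)])                        # without replacement
iid = rng.choice(population, size=(reps, n), replace=True).sum(axis=1)

assert (srs ** 2).mean() <= (iid ** 2).mean()  # f(x) = x^2 is convex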

Bernstein’s Inequality#

There are many Bernstein Inequalities, which are derived by applying Markov’s inequality to random variables of the form \(\exp (\lambda \sum _{j=1}^n X_j )\) for suitable \(\lambda\). Chernoff bounds, Hoeffding’s inequality, and Azuma’s inequality are special cases.

Here are two of Bernstein’s inequalities.

  1. Suppose \(\{X_j \}_{j=1}^n\) are independent with \({\mathbb E} X_j = 0\) for all \(j\), \({\mathbb{P}} \{ | X_j | \le c\} = 1\), \(\sigma_j^2 = {\mathbb E} X_j^2\), and \(\sigma^2 = \frac{1}{n} \sum_{j=1}^n \sigma_j^2\). Then for any \(\epsilon > 0\),

\[\begin{equation*} {\mathbb{P}} \{ S_n/n > \epsilon \} \le e^{-n \epsilon^2 / (2(\sigma^2 + c\epsilon/3))}. \end{equation*}\]
  2. Suppose \(\{X_j \}_{j=1}^n\) are independent with \({\mathbb E} X_j = 0\) for all \(j\), and that for some \(C>0\) and every integer \(k \ge 2\),

\[\begin{equation} \mathbb{E} \left| X_j^k \right| \le \frac{C^{k-2} k!}{2} \mathbb{E} X_j^{2}. \end{equation}\]

Then for all \(t \in \left [ 0, (2C)^{-1} \sqrt{\sum \mathbb{E} X_j^2} \right ]\),

\[\begin{equation} \mathbb{P} \left( \sum_{j=1}^n X_j \ge 2t \sqrt{\sum \mathbb{E} X_j^2} \right) < e^{-t^2}. \end{equation}\]

There are also empirical Bernstein bounds, which bound probabilities in terms of observed (sample) moments instead of theoretical moments. The following example is from Maurer and Pontil (2009, Theorem 11):

Let \(X = (X_j)_{j=1}^n\) be a vector of independent random variables that take values in \([0, 1]\). Let \(\bar{X} := \frac{1}{n} \sum_j X_j\) and \(V_n(X) := \frac{1}{n(n-1)} \sum_{i, j = 1}^n (X_i - X_j)^2/2\). Let \(\delta > 0\). Then with probability at least \(1 - \delta\) in \(X\),

\[\begin{equation} \mathbb{E} \bar{X} \le \bar{X} + \sqrt{ \frac{2V_n (X) \ln (2/\delta)}{n}} + \frac{7 \ln (2/\delta)}{3(n - 1)}. \end{equation}\]
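
Note that \(V_n(X)\) is just the unbiased sample variance, so the bound is straightforward to compute (a sketch assuming numpy; the function name is illustrative):

import numpy as np

def empirical_bernstein_upper(x, delta):
    # Upper confidence bound for the mean of [0, 1]-valued data
    n = len(x)
    v = x.var(ddof=1)  # equals V_n(X) above
    return (x.mean() + np.sqrt(2 * v * np.log(2 / delta) / n)
            + 7 * np.log(2 / delta) / (3 * (n - 1)))

rng = np.random.default_rng(5)
x = rng.uniform(size=1000)
print(empirical_bernstein_upper(x, delta=0.05))  # ~ 0.5 plus slack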

Martingales#

A sequence of random variables \(\{ X_j \}\) is a martingale if

  • \(\mathbb{E} |X_j| < \infty\) \(\forall j\), and

  • \(\mathbb{E} (X_{j+1} \mid X_1, \ldots, X_j) = X_j\).

A sequence of random variables \(\{ X_j \}\) is a martingale with respect to the sequence \(\{Y_j \}\) if each \(X_j\) is a measurable function of \(Y_1, \ldots, Y_j\),

  • \(\mathbb{E} |X_j| < \infty\) \(\forall j\), and

  • \(\mathbb{E} (X_{j+1} \mid Y_1, \ldots, Y_j) = X_j\).

A nonnegative martingale is a martingale for which \(\Pr \{X_j \ge 0 \} = 1\) \(\forall j\).

Examples#

  • A gambler’s fortune in a series of fair bets with the same stakes is a martingale: the expected fortune after the \(j+1\)st bet is the fortune after the \(j\)th bet.

  • Likelihood ratios: Suppose \(\{Y_j \}\) are IID with density \(f\), and let \(g\) be some other density. Define \(X_j \equiv \prod_{i=1}^j g(Y_i)/f(Y_i)\). Then \((X_j)\) is a nonnegative martingale with respect to \(\{Y_j\}\), as the simulation below illustrates. (More generally, likelihood ratios are nonnegative martingales with respect to the distribution in the denominator.)
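
A quick simulation (a sketch assuming numpy): take \(f\) to be the \(N(0,1)\) density and \(g\) the \(N(1,1)\) density, so that \(g(y)/f(y) = e^{y - 1/2}\). Under \(f\), each \(X_j\) has expectation 1.

import numpy as np

rng = np.random.default_rng(6)
reps, n = 10 ** 5, 5
y = rng.normal(size=(reps, n))       # Y_j iid with density f = N(0, 1)
x = np.exp(y - 0.5).cumprod(axis=1)  # X_j = running product of likelihood ratios
print(x.mean(axis=0))                # each entry ~ 1: E X_j = 1 under f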

Ville’s Inequality#

If \((X_j)\) is a nonnegative supermartingale, then for any \(a > 0\),

\[\begin{equation*} \Pr \{ \sup_j X_j \ge a \} \le \mathbb{E} X_1/a. \end{equation*}\]

Ville’s inequality is foundational for sequential testing. See the SPRT, the SPRT without replacement, and nonparametric confidence sets.
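
The inequality is easy to see in action with the likelihood-ratio martingale from the example above (a sketch assuming numpy):

import numpy as np

rng = np.random.default_rng(7)
reps, n, a = 10 ** 5, 100, 10.0
y = rng.normal(size=(reps, n))
x = np.exp(y - 0.5).cumprod(axis=1)  # nonnegative martingale with E X_1 = 1
crossing_rate = (x.max(axis=1) >= a).mean()
print(crossing_rate, 1 / a)          # the crossing rate stays below E X_1 / a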