Supermartingale-based Tests#

Martingales, supermartingales, submartingales#

Let \((X_j)_{j \in \mathbb{N}} = X_1, X_2, \ldots\) and \((Y_j)_{j \in \mathbb{N}} = Y_1, Y_2, \ldots\) be sequences of random variables (stochastic processes). Let \(Y^{i} := (Y_1, \ldots, Y_i)\) be the first \(i\) elements of the sequence \((Y_j)_{j \in \mathbb{N}}\). Suppose \(\mathbb{E}|X_j| < \infty\) for all \(j \in \mathbb{N}\).

Then \((X_j)_{j \in \mathbb{N}}\) is a martingale with respect to \((Y_j)_{j \in \mathbb{N}}\) if

()#\[\begin{equation} \mathbb{E}(X_j | Y^{j-1}) = X_{j-1}. \end{equation}\]

It is a supermartingale if

()#\[\begin{equation} \mathbb{E}(X_j | Y^{j-1}) \le X_{j-1}, \end{equation}\]

and a submartingale if

()#\[\begin{equation} \mathbb{E}(X_j | Y^{j-1}) \ge X_{j-1}. \end{equation}\]

A martingale, supermartingale, or submartingale is nonnegative if \(\mathbb{P} \{X_j \ge 0 \} = 1\) for all \(j \in \mathbb{N}\).

Terminology: A sequence \((\lambda_i)_{i \in \mathbb{N}}\) is predictable with respect to \((Y_j)_{j \in \mathbb{N}}\) if \(\lambda_j\) may depend on \(Y^{j-1}\) but not on \((Y_i)_{i \ge j}\).

Examples#

Product of independent random variables with expected value 1.#

Suppose \((Y_j)_{j \in \mathbb{N}}\) are independent and have expected value 1. Define \(X_j := \prod_{i=1}^j Y_i\), \(j \in \mathbb{N}\). Then

()#\[\begin{equation} \mathbb{E} (X_j | Y^{j-1}) = \mathbb{E} (X_{j-1}Y_j | Y^{j-1}) = X_{j-1} \mathbb{E} (Y_j | Y^{j-1}) = X_{j-1} \mathbb{E} (Y_j) = X_{j-1} \cdot 1 = X_{j-1}. \end{equation}\]

Note that if \((Z_i)\) are independent random variables with finite, nonzero expected values, then \(\prod_{i=1}^j (Z_i/\mathbb{E}Z_i)\) is a martingale.
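As a quick numerical check (a sketch; the Uniform\([0,2]\) factors, which have expected value 1, are chosen only for illustration):

import numpy as np

rng = np.random.default_rng(12345)

# Y_j are IID Uniform[0, 2], so E(Y_j) = 1.
n_paths, n_steps = 100_000, 10
Y = rng.uniform(0, 2, size=(n_paths, n_steps))
X = np.cumprod(Y, axis=1)           # X_j = Y_1 * ... * Y_j along each path

# The martingale property implies E(X_j) = 1 for every j.
print(np.round(X.mean(axis=0), 2))  # each column mean should be close to 1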

Sum of independent random variables with expected value 0.#

Suppose \((Y_j)_{j \in \mathbb{N}}\) are independent and have expected value 0. Define \(X_j := \sum_{i=1}^j Y_i\), \(j \in \mathbb{N}\). Then

()#\[\begin{equation} \mathbb{E} (X_j | Y^{j-1}) = \mathbb{E} (X_{j-1}+Y_j | Y^{j-1}) = X_{j-1} + \mathbb{E} (Y_j | Y^{j-1}) = X_{j-1} + \mathbb{E} (Y_j ) = X_{j-1} + 0 = X_{j-1}. \end{equation}\]

Gambler’s fortune in repeated bets on independent fair coin tosses.#

Suppose we bet on the outcomes of a sequence of independent fair coin tosses \(Y_1, Y_2, \ldots\), i.e., \(\{Y_j\}_{j \in \mathbb{N}}\) are IID \(\mbox{Bernoulli}(1/2)\). Our initial fortune is \(X_0\); our fortune after the \(j\)th wager is \(X_j\). We are not allowed to wager more than our current fortune on the next bet. If \(X_i\) is zero then \(X_k = 0\) for all \(k > i\): we can’t bet if we go broke. The wager on the \(j\)th toss is \(b_j \in [0, X_{j-1}]\); \(b_j\) may depend on \(Y^{j-1}\). If \(Y_j=1\), we win the \(j\)th bet and our fortune increases by \(b_j\); if \(Y_j=0\), we lose the \(j\)th bet and our fortune decreases by \(b_j\). Then \(X_j = X_{j-1}+b_jY_j - b_j(1-Y_j)\).

()#\[\begin{equation} \mathbb{E} (X_j | Y^{j-1}) = X_{j-1}+ b_j \mathbb{E} (2Y_j-1) = X_{j-1}. \end{equation}\]

Thus the fortune \((X_j)_{j \in \mathbb{N}}\) is a nonnegative martingale with respect to the Bernoulli coin tosses \((Y_j)_{j \in \mathbb{N}}\).
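A simulation sketch (the rule of betting half the current fortune is an arbitrary predictable strategy chosen for illustration; any predictable \(b_j \in [0, X_{j-1}]\) works):

import numpy as np

rng = np.random.default_rng(2023)

# Simulate the gambler's fortune with a predictable betting rule:
# bet half of the current fortune on each toss.
n_paths, n_tosses = 100_000, 10
fortune = np.ones(n_paths)                # X_0 = 1
for _ in range(n_tosses):
    b = fortune / 2                       # b_j depends only on the past
    y = rng.integers(0, 2, size=n_paths)  # fair coin: 1 = win, 0 = lose
    fortune = fortune + b * y - b * (1 - y)

# E(X_j) should equal X_0 = 1 regardless of the (predictable) betting rule.
print(fortune.mean())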

Likelihood ratios#

We have a family \(\mathcal{P} := \{ \mathbb{P}_\theta : \theta \in \Theta \}\) of probability distributions on a measurable space \(\mathcal{X}\), dominated by a common \(\sigma\)-finite measure \(\mu\). Let \(p_\theta(x)\), \(x \in \mathcal{X}\), denote the density of \(\mathbb{P}_\theta\) with respect to \(\mu\). For fixed \(x \in \mathcal{X}\), the function

()#\[\begin{eqnarray} \mathcal{L} &=& \mathcal{L}_x : \Theta \rightarrow \Re^+ \\ \theta &\mapsto& p_\theta(x) \end{eqnarray}\]

is the likelihood. If we observe \(X=x\), then for \(\eta \in \Theta\), \(\mathcal{L}(\eta) = \mathcal{L}(\eta | X=x)\) is the likelihood of \(\eta\) given \(X=x\). The likelihood ratio of \(\eta\) to \(\nu\) given \(X=x\) is \(\mathcal{L}(\eta | X=x)/\mathcal{L}(\nu | X=x)\).

Suppose \(\mathbb{P}_1\) and \(\mathbb{P}_2\) are distributions on some outcome space and that they have densities with respect to some dominating measure \(\mu\). Let \(f\) and \(g\) denote those densities. Suppose \((Y_j)_{j \in \mathbb{N}}\) are IID with density \(f\). Define the likelihood ratio process

()#\[\begin{equation} L_j := \prod_{i=1}^j g(Y_i)/f(Y_i), \;\; j \in \mathbb{N}. \end{equation}\]

Now

()#\[\begin{equation} \mathbb{E} (L_j | Y^{j-1}) = L_{j-1} \mathbb{E}(g(Y_j)/f(Y_j)) = L_{j-1} \int_\Omega \frac{g(\omega)}{f(\omega)} f(\omega) d\mu(\omega) = L_{j-1} \int_\Omega g(\omega) d\mu(\omega) = L_{j-1} \end{equation}\]

since \(g\) is a probability density with respect to \(\mu\).
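A numerical illustration (a sketch; the two Bernoulli densities are chosen only for illustration):

import numpy as np

rng = np.random.default_rng(7)

# f: Bernoulli(1/2) (true, data-generating); g: Bernoulli(0.6) (alternative).
p_f, p_g = 0.5, 0.6
n_paths, n_obs = 100_000, 15
Y = rng.binomial(1, p_f, size=(n_paths, n_obs))

# Likelihood ratio of each observation: g(Y_i)/f(Y_i).
ratio = np.where(Y == 1, p_g / p_f, (1 - p_g) / (1 - p_f))
L = np.cumprod(ratio, axis=1)        # L_j = prod_{i<=j} g(Y_i)/f(Y_i)

# Under f, (L_j) is a nonnegative martingale with E(L_j) = 1 for every j.
print(np.round(L.mean(axis=0), 3))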

Prior-posterior ratio (PPR) martingale#

See Waudby-Smith and Ramdas (2021).

Again, we have a family of distributions. Let \(\Theta\) be countable and suppose \(\{\mathbb{P}_\theta : \theta \in \Theta\}\) are dominated by a common measure \(\mu\), where \(p_\theta\) is the density of \(\mathbb{P}_\theta\) with respect to \(\mu\). Let \(\pi\) (the prior) be a probability distribution on \(\Theta\) that assigns positive probability to every \(\eta \in \Theta\). We observe \((X_i)_{i \in \mathbb{N}}\) sequentially, with \(X_1 \sim p_\theta\) and \(X_{j+1} \sim p_\theta(\cdot | X^j)\), for all \(j \in \mathbb{N}\), for some fixed \(\theta \in \Theta\). The predictive distribution of \(x^j = (x_1, \ldots, x_j)\) is

()#\[\begin{equation} m(x^j) := \int_\eta p_\eta(x^j) \pi(d\eta). \end{equation}\]

The posterior distribution of \(\theta\) given \(X^j=x^j\) is

()#\[\begin{equation} \pi_j(\theta) = \pi(d\theta | X^j) := \frac{p_\theta(x^j) \pi(\theta)}{m(x^j)}. \end{equation}\]

Consider the sequence

()#\[\begin{equation} Y_j(\theta) := \frac{\pi(\theta)}{\pi_j(\theta)}, \;\; j \in \mathbb{N}. \end{equation}\]

Then \(Y_j\) is nonnegative. Suppose the true value of \(\theta\) is \(\theta^*\) and that \(\pi\) assigns positive mass to \(\theta^*\). Then \((Y_j(\theta^*))\) is a nonnegative martingale with expected value 1.

Proof: [To do]
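A numerical sanity check of the claim (a sketch; the three-point Bernoulli family and the uniform prior are chosen only for illustration, and the check is not a proof):

import numpy as np

rng = np.random.default_rng(42)

# Small Bernoulli family and uniform prior.
thetas = np.array([0.3, 0.5, 0.7])
prior = np.full(3, 1 / 3)
true_idx = 1                       # data are IID Bernoulli(0.5)

n_paths, n_obs = 50_000, 10
X = rng.binomial(1, thetas[true_idx], size=(n_paths, n_obs))

# Running log-likelihood of each theta on each path.
loglik = np.zeros((n_paths, 3))
ppr = np.empty((n_paths, n_obs))   # PPR evaluated at the true theta
for j in range(n_obs):
    x = X[:, [j]]
    loglik += x * np.log(thetas) + (1 - x) * np.log(1 - thetas)
    post = prior * np.exp(loglik)                 # unnormalized posterior
    post /= post.sum(axis=1, keepdims=True)       # normalize
    ppr[:, j] = prior[true_idx] / post[:, true_idx]

# The PPR at the true theta should be a nonnegative martingale with mean 1.
print(np.round(ppr.mean(axis=0), 3))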

General betting martingales#

Waudby-Smith and Ramdas (2021)

()#\[\begin{equation} Y_j(\eta) := \prod_{i=1}^j \left ( 1 + \lambda_i(\eta) (X_i-\eta) \right ) \end{equation}\]

with \(Y_0 := 1\), \(\lambda_i \in [-1/(1-\eta), 1/\eta]\) predictable.

The initial fortune is 1. Each term is a bet about the value of \(\theta_i = \mathbb{E}X_i\): positive \(\lambda_i\) pays if \(X_i > \eta\); negative \(\lambda_i\) pays if \(X_i < \eta\). The constraint on \(\lambda_i\) ensures the gambler can’t go into debt: the fortune is nonnegative.
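A minimal sketch of such a betting martingale for \([0,1]\)-valued data (the constant bet \(\lambda_i \equiv 1/2\) and the Bernoulli data are arbitrary choices for illustration; in practice \(\lambda_i\) is chosen adaptively):

import numpy as np

rng = np.random.default_rng(99)

eta = 0.5          # hypothesized mean being bet against
lam = 0.5          # constant predictable bet in [-1/(1-eta), 1/eta]
n_paths, n_obs = 100_000, 20

# X_i in [0, 1] with true mean equal to eta, so the null is true.
X = rng.binomial(1, eta, size=(n_paths, n_obs))

# Betting martingale: Y_j = prod_{i<=j} (1 + lam * (X_i - eta)).
Y = np.cumprod(1 + lam * (X - eta), axis=1)

print(np.all(Y >= 0))               # nonnegative by the constraint on lam
print(np.round(Y.mean(axis=0), 3))  # E(Y_j) = 1 for every j under the null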

Waudby-Smith and Ramdas Proposition 3:
For an arbitrary set of (possibly unbounded) distributions \(\{\mathbb{P}_\theta : \theta \in \Theta\}\), every nonnegative martingale (with respect to every \(\mathbb{P}_\theta\), \(\theta \in \Theta\)) is necessarily of the form

()#\[\begin{equation} Y_j = \prod_{i=1}^j (1 + \lambda_i Z_i) \end{equation}\]

for some predictable \(\lambda_i \le 1\) and some \(Z_i \ge -1\) \(\mathbb{P}_\theta\)-almost surely, with \(\mathbb{E}_{\theta}(Z_i | X^{i-1}) = 0\) for every \(\theta \in \Theta\). The same statement holds for nonnegative supermartingales, with \(= 0\) replaced by \(\le 0\).

The empirical Bernstein martingale#

S.R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon, 2021. Time-uniform, nonparametric, nonasymptotic confidence sequences, The Annals of Statistics, 49(2), 1055-1080.

Sampling with or without replacement from a finite population#

Suppose we have a population \(\{x_i\}_{i=1}^N\). Let \(\theta = \bar{x} := \frac{1}{N} \sum_{i=1}^N x_i\) be the population mean. Let \(X_k\) be the value of \(x_i\) selected on the \(k\)th draw. Let \(X^j := (X_1, \ldots, X_j)\). Assume \(X_j \in [0, u_j]\) for some known \(u_j\).

Let \(\theta_j := \mathbb{E}(X_j | X^{j-1})\). For sampling with replacement, \(\theta_j = \theta\). Define \(S_j := \sum_{i=1}^j X_i\). For sampling without replacement, \(\theta_j = (N\theta - S_{j-1})/(N-j+1)\): the population total was originally \(N\theta\), from which items totalling \(S_{j-1}\) have been removed, leaving \(N-j+1\) items in the population.

Then \(\sum_{i=1}^j (X_i-\theta_i)\) and \(\prod_{i=1}^j X_i/\theta_i\) (provided \(\theta_i \ne 0\)) are martingales.
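A simulation sketch of the sampling-without-replacement case (the six-element population is an arbitrary illustration):

import numpy as np

rng = np.random.default_rng(3)

# Tiny population (the values are an arbitrary illustration).
pop = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
N, theta = len(pop), pop.mean()

# Each row is one simple random sample without replacement (a random permutation).
n_paths = 100_000
X = rng.permuted(np.tile(pop, (n_paths, 1)), axis=1)

S = np.hstack([np.zeros((n_paths, 1)), np.cumsum(X, axis=1)[:, :-1]])  # S_{j-1}
j = np.arange(1, N + 1)
theta_j = (N * theta - S) / (N - j + 1)   # conditional mean of X_j given the past

# The running sum of (X_i - theta_i) is a martingale starting at 0,
# so each column mean should be close to 0.
M = np.cumsum(X - theta_j, axis=1)
print(np.round(M.mean(axis=0), 3))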

Properties of martingales, supermartingales, and submartingales#

  • If \((X_j)_{j \in \mathbb{N}}\) is a martingale, then \(\mathbb{E} X_j = \mathbb{E} X_1\).

  • If \((X_j)_{j \in \mathbb{N}}\) is a supermartingale, then \(\mathbb{E} X_j \le \mathbb{E} X_1\).

  • If \((X_j)_{j \in \mathbb{N}}\) is a submartingale, then \(\mathbb{E} X_j \ge \mathbb{E} X_1\).

Proof. Suppose \((X_j)_{j \in \mathbb{N}}\) is a martingale w.r.t. \((Y_j)_{j \in \mathbb{N}}\). Then, by the law of total expectation,

()#\[\begin{equation} \mathbb{E}X_j = \mathbb{E}\mathbb{E}(X_j | Y^{j-1}) = \mathbb{E} X_{j-1}. \end{equation}\]
()#\[\begin{equation} \mathbb{E}X_{j-1} = \mathbb{E}\mathbb{E}(X_{j-1} | Y^{j-2}) = \mathbb{E} X_{j-2}. \end{equation}\]

Continuing, we end up with \(\mathbb{E}X_1\).

For supermartingales and submartingales, \(=\) is replaced with \(\le\) or \(\ge\), respectively. \(\Box\).

  • A concave (convex) function of a martingale is a supermartingale (submartingale). (Proof by Jensen’s inequality.)

Stopping times#

Let \(\tau\) be a measurable mapping from \(X_1, X_2, \ldots\) to \(\mathbb{N} \cup \{\infty\}\) and suppose that for \(j=1, 2, \ldots\), the event \(\{\tau = j\}\) depends only on \(X^j = (X_1, \ldots, X_j)\), and not on \(X_k\) for \(k>j\). (The usual definition of stopping times involves filtrations, which are beyond the scope of these materials.) Then \(\tau\) is a stopping time for \(X\).

In the example where \(X_j\) represents a sequence of wagers, some stopping rules are:

  • stop after 10 bets

  • stop when you go broke

  • stop when you double your initial stake

  • stop if you have won three bets in a row

These are not stopping rules:

  • stop when your fortune is as high as it will ever get

  • stop if you would lose the next three bets

If \(X_1, X_2, \ldots\) is a (super | sub) martingale and \(\tau\) is a stopping time for it, what can we say about \(\mathbb{E} X_\tau\)? Doob’s optional stopping theorem gives sufficient conditions under which \(\mathbb{E} X_\tau\) equals (or is bounded by) \(\mathbb{E} X_1\).

Doob’s Optional Stopping Theorem (1953)#

Suppose \(\tau\) is a stopping time for the stochastic process \((X_j)_{j \in \mathbb{N}}\) and that any of the following conditions holds:

  • \(\mathbb{P} \{\tau \le c \} = 1\) for some \(c \in \mathbb{N}\)

  • \(\mathbb{E} \tau < \infty\) and \( | X_j - X_{j-1}| \le c\) for all \(j\) and some \(c\)

  • \(|X_{j}| \le c \) for all \(j \in \mathbb{N}\) for some \(c\) and \(\mathbb{P} \{ \tau < \infty \}=1\).

Then if \((X_j)_{j \in \mathbb{N}}\) is a martingale, \(\mathbb{E} X_\tau = \mathbb{E} X_1\); if it is a supermartingale, \(\mathbb{E} X_\tau \le \mathbb{E} X_1\).
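A numerical illustration of optional stopping (a sketch; the random-walk setup and the cap of 200 steps are choices made only so the stopping time is bounded and the first condition of the theorem applies):

import numpy as np

rng = np.random.default_rng(11)

# Fair-coin random walk started at 5 with +/-1 steps (a martingale).
# Stopping time: the first time the walk hits 0 or 10, or step 200,
# whichever comes first -- a bounded stopping time.
n_paths, start, horizon = 20_000, 5, 200
steps = rng.choice([-1, 1], size=(n_paths, horizon))
paths = start + np.cumsum(steps, axis=1)

hit = (paths == 0) | (paths == 10)
# index of the first hit, or the last step if the walk never hits 0 or 10
first = np.where(hit.any(axis=1), hit.argmax(axis=1), horizon - 1)
stopped = paths[np.arange(n_paths), first]

print(stopped.mean())   # should be close to the starting fortune, 5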

Doob’s Martingale Convergence Theorem#

Suppose \((X_j)_{j \in \mathbb{N}}\) is a martingale and \(\sup_{j \in \mathbb{N}} \mathbb{E} (X_j^-) < \infty\), where \(X_j^- := \max(-X_j, 0)\) is the negative part of \(X_j\). Then \((X_j)\) converges almost surely to a random variable \(X_\infty\) with \(\mathbb{E} |X_\infty| < \infty\).

Ville’s Inequality (1939)#

Let \((X_j)_{j \in \mathbb{N}}\) be a nonnegative (super)martingale with respect to \((Y_j)_{j \in \mathbb{N}}\). Then

()#\[\begin{equation} \mathbb{P} \{ \exists j : X_j \ge c \mathbb{E} X_1 \} \le 1/c. \end{equation}\]

Proof. (informal: it can be made rigorous using Doob’s optional stopping theorem)

Define the stopping time \(\tau(k)\) to be the first \(j \le k\) such that \(X_j \ge c \mathbb{E} X_1\), or \(k\) if there is no such \(j\). For each \(k\), \(\tau(k)\) is a bounded stopping time, so \(\mathbb{E} X_{\tau(k)} \le \mathbb{E} X_1\) (with equality for a martingale). Apply Markov’s inequality to \(X_{\tau(k)}\) and let \(k \rightarrow \infty\).
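A simulation sketch of Ville’s inequality (the particular mean-1 factors and the horizon of 200 steps are arbitrary illustrations; the inequality itself holds over an infinite horizon):

import numpy as np

rng = np.random.default_rng(17)

# Nonnegative martingale: product of IID factors with mean 1
# (here 1 + 0.5*(B - 0.5) with B a fair Bernoulli, an arbitrary choice).
n_paths, n_steps, c = 20_000, 200, 20
B = rng.binomial(1, 0.5, size=(n_paths, n_steps))
X = np.cumprod(1 + 0.5 * (B - 0.5), axis=1)

# Fraction of paths that EVER reach c * E(X_1) = c within n_steps.
ever = (X.max(axis=1) >= c).mean()
print(ever, "<=", 1 / c)   # Ville: at most 1/c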

The Azuma-Hoeffding inequality#

Let \((X_j)_{j \in \mathbb{N}}\) be a supermartingale and suppose that there are finite constants \(\{c_k\}_{k \in \mathbb{N}}\) such that for all \(k\),

()#\[\begin{equation} \mathbb{P} \{|X_k-X_{k-1}| \leq c_k \} = 1. \end{equation}\]

Then for all \(t \in \mathbb{N}\) and all \(\epsilon > 0\),

()#\[\begin{equation} \mathbb{P} \{ X_t-X_0 \ge \epsilon \} \le \exp \left( - \frac{\epsilon^2}{2 \sum _{j=1}^t c_j^2 }\right). \end{equation}\]
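A quick empirical check of the bound (a sketch using a simple symmetric random walk, so \(c_k = 1\) for all \(k\); the choices of \(t\) and \(\epsilon\) are arbitrary):

import numpy as np

rng = np.random.default_rng(8)

# Simple symmetric random walk: increments +/-1, so c_k = 1 for all k.
n_paths, t, eps = 100_000, 100, 25
steps = rng.choice([-1, 1], size=(n_paths, t))
X_t = steps.sum(axis=1)            # X_t - X_0

empirical = (X_t >= eps).mean()
bound = np.exp(-eps**2 / (2 * t))  # Azuma-Hoeffding with sum of c_j^2 = t
print(empirical, "<=", bound)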

Supermartingale tests#

Wald’s SPRT#

See SPRT

ALPHA martingales#

Analogous to the SPRT for the Bernoulli \(p\).

Bernoulli SPRT: Observe \(\{X_j \}_{j \in \mathbb{N}}\) IID Bernoulli(\(p\)). The null hypothesis is that \(p = \mu\); the alternative hypothesis is that \(p = \eta\).

\[T_0 := 1\]
\[T_j := T_{j-1} \left (X_j \frac{\eta}{\mu} + (1- X_j) \frac{1-\eta}{1-\mu} \right ).\]

Claim: if \(p \le \mu\), \((T_j)_{j\in\mathbb{N}}\) is a nonnegative supermartingale.

We will prove something more general. We start by developing a one-sided test of the simple hypothesis \(\theta = \mu\), then show that the \(P\)-value is monotone in \(\mu\), so the test is valid for the composite hypothesis \(\theta \le \mu\). Let \(X^j := (X_1, \ldots, X_j)\). Assume \(X_i \in [0, u]\) for some known \(u\).

Let \(\mu_j := \mathbb{E}(X_j | X^{j-1})\) computed under the null hypothesis \(\theta = \mu\). Let \(\eta_j = \eta_j(X^{j-1})\), \(j=1, \ldots\), be a predictable sequence in the sense that \(\eta_j\) may depend on \(X^{j-1}\), but not on \(X_k\) for \(k \ge j\).

Let \(T_0 := 1\) and

()#\[\begin{equation} T_j := T_{j-1} u^{-1}\left ( X_j\frac{\eta_j}{\mu_j} + (u-X_j) \frac{u-\eta_j}{u-\mu_j} \right ), \;\; j=1, \ldots . \label{eq:alphaBRAVOdef} \end{equation}\]

This can be rearranged to yield

()#\[\begin{equation} T_j := T_{j-1} \left ( \frac{X_j}{\mu_j} \cdot \frac{\eta_j-\mu_j}{u-\mu_j} + \frac{u-\eta_j}{u-\mu_j} \right ). \end{equation}\]

Equivalently,

()#\[\begin{equation} T_j := \prod_{i=1}^j \left ( \frac{X_i}{\mu_i} \cdot \frac{\eta_i-\mu_i}{u-\mu_i} + \frac{u-\eta_i}{u-\mu_i} \right ), \;\; j \ge 1. \end{equation}\]

\(T_j\) is nonnegative since \(X_j\), \(\mu_j\), and \(\eta_j\) are all in \([0, u]\) (with \(\mu_j \in (0, u)\)). Moreover, under the null hypothesis that \(\theta_j = \mu_j\),

()#\[\begin{eqnarray} \mathbb{E} (T_j | X^{j-1} ) &=& T_{j-1} \left ( \frac{\mu_j}{\mu_j} \cdot \frac{\eta_j-\mu_j}{u-\mu_j} + \frac{u-\eta_j}{u-\mu_j} \right ) \nonumber \\ &=& T_{j-1} \left ( \frac{\eta_j-\mu_j}{u-\mu_j} + \frac{u-\eta_j}{u-\mu_j} \right ) \nonumber \\ &=& T_{j-1}. \end{eqnarray}\]

Thus if \(\theta = \mu\), \((T_j)_{j \in \mathbb{N}}\) is a nonnegative martingale with respect to \((X_j)_{j \in \mathbb{N}}\), starting at \(1\).

If \(\theta < \mu\), then \(\mathbb{E} (X_j | X^{j-1}) < \mu_j\) and \(r_j = \frac{\mathbb{E} (X_j | X^{j-1})}{\mu_j} < 1\), so

()#\[\begin{equation} \label{eq:supermartingale} \mathbb{E} (T_j | X^{j-1} ) = T_{j-1} \left ( r_j \cdot \frac{\eta_j-\mu_j}{u-\mu_j} + \frac{u-\eta_j}{u-\mu_j} \right ) < T_{j-1}. \end{equation}\]

Thus \((T_j)\) is a nonnegative supermartingale starting at 1 if \(\theta \le \mu\). It follows from Ville’s inequality that if \(\theta \le \mu\),

()#\[\begin{equation} \label{eq:p-value-adapt} \Pr \{ \exists j : T_j \ge \alpha^{-1} \} \le \alpha. \end{equation}\]

That is, \(\min(1, 1/T_j)\) is an “anytime \(P\)-value” for the composite null hypothesis \(\theta \le \mu\).

Note that the derivation did not use any information about \(\{x_i\}\) other than \(x_i \in [0, u]\): it applies to populations \(\{x_i\}\) that are nonnegative and bounded, not merely binary populations.
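In code, the conversion from a test supermartingale to an anytime \(P\)-value is one line (a sketch; `T` is any array of values \(T_1, T_2, \ldots\) of a nonnegative supermartingale starting at 1, such as the output of the functions defined below):

import numpy as np

def anytime_p_value(T: np.array) -> np.array:
    '''
    Convert values T_1, T_2, ... of a nonnegative (super)martingale starting
    at 1 into the anytime P-values min(1, 1/T_j).  Taking a running minimum
    of the result gives a monotone nonincreasing P-value, which Ville's
    inequality also justifies.
    '''
    return np.minimum(1, 1 / np.asarray(T, dtype=float))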

Sampling without replacement#

To use ALPHA with a sample drawn without replacement, we need \(\mathbb{E}(X_j | X^{j-1})\) computed on the assumption that \(\theta := \frac{1}{N} \sum_{i=1}^N x_i = \mu\). For sampling without replacement from a population with mean \(\mu\), after draw \(j-1\), the mean of the remaining numbers is \((N\mu - \sum_{k=1}^{j-1}X_k)/(N-j+1)\). Thus the conditional expectation of \(X_j\) given \(X^{j-1}\) under the null is \((N\mu - \sum_{k=1}^{j-1}X_k)/(N-j+1)\). If \(N\mu - \sum_{k=1}^{j-1}X_k < 0\) for any \(j\), the null hypothesis \(\theta = \mu\) is certainly false.

The Empirical Bernstein martingale test#

See Howard et al. (2021), Theorem 4.1, and Waudby-Smith and Ramdas (2021).

Suppose \(X_t \in [a, b]\) almost surely for all \(t\). Define

()#\[\begin{equation} \bar{X}_t := \frac{1}{t}\sum_{i=1}^t X_i \end{equation}\]

and

()#\[\begin{equation} \mu_t := \frac{1}{t} \sum_{i=1}^t \mathbb{E}(X_i| X^{i-1}). \end{equation}\]

Let \((\hat{\mu}_t)\) be an \([a, b]\)-valued predictable sequence of estimates of the mean. Waudby-Smith and Ramdas (2021) define the predictable plug-in empirical Bernstein martingale

()#\[\begin{equation} M_t^{\mathrm{PrPl-EB}}(m) := \prod_{i=1}^t \exp\left\{ \lambda_i (X_i - m) - v_i \psi_E(\lambda_i) \right\}, \end{equation}\]

where, following Howard et al. (2021), \(v_i := 4 (X_i - \hat{\mu}_{i-1})^2\) and \(\psi_E(\lambda) := (-\log(1-\lambda) - \lambda)/4\) for \(\lambda \in [0, 1)\). (Howard et al. state a more general result in terms of sub-exponential uniform boundaries with crossing probability \(\alpha\) for scale \(c = b-a\).)

Using \(M_t^{\mathrm{PrPl-EB}}(m)\) as a test supermartingale for the null \(\mu = m\), and applying the same construction with \((X_t)_{t=1}^\infty\) and \(m\) replaced by \((-X_t)_{t=1}^\infty\) and \(-m\), combined with a union bound over the two resulting confidence sequences, gives the predictable plug-in empirical Bernstein confidence sequence (PrPl-EB; Waudby-Smith and Ramdas, Theorem 2). Suppose \((X_t)_{t=1}^\infty\) are \([0,1]\)-valued with mean \(\mu\). For any \((0,1)\)-valued predictable \((\lambda_t)_{t=1}^\infty\),

()#\[\begin{equation} C_t^{\mathrm{PrPl-EB}} := \left( \frac{\sum_{i=1}^t \lambda_i X_i}{\sum_{i=1}^t \lambda_i} \pm \frac{\log(2/\alpha) + \sum_{i=1}^t v_i \psi_E(\lambda_i)}{\sum_{i=1}^t \lambda_i} \right) \end{equation}\]

forms a \((1 - \alpha)\)-CS for \(\mu\), as does its running intersection \(\bigcap_{i \le t} C_i^{\mathrm{PrPl-EB}}\). In particular, they recommend the predictable plug-in \((\lambda_t^{\mathrm{PrPl-EB}})_{t=1}^\infty\) given by

()#\[\begin{equation} \lambda_t^{\mathrm{PrPl-EB}} := \sqrt{\frac{2 \log(2/\alpha)}{\hat{\sigma}^2_{t-1}\, t \log(1+t)}} \wedge c, \;\;\; \hat{\sigma}^2_t := \frac{1/4 + \sum_{i=1}^t (X_i - \hat{\mu}_i)^2}{t+1}, \;\;\; \hat{\mu}_t := \frac{1/2 + \sum_{i=1}^t X_i}{t+1}. \end{equation}\]
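The following is a sketch of these formulas for \([0,1]\)-valued observations (it is not the confseq package implementation; the truncation level \(c\) for \(\lambda\) is a tuning choice, taken here to be \(1/2\)):

import numpy as np

def prpl_eb_cs(x: np.array, alpha: float = 0.05, c: float = 1/2):
    '''
    Sketch of the predictable plug-in empirical Bernstein confidence sequence
    for the mean of observations in [0, 1], following the formulas above.
    Returns (lower, upper) arrays: the CS after each observation.
    '''
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    # Plug-in estimates mu-hat_t and sigma-hat^2_t; the values actually used
    # at time t are the previous ones, so they are predictable.
    mu_hat = (1/2 + np.cumsum(x)) / (t + 1)
    sigma2_hat = (1/4 + np.cumsum((x - mu_hat)**2)) / (t + 1)
    mu_hat_prev = np.insert(mu_hat, 0, 1/2)[:-1]        # mu-hat_{t-1}, mu-hat_0 = 1/2
    sigma2_prev = np.insert(sigma2_hat, 0, 1/4)[:-1]    # sigma-hat^2_{t-1}

    lam = np.minimum(np.sqrt(2 * np.log(2/alpha) / (sigma2_prev * t * np.log(1 + t))), c)
    v = 4 * (x - mu_hat_prev)**2
    psi_e = (-np.log(1 - lam) - lam) / 4

    center = np.cumsum(lam * x) / np.cumsum(lam)
    width = (np.log(2/alpha) + np.cumsum(v * psi_e)) / np.cumsum(lam)
    # The mean of [0,1]-valued data must lie in [0,1], so clip the CS there.
    return np.maximum(center - width, 0), np.minimum(center + width, 1)

rng = np.random.default_rng(5)
lo, hi = prpl_eb_cs(rng.beta(2, 2, size=1000))
print(lo[-1], hi[-1])   # should bracket the true mean, 0.5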

import math
import scipy as sp
from scipy.stats import bernoulli, uniform
import numpy as np
from numpy.testing import assert_allclose
import json
np.random.seed(123456789)
# !pip install rilacs
from rilacs.strategies import linear_gamma_dist
import pytest
from rilacs.martingales import (
    apriori_Kelly_martingale,
    distKelly_martingale,
    sqKelly_martingale,
    dKelly_martingale,
)
import itertools
def sprt_mart(x : np.array, N : int, mu : float=1/2, eta: float=1-np.finfo(float).eps, \
              u: float=1, random_order = True):
    '''
    Finds the generalized SPRT martingale (a nonnegative supermartingale under
    the null) for the hypothesis that the population mean is less than or equal
    to mu, against the alternative that it is eta, for a population of size N
    of values in the interval [0, u]. min(1, 1/terms) gives anytime P-values.

    Generalizes Wald's SPRT for the Bernoulli to sampling without replacement
    and to bounded values rather than binary values.

    If N is finite, assumes the sample is drawn without replacement.
    If N is infinite, assumes the sample is drawn with replacement.

    Data are assumed to be in random order. If not, the calculation for
    sampling without replacement is incorrect.

    Parameters:
    -----------
    x : list or np.array
        the data, one element per draw, each in [0, u]
    N : int
        population size for sampling without replacement, or np.infinity for
        sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    eta : float in (0, u)
        alternative hypothesized population mean
    u : float
        known upper bound on the data
    random_order : Boolean
        if the data are in random order, setting this to True can improve the power.
        If the data are not in random order, set to False

    Returns
    -------
    terms : np.array
        sequence of terms that would be a nonnegative supermartingale under the null
    '''
    if any((xx < 0 or xx > u) for xx in x):
        raise ValueError(f'Data out of range [0,{u}]')
    if np.isfinite(N):
        if not random_order:
            raise ValueError("data must be in random order for samples without replacement")
        S = np.insert(np.cumsum(x),0,0)[0:-1]  # 0, x_1, x_1+x_2, ...,  
        j = np.arange(1,len(x)+1)              # 1, 2, 3, ..., len(x)
        m = (N*mu-S)/(N-j+1)                   # mean of population after (j-1)st draw, if null is true
    else:
        m = mu
    with np.errstate(divide='ignore',invalid='ignore'): 
        terms = np.cumprod((x*eta/m + (u-x)*(u-eta)/(u-m))/u) # generalization of Bernoulli SPRT
    terms[m<0] = np.inf                        # the null is surely false
    return terms
def shrink_trunc(x: np.array, N: int, mu: float=1/2, nu: float=1-np.finfo(float).eps, u: float=1, c: float=1/2, 
                 d: float=100) -> np.array: 
    '''
    apply the shrinkage and truncation estimator to an array to set eta_j for ALPHA

    The sample mean is shrunk towards nu, with relative weight d compared to a
    single observation; the estimate is truncated above at u*(1-eps) and below
    at m_j + e_j(c, j):

    S_1 = 0
    S_j = \sum_{i=1}^{j-1} x_i, j > 1
    m_j = (N*mu-S_j)/(N-j+1) if np.isfinite(N) else mu
    e_j = c/sqrt(d+j-1)
    eta_j =  ( (d*nu + S_j)/(d+j-1) \vee (m_j+e_j) ) \wedge u*(1-eps)

    Parameters
    ----------
    x : np.array
        input data
    N : int
        population size for sampling without replacement, or np.inf for
        sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    nu : float in (mu, u)
        initial alternative hypothesized value for the population mean
    u : float
        known upper bound on the data
    c : positive float
        scale factor for allowing the estimated mean to approach m_j from above
    d : positive float
        relative weight of nu compared to an observation, in updating the
        alternative for each term
    '''
    S = np.insert(np.cumsum(x),0,0)[0:-1]  # 0, x_1, x_1+x_2, ...,  
    j = np.arange(1,len(x)+1)              # 1, 2, 3, ..., len(x)
    m = (N*mu-S)/(N-j+1) if np.isfinite(N) else mu   # mean of population after (j-1)st draw, if null is true 
    return np.minimum(u*(1-np.finfo(float).eps), np.maximum((d*nu+S)/(d+j-1),m+c/np.sqrt(d+j-1)))
def test_shrink_trunc():
    epsj = lambda c, d, j: c/math.sqrt(d+j-1)
    Sj = lambda x, j: 0 if j==1 else np.sum(x[0:j-1])
    muj = lambda N, mu, x, j: (N*mu - Sj(x, j))/(N-j+1) if np.isfinite(N) else mu
    nus = [.51, .55, .6]
    mu = 1/2
    u = 1
    d = 10
    vrand =  sp.stats.bernoulli.rvs(1/2, size=20)
    v = [
        np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]),
        np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
        vrand
    ]
    for nu in nus:
        c = (nu-mu)/2
        for x in v:
            N = len(x)
            xinf = shrink_trunc(x, np.inf, mu, nu, c=c, d=d)
            xfin = shrink_trunc(x, len(x), mu, nu, c=c, d=d)
            yinf = np.zeros(len(x))
            yfin = np.zeros(len(x))
            for j in range(1,len(x)+1):
                est = (d*nu + Sj(x,j))/(d+j-1)
                most = u*(1-np.finfo(float).eps)
                yinf[j-1] = np.minimum(np.maximum(mu+epsj(c,d,j), est), most)
                yfin[j-1] = np.minimum(np.maximum(muj(N,mu,x,j)+epsj(c,d,j), est), most)
            np.testing.assert_allclose(xinf, yinf)    
            np.testing.assert_allclose(xfin, yfin)    
    
test_shrink_trunc()
def alpha_mart(x: np.array, N: int, mu: float=1/2, eta: float=1-np.finfo(float).eps, u: float=1, \
               estim: callable=shrink_trunc) -> np.array :
    '''
    Finds the ALPHA martingale for the hypothesis that the population
    mean is less than or equal to mu using a martingale method,
    for a population of size N, based on a series of draws x.

    The draws must be in random order, or the sequence is not a martingale under the null.

    If N is finite, assumes the sample is drawn without replacement.
    If N is infinite, assumes the sample is drawn with replacement.

    Parameters
    ----------
    x : list or np.array corresponding to the data
    N : int
        population size for sampling without replacement, or np.infinity for sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    eta : float in (mu, u)
        alternative hypothesized population mean
    u : float
        known upper bound on the data
    estim : callable
        estim(x, N, mu, eta, u) -> np.array of length len(x), the sequence of values of eta_j for ALPHA

    Returns
    -------
    terms : array
        sequence of terms that would be a nonnegative martingale under the null
    '''
    S = np.insert(np.cumsum(x),0,0)[0:-1]  # 0, x_1, x_1+x_2, ...,  
    j = np.arange(1,len(x)+1)              # 1, 2, 3, ..., len(x)
    m = (N*mu-S)/(N-j+1) if np.isfinite(N) else mu   # mean of population after (j-1)st draw, if null is true 
    etaj = estim(x, N, mu, eta, u) 
    with np.errstate(divide='ignore',invalid='ignore'):
        terms = np.cumprod((x*etaj/m + (u-x)*(u-etaj)/(u-m))/u)  # ALPHA terms; cf. the SPRT generalization above
    terms[m<0] = np.inf
    return terms
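A brief usage sketch (the population, sample size, and parameter values below are arbitrary illustrations):

# Illustration: a population of 10,000 values in {0, 1} with mean exactly 0.55,
# tested against the null that the population mean is <= 1/2.
N = 10_000
population = np.array([0] * 4500 + [1] * 5500, dtype=float)
sample = np.random.permutation(population)[:2000]     # simple random sample without replacement

mart = alpha_mart(sample, N, mu=1/2, eta=0.55, u=1)
p = np.minimum(1, 1 / np.maximum.accumulate(mart))    # running anytime P-value
print("anytime P-value after 2000 draws:", p[-1])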