# Simulation and permutation test \(P\)-values
This chapter concerns the role of simulation in permutation tests. The set-up is as follows.
There is a measurable space \(\mathcal{X}\) of possible observations. We will observe \(X = (X_j)_{j=1}^N\), where each \(X_j \in \mathcal{X}\). Let \(\mathbb{P}\) denote the joint distribution of \((X_j)\). There is a group \(\mathcal{G}\) of transformations from \(\mathcal{X}^N\) to itself, such that if the null hypothesis \(H_0\) is true, \(\mathbb{P}\) is invariant under \(\mathcal{G}\). That is, under \(H_0\), for every \(\mathbb{P}\)-measurable subset \(A \subset \mathcal{X}^N\), for every \(g \in \mathcal{G}\), \(\mathbb{P} \{ X \in A \} = \mathbb{P} \{ X \in g(A) \}\). Equivalently, for every \(X\)-measurable function \(f\), \(\mathbb{E} f(X) = \mathbb{E} f(g(X))\).
It follows that if the null \(H_0\) is true, the conditional distribution of any test statistic \(\phi(X)\), given \(X \in \mathcal{G}(x)\), is uniform on the multiset \(\phi(\mathcal{G}(x)) := \{ \phi(g(x)): g \in \mathcal{G} \}\).
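To make this concrete, here is a tiny sketch (an illustrative choice of group and statistic, not from the text): take \(\mathcal{G}\) to be all permutations of three values and \(\phi\) the first coordinate. Enumerating the orbit shows that each value of the statistic appears equally often in the multiset \(\phi(\mathcal{G}(x))\), so a uniform draw from the multiset is a uniform draw over the statistic's values.

```python
from itertools import permutations

# Toy example (illustrative group and statistic, not from the text):
# G = all permutations of x = (1, 2, 3); phi = the first coordinate.
x = (1, 2, 3)
stats = [g[0] for g in permutations(x)]  # the multiset phi(G(x))

# Each distinct value of phi appears the same number of times (|G|/3 here),
# so the conditional distribution of phi(X) given the orbit is uniform on it.
print(sorted(stats))  # -> [1, 1, 2, 2, 3, 3]
```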
Let \(x := (x_j)_{j=1}^N\) denote the observed data. For any pre-specified test statistic \(\phi(\cdot)\), we can define a (conditional) \(P\)-value as

\[ P := \mathbb{P} \{ \phi(X) \ge \phi(x) \mid X \in \mathcal{G}(x) \} = \frac{\# \{ g \in \mathcal{G} : \phi(g(x)) \ge \phi(x) \}}{\# \mathcal{G}}. \]
How can we find \(P\)? Suppose we draw \(K\) elements of \(\mathcal{G}(x)\), the orbit of \(x\) under \(\mathcal{G}\), uniformly at random (see Mathematical Foundations if terms like “orbit” are unfamiliar). Let \(\{G_k\}_{k=1}^K\) denote IID uniformly distributed random elements of \(\mathcal{G}\), and let \(Y_k := G_k(x)\), \(k=1, \ldots, K\), denote the corresponding \(K\) random elements of the orbit of \(x\) under \(\mathcal{G}\).
An unbiased estimate of \(P\) (assuming that the random elements are generated uniformly at random; see Algorithms for Pseudo-Random Samples and Permutations for caveats) is

\[ \hat{P} := \frac{1}{K} \sum_{k=1}^K 1\{\phi(G_k(x)) \ge \phi(x)\}. \]
Conditional on \(X \in \mathcal{G}(x)\), the events \(\{\phi(G_k(x)) \ge \phi(x)\}\) are IID with probability \(P\) of occurring, so \(\hat{P}\) is an unbiased estimate of \(P\).
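A minimal sketch of computing \(\hat{P}\), assuming a two-sample setting in which \(\mathcal{G}\) is the full group of permutations of the pooled data and \(\phi\) is the difference of group means (both illustrative choices; the data values are made up):

```python
import numpy as np

rng = np.random.default_rng(12345)

def p_hat(pooled, n_treat, phi, K):
    """Estimate P by the fraction of K uniformly random permutations whose
    test statistic is at least the observed statistic."""
    observed = phi(pooled[:n_treat], pooled[n_treat:])
    count = 0
    for _ in range(K):
        perm = rng.permutation(pooled)  # a uniform random element of the orbit
        if phi(perm[:n_treat], perm[n_treat:]) >= observed:
            count += 1
    return count / K

# Hypothetical two-sample data; phi = difference of group means
treat = np.array([3.1, 2.8, 3.5, 3.0])
ctrl = np.array([2.4, 2.6, 2.2, 2.9])
pooled = np.concatenate([treat, ctrl])
diff_means = lambda a, b: a.mean() - b.mean()
p = p_hat(pooled, len(treat), diff_means, K=10_000)
print(p)
```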
Another estimate of \(P\), one that can also be interpreted as an exact \(P\)-value for a randomized test (as discussed below), is

\[ \hat{P}' := \frac{1}{K+1} \sum_{k=0}^K 1\{\phi(G_k(x)) \ge \phi(x)\}, \]
where \(G_0\) is the identity permutation. One justification for this choice is that, if the null hypothesis is true, the original data are one of the equally likely elements of the orbit of the data, exactly as likely as the other elements generated from it. Thus the \(K+1\) values \(\{\phi(G_k(x))\}_{k=0}^K\) are equally likely if the null is true: nature provided one more random permutation, the original data. The estimate \(\hat{P}'\) is never smaller than \(1/(K+1)\). Some practitioners like \(\hat{P}'\) because it never estimates the \(P\)-value to be zero. There are other reasons for preferring it, discussed below.
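A sketch of \(\hat{P}'\) under the same illustrative assumptions (full permutation group on pooled two-sample data, difference of means as \(\phi\), made-up data values):

```python
import numpy as np

rng = np.random.default_rng(2024)

def p_hat_prime(pooled, n_treat, phi, K):
    """P-hat-prime: count the identity permutation G_0 along with the K
    random permutations, so the numerator is at least 1."""
    observed = phi(pooled[:n_treat], pooled[n_treat:])
    count = 1  # G_0 is the identity: phi(G_0(x)) = phi(x) >= phi(x) always
    for _ in range(K):
        perm = rng.permutation(pooled)
        if phi(perm[:n_treat], perm[n_treat:]) >= observed:
            count += 1
    return count / (K + 1)

treat = np.array([3.1, 2.8, 3.5, 3.0])  # hypothetical data
ctrl = np.array([2.4, 2.6, 2.2, 2.9])
pooled = np.concatenate([treat, ctrl])
diff_means = lambda a, b: a.mean() - b.mean()
pp = p_hat_prime(pooled, len(treat), diff_means, K=10_000)
print(pp)  # never smaller than 1/(K+1)
```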
The estimate \(\hat{P}'\) of \(P\) is conservatively biased, however. Since \(\phi(G_0(x)) = \phi(x)\), the \(k=0\) term always contributes, and

\[ \hat{P}' = \frac{K \hat{P} + 1}{K+1}, \]

so, because \(\hat{P}\) is unbiased,

\[ \mathbb{E} \hat{P}' = \frac{K P + 1}{K+1} = P + \frac{1-P}{K+1} \ge P. \]
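The identity \(\hat{P}' = (K\hat{P}+1)/(K+1)\) and the resulting upward bias \(\mathbb{E}\hat{P}' = P + (1-P)/(K+1)\) can be checked by simulation. A sketch with arbitrary illustrative values of \(P\) and \(K\):

```python
import numpy as np

# Monte Carlo check of E[P-hat] = P and E[P-hat'] = (K*P + 1)/(K + 1).
# P and K are arbitrary illustrative values.
rng = np.random.default_rng(7)
P, K, reps = 0.3, 19, 200_000

# Each row: K IID indicators of the events phi(G_k(x)) >= phi(x).
hits = rng.random((reps, K)) < P
p_hat = hits.mean(axis=1)                       # unbiased estimate of P
p_hat_prime = (1 + hits.sum(axis=1)) / (K + 1)  # counts the identity G_0 too

print(p_hat.mean())        # close to P = 0.3
print(p_hat_prime.mean())  # close to (19*0.3 + 1)/20 = 0.335
```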
## An exact randomized test
This section presents a different way to think about \(\hat{P}'\), producing a simple and elegant conservative test.
The key is that instead of thinking about approximating \(P\), we study the behavior of \(\hat{P}'\) itself under the null \(H_0\). Recall

\[ \hat{P}' := \frac{1}{K+1} \sum_{k=0}^K 1\{\phi(G_k(x)) \ge \phi(x)\}, \]
where \(G_0\) is the identity permutation and \(\{ G_k \}_{k=1}^K\) are elements of \(\mathcal{G}\) selected at random, uniformly.
While \(\hat{P}'\) is a biased estimate of \(P\), we shall see that it is itself a valid conditional \(P\)-value; that is, for \(p \in [0, 1]\), \(\mathbb{P} \{ \hat{P}' \le p | X \in \mathcal{G}(x) \} \le p\).
This conditional probability involves two sources of randomness:
- The randomness in the original data, \(X\) (although we condition on the event that \(X\) is in the orbit of \(x\) under \(\mathcal{G}\)).
- The randomness in selecting the random transformations \(\{ G_k \}_{k=1}^K \subset \mathcal{G}\).
The resulting hypothesis test is a randomized test: it uses auxiliary randomness (to select \(\{G_k\}\)), independent of the randomness in the data. If the experiment were repeated and the data turned out to be \(x\) again, the test would in general give a different \(P\)-value: \(\hat{P}'\) is random even if \(X\) is known. But if the null hypothesis is true, the chance that \(\hat{P}'\) is less than or equal to \(p\) is itself less than or equal to \(p\).
Randomized tests have a number of desirable theoretical properties (related to continuity and convexity), but they are rarely used explicitly in practice. Tests involving simulated \(P\)-values are an example where randomized tests are used implicitly rather than explicitly, generally without recognizing that the resulting test is randomized.
By construction, \(\{G_k(x)\}_{k=1}^K\) are IID, uniformly distributed on the orbit of \(x\) under \(\mathcal{G}\). Under the null, conditional on \(X \in \mathcal{G}(x)\), the data \(G_0(X) = X\) are also uniformly distributed on that orbit, independently of \(\{G_k\}_{k=1}^K\). Thus \(\{\phi(G_k(X))\}_{k=0}^K\) are IID random variables. The event \(\hat{P}' \le p\) is the event that \(\phi(x) = \phi(G_0(x))\) is strictly greater than all but at most \((K+1)p\) of the values \(\{\phi(G_k(x))\}_{k=0}^K\). Under the null, \(\phi(x)\) is equally likely to be any of the \(K+1\) values. What is the chance that it falls where it does among the sorted values \(\{\phi(G_k(x))\}_{k=0}^K\)?
Let \(p' := \lfloor (K+1)p \rfloor /(K+1)\). Then \(p' \le p\) and \((K+1)p'\) is an integer. Sort the values \(\{\phi(G_k(x))\}_{k=0}^K\) from largest to smallest, breaking ties arbitrarily. Consider the \((K+1)p'\)th element of the sorted list. If it is strictly greater than the \(((K+1)p'+1)\)th element, then there are exactly \((K+1)p'\) indices \(k\) for which \(\phi(G_k(x))\) is strictly greater than all but at most \((K+1)p'\) of the values \(\{\phi(G_k(x))\}_{k=0}^K\). If the \((K+1)p'\)th element of the list is equal to the \(((K+1)p'+1)\)th element, then there are fewer than \((K+1)p'\) such indices. Either way, the chance that a randomly selected element of the multiset \(\{\phi(G_k(x))\}_{k=0}^K\) is strictly greater than all but at most \((K+1)p'\) of the elements is at most

\[ \frac{(K+1)p'}{K+1} = p' \le p. \]
Thus \(\mathbb{P} \{\hat{P}' \le p \} \le p\). This shows that \(\hat{P}'\) is itself a conservative \(P\)-value for a randomized test (in addition to being a conservatively biased estimate of \(P\), the \(P\)-value for a related non-randomized test).
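The conservatism \(\mathbb{P}\{\hat{P}' \le p\} \le p\) can be checked by simulating data under a null; a sketch assuming IID standard-normal observations (so any treatment/control labeling is exchangeable) and the difference of group means as the statistic, both illustrative choices:

```python
import numpy as np

# Monte Carlo check that P{ P-hat' <= p } <= p under the null.
# Illustrative setup: N IID standard-normal observations, so the labels
# carry no information; phi = difference of group means.
rng = np.random.default_rng(99)
N, n_treat, K, reps, p = 8, 4, 19, 10_000, 0.05

rejections = 0
for _ in range(reps):
    x = rng.standard_normal(N)  # null: no treatment effect
    observed = x[:n_treat].mean() - x[n_treat:].mean()
    count = 1                   # the identity permutation G_0
    for _ in range(K):
        perm = rng.permutation(x)
        if perm[:n_treat].mean() - perm[n_treat:].mean() >= observed:
            count += 1
    if count / (K + 1) <= p:    # P-hat' <= p: reject
        rejections += 1

rate = rejections / reps
print(rate)  # at most about p = 0.05
```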
The corresponding test is defined implicitly: reject \(H_0\) at significance level \(\alpha\) if \(\hat{P}' \le \alpha\).
See also Phipson, B., and G.K. Smyth, 2010. Permutation \(P\)-values Should Never Be Zero: Calculating Exact \(P\)-values When Permutations Are Randomly Drawn, Statistical Applications in Genetics and Molecular Biology, 9, https://doi.org/10.2202/1544-6115.1585