Cramer-Von Mises test

The Cramer-Von Mises test is a statistical test of whether a given sample of data is drawn from a given one-dimensional, continuous probability distribution.

We denote by \left\{ x_1,\ldots, x_{\sampleSize} \right\} the data of dimension 1. Let F be the (unknown) cumulative distribution function of the continuous distribution from which the data are drawn.

We want to test whether the sample is drawn from the distribution whose cumulative distribution function is G, i.e. whether F = G.

This test involves the calculation of a test statistic which is \sampleSize times the integrated squared distance between the empirical cumulative distribution function \widehat{F} built from the sample and G, weighted by the probability density function p of G. Letting X_1, \ldots , X_\sampleSize be i.i.d. random variables following the distribution with CDF F, and \widehat{F} be the empirical CDF built from them, the test statistic is defined by:

\begin{aligned}
    D_{\sampleSize} = \sampleSize \int^{\infty}_{-\infty} \left[G\left(x\right) - \widehat{F}\left(x\right)\right]^2 \,
    p\left(x\right) dx
  \end{aligned}

The empirical value of the test statistic, evaluated from the sample sorted in increasing order x_{(1)} \leq \ldots \leq x_{(\sampleSize)}, is:

\begin{aligned}
    d_{\sampleSize} = \frac{1}{12 \sampleSize} + \sum_{i=1}^{\sampleSize}\left[\frac{2i-1}{2\sampleSize} -
    G\left(x_{(i)}\right)\right]^2
  \end{aligned}
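
As an illustration, the empirical value above can be computed in a few lines of plain Python. This is only a minimal sketch: the function names `normal_cdf` and `cramer_von_mises_statistic`, the choice of a standard normal G and the sample values are assumptions made for the example, not part of any library API. The sum is evaluated with the data taken in increasing order.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example of G (assumption)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cramer_von_mises_statistic(sample, cdf):
    """Return d_n = 1/(12 n) + sum_i [(2i - 1)/(2n) - G(x_(i))]^2.

    `sample` is a sequence of floats and `cdf` is the tested CDF G.
    The sample is sorted first so that x_(i) is the i-th smallest value.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("the sample must contain at least one point")
    ordered = sorted(sample)
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(ordered, start=1):
        total += ((2.0 * i - 1.0) / (2.0 * n) - cdf(x)) ** 2
    return total


# Hypothetical data, tested against a standard normal distribution.
sample = [0.5, -1.2, 1.3, -0.4, 0.1]
d_n = cramer_von_mises_statistic(sample, normal_cdf)
print(d_n)
```

The statistic reaches its smallest possible value, 1/(12 n), exactly when G(x_{(i)}) = (2i-1)/(2n) for every i, which gives a quick sanity check of an implementation.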

Under the null hypothesis \mathcal{H}_0 = \{ G = F\}, the distribution of the test statistic D_{\sampleSize} is known asymptotically, i.e. when \sampleSize \rightarrow +\infty. If \sampleSize is sufficiently large, we can use this asymptotic distribution to apply the test as follows. We fix a risk \alpha (the probability of a type I error) and we evaluate the associated critical value d_\alpha, which is the quantile of order 1-\alpha of D_{\sampleSize}.

Then a decision is made either by comparing the empirical value d_{\sampleSize} of the test statistic to the theoretical threshold d_\alpha or, equivalently, by evaluating the p-value of the sample, defined as \Prob{D_{\sampleSize} > d_{\sampleSize}}, and comparing it to \alpha:

  • if d_{\sampleSize}>d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} < \alpha), then we reject G,

  • if d_{\sampleSize} \leq d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} \geq \alpha), then G is considered acceptable.
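
The decision rule can be sketched in plain Python as follows. Note the substitution: instead of the asymptotic distribution, this sketch estimates the p-value by Monte Carlo simulation. It relies on the fact that, under \mathcal{H}_0 with a continuous G, the values G(X_i) are uniform on [0, 1], so the distribution of D_{\sampleSize} does not depend on G. All names (`cvm_statistic`, `monte_carlo_p_value`, `cvm_decision`), the number of replications and the seed are assumptions made for the example.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example of G (assumption)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cvm_statistic(sample, cdf):
    """Empirical statistic d_n computed on the sample sorted in increasing order."""
    n = len(sample)
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(sorted(sample), start=1):
        total += ((2.0 * i - 1.0) / (2.0 * n) - cdf(x)) ** 2
    return total


def monte_carlo_p_value(d_obs, n, n_rep=2000, seed=0):
    """Estimate P(D_n > d_obs) under H0 by simulating uniform samples of size n."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_rep):
        uniforms = [rng.random() for _ in range(n)]
        # Under H0, G(X_i) is uniform on [0, 1], so G is the identity here.
        if cvm_statistic(uniforms, lambda u: u) > d_obs:
            exceed += 1
    return exceed / n_rep


def cvm_decision(sample, cdf, alpha=0.05, n_rep=2000, seed=0):
    """Return (d_n, estimated p-value, reject), with reject True if p-value < alpha."""
    d_obs = cvm_statistic(sample, cdf)
    p_value = monte_carlo_p_value(d_obs, len(sample), n_rep=n_rep, seed=seed)
    return d_obs, p_value, p_value < alpha


# Hypothetical data, tested against a standard normal distribution.
sample = [0.3, -1.1, 0.8, -0.2, 1.5, -0.7, 0.1, 0.9, -1.6, 0.4]
d_obs, p_value, reject = cvm_decision(sample, normal_cdf)
print(d_obs, p_value, "reject G" if reject else "G is considered acceptable")
```

The Monte Carlo estimate is only as precise as the number of replications allows. For large \sampleSize, the asymptotic distribution described above avoids the simulation cost.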