Cramer-von Mises goodness-of-fit test

This method deals with modelling the probability distribution of a random vector \vect{X} = \left( X^1,\ldots,X^{n_X} \right). It seeks to verify the compatibility between a sample of data \left\{ \vect{x}_1,\vect{x}_2,\ldots,\vect{x}_N \right\} and a previously chosen candidate probability distribution. The Cramer-von Mises goodness-of-fit test answers this question in the one-dimensional case n_X=1, for a continuous distribution. The current version is limited to the case of the Normal distribution.

Let us restrict ourselves to the case n_X = 1 and denote \vect{X} = X^1 = X. This goodness-of-fit test is based on the distance between the empirical cumulative distribution function \widehat{F}_N of the sample \left\{ x_1,x_2,\ldots,x_N \right\} and that of the candidate distribution, denoted F. Unlike the Kolmogorov-Smirnov test, which uses the maximum deviation, this distance is the squared deviation integrated over the entire variation domain of the distribution:

\begin{aligned}
    D = \int^{\infty}_{-\infty} \left[F\left(x\right) - \widehat{F}_N\left(x\right)\right]^2 \, dF
  \end{aligned}

With a sample \left\{ x_1,x_2,\ldots,x_N \right\} sorted in increasing order, the distance is estimated by:

\begin{aligned}
    \widehat{D}_N = \frac{1}{12 N} + \sum_{i=1}^{N}\left[\frac{2i-1}{2N} - F\left(x_i\right)\right]^2
  \end{aligned}
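The estimate above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the library's own implementation: the function name `cramer_von_mises_statistic`, the sample values and the choice of a standard Normal candidate with fixed parameters are all assumptions made for the example.

```python
from statistics import NormalDist


def cramer_von_mises_statistic(sample, cdf):
    """Return 1/(12N) + sum_i [(2i-1)/(2N) - F(x_i)]^2 over the sorted sample.

    `cdf` is the candidate cumulative distribution function F.
    """
    xs = sorted(sample)  # the formula assumes increasing order
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):
        total += ((2 * i - 1) / (2.0 * n) - cdf(x)) ** 2
    return total


# Hypothetical data and candidate distribution, for illustration only.
candidate = NormalDist(mu=0.0, sigma=1.0)
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d_hat = cramer_von_mises_statistic(sample, candidate.cdf)
print(d_hat)
```

The sample is sorted inside the function, so the order in which the observations are supplied does not affect the result. The statistic is smallest, equal to 1/(12N), when each F(x_i) falls exactly on (2i-1)/(2N).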

The probability distribution of the distance \widehat{D}_N is asymptotically known (i.e. as the size of the sample tends to infinity). If N is sufficiently large, this means that for a given probability \alpha and a candidate distribution type, one can calculate the threshold (critical value) d_\alpha such that:

  • if \widehat{D}_N>d_{\alpha}, we reject the candidate distribution with a risk of error \alpha,

  • if \widehat{D}_N \leq d_{\alpha}, the candidate distribution is considered acceptable.

Note that d_\alpha depends on the candidate distribution F being tested; the test is currently limited to the case of the Normal distribution.
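The threshold rule can be written as a small helper. The sketch below is illustrative only: the function name `cvm_decision` is hypothetical, and the threshold 0.461 is an assumption chosen for the example. It is the commonly tabulated asymptotic 5% critical value when F is fully specified with no estimated parameters, and it is not the threshold that applies to a Normal candidate whose parameters are estimated from the sample.

```python
def cvm_decision(d_hat, d_alpha):
    """Apply the threshold rule: reject when d_hat > d_alpha, accept otherwise."""
    if d_hat < 0.0 or d_alpha <= 0.0:
        raise ValueError("d_hat must be non-negative and d_alpha positive")
    return "reject" if d_hat > d_alpha else "accept"


# Assumed critical value for alpha = 5% (fully specified F, illustration only).
D_ALPHA_5_PERCENT = 0.461

# Hypothetical values of the estimated distance.
for d_hat in (0.05, 0.461, 0.80):
    print(d_hat, cvm_decision(d_hat, D_ALPHA_5_PERCENT))
```

Equality with the threshold is treated as acceptance, matching the non-strict inequality in the second rule above.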

An important notion is the so-called p-value of the test. This quantity is the limit error probability \alpha_\textrm{lim}: the candidate distribution is rejected for any risk \alpha at or above \alpha_\textrm{lim}. Thus, the candidate distribution is accepted if and only if \alpha_\textrm{lim} is greater than the value \alpha chosen by the user. Note that the larger the difference \alpha_\textrm{lim} - \alpha, the more robust the decision.
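To make the p-value concrete, the following sketch estimates it by Monte Carlo simulation rather than from the asymptotic distribution. It relies on an assumption that differs from the Normal case with estimated parameters: the candidate F is taken as fully specified, so that under the null hypothesis the values F(x_i) are uniform on [0, 1] and the statistic can be simulated from uniform samples. The names `cvm_statistic`, `monte_carlo_p_value` and `accept`, the number of simulations and the seed are all hypothetical choices.

```python
import random
from statistics import NormalDist


def cvm_statistic(sample, cdf):
    """Return 1/(12N) + sum_i [(2i-1)/(2N) - F(x_i)]^2 over the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12.0 * n) + sum(
        ((2 * i - 1) / (2.0 * n) - cdf(x)) ** 2 for i, x in enumerate(xs, start=1)
    )


def monte_carlo_p_value(sample, cdf, n_sim=2000, seed=0):
    """Estimate the p-value as the share of simulated statistics >= the observed one.

    Assumes F is fully specified, so F(X) is uniform on [0, 1] under the
    null hypothesis and uniform samples can stand in for F(x_i).
    """
    rng = random.Random(seed)
    n = len(sample)
    observed = cvm_statistic(sample, cdf)
    exceed = 0
    for _ in range(n_sim):
        uniforms = [rng.random() for _ in range(n)]
        if cvm_statistic(uniforms, lambda u: u) >= observed:
            exceed += 1
    return exceed / n_sim


def accept(p_value, alpha):
    """Accept the candidate if and only if the p-value is greater than alpha."""
    return p_value > alpha


# Hypothetical sample tested against a standard Normal candidate.
candidate = NormalDist(mu=0.0, sigma=1.0)
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
p_value = monte_carlo_p_value(sample, candidate.cdf)
print(p_value, accept(p_value, alpha=0.05))
```

The simulated p-value is only an approximation whose precision depends on `n_sim`. When the parameters of the Normal candidate are estimated from the same sample, the distribution of the statistic changes and this simulation scheme would need to re-estimate the parameters on each simulated sample.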