Anderson-Darling goodness-of-fit test
This method deals with the modelling of the probability distribution of a
random vector $X = (X^1, \ldots, X^{n_X})$. It
seeks to verify the compatibility between a sample of data
$\{x_1, \ldots, x_N\}$ and a
candidate probability distribution previously chosen. The Anderson-Darling
goodness-of-fit test allows one to answer this
question in the one-dimensional case $n_X = 1$, and with a
continuous distribution. The current version is limited to the case of
the Normal distribution.
Let us limit the case to $n_X = 1$, and denote $X = X^1$.
This goodness-of-fit test is based on the
distance between the empirical cumulative distribution function
$\widehat{F}_N$ of the sample $\{x_1, \ldots, x_N\}$
and the cumulative distribution function $F$ of the
candidate distribution. This distance is of quadratic
type, as in the Cramer-von Mises test,
but gives more weight to deviations on extreme values:

$$D = \int_{-\infty}^{+\infty} \frac{\left[ \widehat{F}_N(x) - F(x) \right]^2}{F(x)\left( 1 - F(x) \right)} \, dF(x)$$
With a sample $\{x_1, \ldots, x_N\}$, the distance
is estimated by:

$$\widehat{D}_N = -N - \sum_{i=1}^{N} \frac{2i-1}{N} \left[ \ln F\left(x_{(i)}\right) + \ln\left( 1 - F\left(x_{(N+1-i)}\right) \right) \right]$$

where $\{x_{(1)}, \ldots, x_{(N)}\}$ denotes the
sample sorted in increasing order.
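To make the estimator concrete, the sketch below evaluates $\widehat{D}_N$ directly from this formula for a fully specified Normal candidate. The function name ad_statistic and the simulated sample are illustrative assumptions made for this sketch, not part of the method itself.

```python
import numpy as np
from scipy.stats import norm

def ad_statistic(sample, cdf):
    """Estimate the Anderson-Darling distance D_N for a candidate CDF."""
    x = np.sort(np.asarray(sample))   # x_(1) <= ... <= x_(N)
    n = len(x)
    i = np.arange(1, n + 1)
    f = cdf(x)                        # F(x_(i))
    # D_N = -N - sum_i (2i-1)/N * [ln F(x_(i)) + ln(1 - F(x_(N+1-i)))]
    return -n - np.sum((2 * i - 1) / n * (np.log(f) + np.log(1 - f[::-1])))

# Example: test a sample against a fully specified Normal(0, 1) candidate
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100)
d_n = ad_statistic(sample, norm(loc=0.0, scale=1.0).cdf)
print(f"Anderson-Darling statistic D_N = {d_n:.4f}")
```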
The probability distribution of the distance $\widehat{D}_N$ is
asymptotically known (i.e. as the size of the sample tends to infinity).
If $N$
is sufficiently large, this means that for a given probability $\alpha$
and a given candidate distribution type, one can calculate the
threshold / critical value $d_\alpha$
such that:
if $\widehat{D}_N > d_\alpha$,
we reject the candidate distribution with a risk of error
$\alpha$,
if $\widehat{D}_N \leq d_\alpha$,
the candidate distribution is considered acceptable.
Note that $d_\alpha$ depends on the candidate distribution
being tested; the current version is limited to
the case of the Normal distribution.
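For the Normal candidate, tabulated critical values can be used in place of the asymptotic distribution. A possible sketch relies on scipy.stats.anderson, which estimates the Normal parameters from the sample and adjusts the critical values accordingly, so its statistic is not strictly identical to the plain known-parameter formula above.

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=200)

result = anderson(sample, dist='norm')
print(f"statistic D_N = {result.statistic:.4f}")
# Compare the statistic to the tabulated critical value d_alpha
# at each available risk level alpha
for crit, level in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "accept"
    print(f"alpha = {level / 100:.3f}: d_alpha = {crit:.3f} -> {decision}")
```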
An important notion is the so-called "p-value" of the test. This
quantity is equal to the limit error probability
$\alpha_\text{lim}$ under which the candidate distribution is
rejected. Thus, the candidate distribution will be accepted if and only
if $\alpha_\text{lim}$
is greater than the value $\alpha$
desired by the user. Note that the larger
$\alpha_\text{lim} - \alpha$, the more robust the decision.
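As an illustration of the p-value decision rule, the sketch below approximates $\alpha_\text{lim}$ by Monte Carlo. It assumes SciPy 1.10 or later, whose scipy.stats.goodness_of_fit simulates the null distribution of the Anderson-Darling statistic for a Normal candidate fitted to the data; the sample and the chosen risk level are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=200)

# Monte Carlo approximation of the p-value (alpha_lim) of the
# Anderson-Darling test for a Normal candidate fitted to the sample.
res = stats.goodness_of_fit(stats.norm, sample, statistic='ad',
                            n_mc_samples=2000, random_state=123)
alpha = 0.05  # risk of error chosen by the user
print(f"p-value (alpha_lim) = {res.pvalue:.3f}")
print("candidate accepted" if res.pvalue > alpha else "candidate rejected")
```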