Anderson-Darling test

The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. The library only provides the Anderson-Darling test for normal distributions.

Let \left\{ x_1,\ldots,x_{\sampleSize} \right\} be a sample of dimension 1 drawn from the (unknown) cumulative distribution function F assumed to be continuous. We want to test whether the sample is drawn from a normal distribution, i.e. whether F = \Phi, where \Phi is the cumulative distribution function of the normal distribution.

This test involves the computation of a test statistic which measures the distance between the empirical cumulative distribution function F_{\sampleSize} and \Phi. Letting X_1, \ldots , X_{\sampleSize} be independent random variables distributed according to F, we define the order statistics X_{(1)}, \ldots , X_{(\sampleSize)} by:

X_{(1)} \leq \dots \leq X_{(\sampleSize)}.

The test statistic is defined by:

D_{\sampleSize} = -\sampleSize-\sum^{\sampleSize}_{i=1} \frac{2i-1}{\sampleSize} \left[\log \left( \Phi(X_{(i)}) \right) + \log\left(1-\Phi(X_{(\sampleSize+1-i)})\right)\right].

This distance is of quadratic type, as in the Cramér-von Mises test, but gives more weight to deviations in the tails. The empirical value of the test statistic, denoted by d_{\sampleSize}, is evaluated from the sample sorted in ascending order:

x_{(1)} \leq \dots \leq x_{(\sampleSize)}.
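As an illustrative sketch (not the library's implementation), the empirical statistic d_{\sampleSize} can be computed directly from the formula above. This version assumes the composite-hypothesis variant in which the sample is first standardized with its estimated mean and standard deviation before being compared to the standard normal CDF:

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF, Phi, evaluated via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def anderson_darling_statistic(sample):
    """Empirical Anderson-Darling statistic d_n for normality.

    The sample is standardized with its estimated mean and (unbiased)
    standard deviation, a common composite-hypothesis convention.
    """
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    # Order statistics z_(1) <= ... <= z_(n) of the standardized sample.
    z = sorted((x - mean) / std for x in sample)
    # d_n = -n - sum_{i=1}^{n} (2i-1)/n * [log Phi(z_(i)) + log(1 - Phi(z_(n+1-i)))]
    s = sum(
        (2 * i - 1) / n
        * (math.log(normal_cdf(z[i - 1])) + math.log(1.0 - normal_cdf(z[n - i])))
        for i in range(1, n + 1)
    )
    return -n - s

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
d_n = anderson_darling_statistic(sample)
```

For a sample actually drawn from a normal distribution, d_{\sampleSize} is typically small; large values indicate a poor fit, especially in the tails.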

Under the null hypothesis \mathcal{H}_0 = \{ F = \Phi\}, the asymptotic distribution of the test statistic D_{\sampleSize} (i.e. as \sampleSize \rightarrow +\infty) is known. If \sampleSize is sufficiently large, we can use this asymptotic distribution to apply the test as follows. We fix a risk \alpha (type I error) and we evaluate the associated critical value d_\alpha, which is the quantile of order 1-\alpha of D_{\sampleSize}.

The decision is then made either by comparing the test statistic to the theoretical threshold d_\alpha, or equivalently by evaluating the p-value of the sample, defined as \Prob{D_{\sampleSize} > d_{\sampleSize}}, and comparing it to \alpha:

  • if d_{\sampleSize}>d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} < \alpha), then we reject the normal distribution,

  • if d_{\sampleSize} \leq d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} \geq \alpha), then the normal distribution is considered acceptable.
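The decision rule above can be sketched with SciPy's Anderson-Darling normality test, which returns the statistic together with critical values d_\alpha at standard risk levels (note that SciPy applies small-sample corrections, so its critical values may differ slightly from the raw asymptotic quantiles described above):

```python
import numpy as np
from scipy import stats

# A sample drawn from a normal distribution (fixed seed for reproducibility).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

result = stats.anderson(sample, dist="norm")
# Pick the critical value d_alpha associated with the 5% risk level.
idx = list(result.significance_level).index(5.0)
d_alpha = result.critical_values[idx]
# Reject normality when the empirical statistic exceeds the threshold.
reject = bool(result.statistic > d_alpha)
```

Here `reject` is `True` exactly when d_{\sampleSize} > d_{\alpha}, i.e. when the normal hypothesis is rejected at risk \alpha = 0.05.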