Anderson-Darling test

The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. The library only provides the Anderson-Darling test for normal distributions.

Let \left\{ x_1,\ldots,x_{\sampleSize} \right\} be a sample of dimension 1 drawn from the (unknown) cumulative distribution function F assumed to be continuous. We want to test whether the sample is drawn from a normal distribution, i.e. whether F = \Phi, where \Phi is the cumulative distribution function of the normal distribution.

This test involves the computation of a test statistic which measures the distance between the empirical cumulative distribution function F_{\sampleSize} and \Phi. Letting X_1, \ldots , X_{\sampleSize} be independent random variables distributed according to F, we define the order statistics X_{(1)}, \ldots , X_{(\sampleSize)} by:

X_{(1)} \leq \dots \leq X_{(\sampleSize)}.

The test statistic is defined by:

D_{\sampleSize} = -\sampleSize-\sum^{\sampleSize}_{i=1} \frac{2i-1}{\sampleSize} \left[\log \left( \Phi(X_{(i)}) \right) + \log\left(1-\Phi(X_{(\sampleSize+1-i)})\right)\right].

This distance is of quadratic type, as in the Cramér-von Mises test, but gives more weight to deviations in the tails. The empirical value of the test statistic, denoted by d_{\sampleSize}, is evaluated from the sample sorted in ascending order:

x_{(1)} \leq \dots \leq x_{(\sampleSize)}.
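As an illustrative sketch (not the library's implementation), the empirical statistic d_{\sampleSize} can be computed directly from the formula above. This version assumes the composite-hypothesis variant in which the sample is first standardized with its estimated mean and standard deviation before being compared to the standard normal CDF:

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF, Phi, evaluated via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def anderson_darling_statistic(sample):
    """Empirical Anderson-Darling statistic d_n for normality.

    The sample is standardized with its estimated mean and (unbiased)
    standard deviation, a common composite-hypothesis convention.
    """
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    # Order statistics z_(1) <= ... <= z_(n) of the standardized sample.
    z = sorted((x - mean) / std for x in sample)
    # d_n = -n - sum_{i=1}^{n} (2i-1)/n * [log Phi(z_(i)) + log(1 - Phi(z_(n+1-i)))]
    s = sum(
        (2 * i - 1) / n
        * (math.log(normal_cdf(z[i - 1])) + math.log(1.0 - normal_cdf(z[n - i])))
        for i in range(1, n + 1)
    )
    return -n - s

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
d_n = anderson_darling_statistic(sample)
```

For a sample actually drawn from a normal distribution, d_{\sampleSize} is typically small; large values indicate a poor fit, especially in the tails.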

Under the null hypothesis \mathcal{H}_0 = \{ F = \Phi\}, the asymptotic distribution of the test statistic D_{\sampleSize} (i.e. as \sampleSize \rightarrow +\infty) is known. If \sampleSize is sufficiently large, we can use this asymptotic distribution to apply the test as follows. We fix a risk \alpha (type I error) and we evaluate the associated critical value d_\alpha, which is the quantile of order 1-\alpha of D_{\sampleSize}.

The decision is then made either by comparing the test statistic to the theoretical threshold d_\alpha, or equivalently by evaluating the p-value of the sample, defined as \Prob{D_{\sampleSize} > d_{\sampleSize}}, and comparing it to \alpha:

  • if d_{\sampleSize}>d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} < \alpha), then we reject the normal distribution,

  • if d_{\sampleSize} \leq d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} \geq \alpha), then the normal distribution is considered acceptable.
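The decision rule above can be sketched with SciPy's Anderson-Darling normality test, which returns the statistic together with critical values d_\alpha at standard risk levels (note that SciPy applies small-sample corrections, so its critical values may differ slightly from the raw asymptotic quantiles described above):

```python
import numpy as np
from scipy import stats

# A sample drawn from a normal distribution (fixed seed for reproducibility).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

result = stats.anderson(sample, dist="norm")
# Pick the critical value d_alpha associated with the 5% risk level.
idx = list(result.significance_level).index(5.0)
d_alpha = result.critical_values[idx]
# Reject normality when the empirical statistic exceeds the threshold.
reject = bool(result.statistic > d_alpha)
```

Here `reject` is `True` exactly when d_{\sampleSize} > d_{\alpha}, i.e. when the normal hypothesis is rejected at risk \alpha = 0.05.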