The Kolmogorov-Smirnov goodness of fit test for continuous data

The Kolmogorov-Smirnov test is a statistical test of whether a given sample is drawn from a given continuous probability distribution of dimension 1.

Let \left\{ x_1,\ldots, x_{\sampleSize} \right\} be a sample of dimension 1 drawn from the (unknown) cumulative distribution function F.

We want to test whether the sample is drawn from the cumulative distribution function G.

This test involves the calculation of the test statistic, which is the maximum distance between the empirical cumulative distribution function F_{\sampleSize} and G, weighted by \sqrt{\sampleSize}. Letting X_1, \ldots , X_{\sampleSize} be independent random variables distributed according to F, F_{\sampleSize} is defined by:

F_{\sampleSize}(x) = \dfrac{1}{\sampleSize} \sum_{i=1}^{\sampleSize} 1_{X_i \leq x}

for all x \in \Rset. The test statistic is defined by:

D_{\sampleSize} = \sqrt{\sampleSize} \sup_{x} \left|F_{\sampleSize}\left(x \right) - G\left(x \right)\right|

The empirical value of the test statistic is denoted by d_{\sampleSize}, using the realization of F_{\sampleSize} on the sample:

F_{\sampleSize}(x) = \dfrac{\mbox{number of } x_i \leq x \mbox{ in the sample}}{\sampleSize}
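The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the library's implementation: the function names `ks_statistic` and `uniform_cdf` and the three-point sample are hypothetical. Because the empirical cumulative distribution function is a step function, the supremum is reached at a sample point, either just before or just after the jump, so it is enough to check both sides of each jump.

```python
import math


def ks_statistic(sample, cdf):
    """Return (sup distance, sqrt(n) * sup distance) for a 1-d sample.

    `cdf` is the tested continuous cumulative distribution function G.
    """
    xs = sorted(sample)
    n = len(xs)
    sup_distance = 0.0
    for i, x in enumerate(xs, start=1):
        g = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the distance to G is checked on both sides of the jump.
        sup_distance = max(sup_distance, i / n - g, g - (i - 1) / n)
    return sup_distance, math.sqrt(n) * sup_distance


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution (hypothetical test target)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample, used only to exercise the function.
sup_distance, d_n = ks_statistic([0.7, 0.1, 0.4], uniform_cdf)
print(sup_distance, d_n)
```

For this sample the largest gap is found at x = 0.7, where the empirical function reaches 1 while G is 0.7.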

Under the null hypothesis \mathcal{H}_0 = \{ G = F\}, the distribution of the test statistic D_{\sampleSize} is known: algorithms are available to compute it both for large \sampleSize (asymptotic distribution, known as the Kolmogorov distribution) and for small \sampleSize (exact distribution). We can then use that distribution to apply the test as follows. We fix a risk \alpha (the type I error probability) and evaluate the associated critical value d_\alpha, which is the quantile of order 1-\alpha of D_{\sampleSize}.
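As an illustration of the large-sample case, the critical value can be approximated from the classical series of the Kolmogorov distribution, P(K > x) = 2 \sum_{k \geq 1} (-1)^{k-1} e^{-2 k^2 x^2}. The sketch below is an assumption-laden example rather than the algorithm used by any particular library: the names `kolmogorov_survival` and `kolmogorov_critical_value`, the bisection bounds and the tolerances are all hypothetical choices, and the exact distribution should be preferred for small samples.

```python
import math


def kolmogorov_survival(x, tol=1e-16, max_terms=100000):
    """Asymptotic P(D_n > x) from the alternating Kolmogorov series."""
    if x <= 0.0:
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
    # Clamp to [0, 1] to absorb rounding errors for very small x.
    return min(max(2.0 * total, 0.0), 1.0)


def kolmogorov_critical_value(alpha, lo=0.05, hi=5.0, iterations=100):
    """Quantile of order 1 - alpha, by bisection on the survival function."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kolmogorov_survival(mid) > alpha:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_alpha = kolmogorov_critical_value(0.05)
print(round(d_alpha, 4))  # close to 1.358
```

Bisection is used here only because the survival function is monotone and cheap to evaluate; any root finder would do.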

A decision is then made, either by comparing the empirical value d_{\sampleSize} of the test statistic to the theoretical threshold d_\alpha, or equivalently by evaluating the p-value of the sample, defined as \Prob{D_{\sampleSize} > d_{\sampleSize}}, and comparing it to \alpha:

  • if d_{\sampleSize}>d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} < \alpha), then we reject G,

  • if d_{\sampleSize} \leq d_{\alpha} (or equivalently \Prob{D_{\sampleSize} > d_{\sampleSize}} \geq \alpha), then G is considered acceptable.

It is assumed that the parameters of the tested continuous distribution have not been inferred from the sample. If they have been estimated from the sample, we have to use the Lilliefors test rather than the Kolmogorov test.

The figure below illustrates the Kolmogorov-Smirnov test for an ordered sample \left\{5,6,10,22,27\right\} with respect to the Exponential distribution parameterized by \lambda = 0.07, \gamma = 0.
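The numbers behind this example can be reproduced with a short script. The sketch below computes the test statistic for the sample and then estimates the p-value by Monte Carlo simulation, a technique swapped in here in place of the exact small-sample algorithm: under the null hypothesis with a continuous G, the values G(X_i) are uniform on [0, 1], so the distribution of the statistic can be simulated from uniform samples. The helper names, the seed, the number of replicates and the level \alpha = 0.05 are assumptions made for illustration only.

```python
import math
import random


def sup_distance(sample, cdf):
    """Maximum distance between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


def exponential_cdf(x, lam=0.07, gamma=0.0):
    """CDF of the Exponential distribution with rate lam and shift gamma."""
    return 1.0 - math.exp(-lam * (x - gamma)) if x > gamma else 0.0


sample = [5, 6, 10, 22, 27]
n = len(sample)
observed = sup_distance(sample, exponential_cdf)
d_n = math.sqrt(n) * observed  # empirical value of the test statistic

# Monte Carlo estimate of the p-value (assumed settings: seed 0 and
# 20000 replicates). Comparing unscaled distances is equivalent to
# comparing the scaled statistics, since every replicate has size n.
rng = random.Random(0)
replicates = 20000
exceed = sum(
    sup_distance([rng.random() for _ in range(n)], lambda u: u) > observed
    for _ in range(replicates)
)
p_value = exceed / replicates

alpha = 0.05  # hypothetical risk level
reject = p_value < alpha
print(round(observed, 4), round(d_n, 4), p_value, reject)
```

The largest gap occurs at x = 5, where the empirical function is still 0 just before the jump while G(5) is about 0.295, which gives a statistic of about 0.660 after scaling by \sqrt{5}. The estimated p-value is well above 0.05, so the Exponential distribution is not rejected at that level.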
