Two-sample Kolmogorov-Smirnov test
The two-sample Kolmogorov-Smirnov test is a statistical test of whether two given samples of data are drawn from the same distribution, which is assumed to be continuous and of dimension 1.
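Let :math:`\{x_1, \ldots, x_n\}` and :math:`\{y_1, \ldots, y_n\}` be two samples of dimension 1 respectively drawn from the (unknown) distribution functions :math:`F_X` and :math:`F_Y`.
We want to test whether both samples are drawn from the same distribution, i.e. whether :math:`F_X = F_Y`.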
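This test involves the calculation of the test statistic :math:`D_n`, which is the weighted maximum distance between both empirical cumulative distribution functions :math:`\hat{F}_X` and :math:`\hat{F}_Y`. Letting :math:`X_1, \ldots, X_n` and :math:`Y_1, \ldots, Y_n` be independent random variables respectively distributed according to :math:`F_X` and :math:`F_Y`, both empirical cumulative distribution functions are defined by:

.. math::

    \hat{F}_X(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \leq x\}}, \qquad
    \hat{F}_Y(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{Y_i \leq x\}}

for all :math:`x \in \mathbb{R}`. The test statistic is defined by:

.. math::

    D_n = \sqrt{\frac{n}{2}} \, \sup_{x \in \mathbb{R}} \left| \hat{F}_X(x) - \hat{F}_Y(x) \right|

The empirical value of the test statistic is denoted by :math:`d`, using the realization of :math:`\hat{F}_X` and :math:`\hat{F}_Y` on the samples:

.. math::

    d = \sqrt{\frac{n}{2}} \, \sup_{x \in \mathbb{R}} \left| \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{x_i \leq x\}} - \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{y_i \leq x\}} \right|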
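
The computation of the empirical statistic can be sketched in a few lines of plain Python. This is only an illustration, not the library's implementation: it assumes two samples of the same size n and the weight sqrt(n/2), and the helper names ``ecdf`` and ``ks_two_sample_statistic`` are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def ecdf(sample):
    """Return the empirical CDF of a one-dimensional sample as a function."""
    ordered = sorted(sample)
    size = len(ordered)

    def cdf(x):
        # Proportion of observations less than or equal to x.
        return bisect_right(ordered, x) / size

    return cdf


def ks_two_sample_statistic(xs, ys):
    """Weighted sup-distance between both empirical CDFs (equal sizes assumed)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes two samples of the same size")
    n = len(xs)
    f_x, f_y = ecdf(xs), ecdf(ys)
    # Both empirical CDFs are right-continuous step functions that only jump
    # at observed values, so the supremum is reached at a pooled sample point.
    sup_distance = max(abs(f_x(t) - f_y(t)) for t in list(xs) + list(ys))
    # Assumed weight for equal sample sizes: sqrt(n / 2).
    return sqrt(n / 2) * sup_distance


# Small illustrative samples (made-up values).
xs = [0.1, 0.4, 0.7, 1.2]
ys = [0.3, 0.9, 1.5, 2.0]
d = ks_two_sample_statistic(xs, ys)
print(d)
```

Evaluating the difference only at the pooled observations is enough because neither empirical function changes between two consecutive sample points.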
Under the null hypothesis :math:`\mathcal{H}_0 = \{F_X = F_Y\}`, the distribution of the test statistic :math:`D_n` is known: algorithms are available to compute the distribution of :math:`D_n` both for :math:`n` large (asymptotic distribution: this is the Kolmogorov distribution) and for :math:`n` small (exact distribution). We can then use that distribution to apply the test as follows. We fix a risk :math:`\alpha` (type I error) and we evaluate the associated critical value :math:`d_\alpha`, which is the quantile of order :math:`1 - \alpha` of :math:`D_n`.
Then a decision is made, either by comparing the test statistic to the theoretical threshold :math:`d_\alpha` (or equivalently by evaluating the p-value of the sample, defined as :math:`\mathbb{P}(D_n > d)`, and by comparing it to :math:`\alpha`):

- if :math:`d > d_\alpha` (or equivalently :math:`\mathbb{P}(D_n > d) < \alpha`), then we reject the null hypothesis according to which both samples are drawn from the same distribution,
- if :math:`d \leq d_\alpha` (or equivalently :math:`\mathbb{P}(D_n > d) \geq \alpha`), then the null hypothesis is considered acceptable.
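
The decision rule can be illustrated with the asymptotic Kolmogorov distribution, whose survival function has the classical series form 2 * sum over k >= 1 of (-1)^(k-1) * exp(-2 k^2 d^2). The sketch below is hedged accordingly: it only covers the large-sample case (the exact small-sample distribution is not handled), it takes an already weighted statistic ``d`` as input, and all function names are hypothetical.

```python
from math import exp


def kolmogorov_survival(d, terms=100):
    """Asymptotic P(D > d) under the null hypothesis (Kolmogorov distribution)."""
    if d < 0.2:
        # Below 0.2 the survival function differs from 1 by less than 1e-12,
        # and the alternating series converges slowly, so return 1 directly.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * exp(-2.0 * (k * d) ** 2)
    # Clamp to [0, 1] to absorb rounding error.
    return min(1.0, max(0.0, 2.0 * total))


def kolmogorov_critical_value(alpha, lo=0.2, hi=5.0, iterations=60):
    """Quantile of order 1 - alpha, found by bisection on the survival function."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kolmogorov_survival(mid) > alpha:
            lo = mid  # still too much mass in the tail: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ks_decision(d, alpha=0.05):
    """Return (p_value, critical_value, reject_h0) for an observed statistic d."""
    p_value = kolmogorov_survival(d)
    critical_value = kolmogorov_critical_value(alpha)
    # Comparing p_value to alpha is equivalent to comparing d to critical_value.
    return p_value, critical_value, p_value < alpha


# Illustrative observed statistic (made-up value).
p_value, critical_value, reject_h0 = ks_decision(1.5, alpha=0.05)
print(p_value, critical_value, reject_h0)
```

Both routes lead to the same decision: the p-value falls below the risk exactly when the observed statistic exceeds the critical value.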