Two-sample Kolmogorov-Smirnov test
The two-sample Kolmogorov-Smirnov test is a statistical test of whether two given samples of data are drawn from the same distribution, this distribution being one-dimensional and continuous.
Let $\left\{x_1, \ldots, x_N\right\}$ and $\left\{x'_1, \ldots, x'_{N'}\right\}$ be two one-dimensional samples drawn respectively from the (unknown) distribution functions $F_X$ and $F_{X'}$.
We want to test whether both samples are drawn from the same distribution, i.e. whether $F_X = F_{X'}$.
The test relies on a statistic defined as the weighted maximum distance between the two empirical cumulative distribution functions $\widehat{F}_N$ and $\widehat{F}'_{N'}$.
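Letting $X_1, \ldots, X_N$ and $X'_1, \ldots, X'_{N'}$ be independent random variables distributed respectively according to $F_X$ and $F_{X'}$, the two empirical cumulative distribution functions are defined by:

$$\widehat{F}_N(x) = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{X_i \le x}, \qquad \widehat{F}'_{N'}(x) = \frac{1}{N'}\sum_{j=1}^{N'}\mathbf{1}_{X'_j \le x}$$

for all $x \in \mathbb{R}$. The test statistic is defined by:

$$D_{N,N'} = \sqrt{\frac{N N'}{N + N'}}\,\sup_{x \in \mathbb{R}}\left|\widehat{F}_N(x) - \widehat{F}'_{N'}(x)\right|$$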
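The empirical value of the test statistic, denoted by $d_{N,N'}$, is obtained from the realizations of $\widehat{F}_N$ and $\widehat{F}'_{N'}$ on the samples:

$$d_{N,N'} = \sqrt{\frac{N N'}{N + N'}}\,\sup_{x \in \mathbb{R}}\left|\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{x_i \le x} - \frac{1}{N'}\sum_{j=1}^{N'}\mathbf{1}_{x'_j \le x}\right|$$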
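As a minimal sketch in plain Python, the empirical statistic $d_{N,N'}$ can be computed directly from the two samples. The helper names `ecdf` and `ks_two_sample_statistic` are hypothetical, not part of any library. Both empirical CDFs are step functions that jump only at observed points, so the supremum over $\mathbb{R}$ is attained at one of the pooled observations, which is all the code needs to scan.

```python
import bisect
import math


def ecdf(sorted_sample, x):
    """Empirical CDF at x: fraction of observations <= x (sample must be sorted)."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_two_sample_statistic(sample1, sample2):
    """Weighted two-sample KS statistic d_{N,N'} (illustrative sketch)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    # The ECDFs only jump at observed points, so the supremum over R
    # is reached at one of the pooled observations.
    sup_dist = max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in s1 + s2)
    # Weight sqrt(N N' / (N + N')) from the definition of the statistic.
    return math.sqrt(n1 * n2 / (n1 + n2)) * sup_dist


# Two fully separated samples: sup distance is 1, weight is sqrt(9 / 6).
print(ks_two_sample_statistic([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```

For the two fully separated samples of size 3, the supremum is 1 and $d_{3,3} = \sqrt{3/2} \approx 1.2247$. If `scipy.stats.ks_2samp` is used instead, note that it reports the unweighted supremum, i.e. $d_{N,N'}$ divided by $\sqrt{N N'/(N+N')}$.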
Under the null hypothesis $\mathcal{H}_0 = \left\{F_X = F_{X'}\right\}$, the distribution of the test statistic $D_{N,N'}$ is known: algorithms are available to compute it both for large $N, N'$ (asymptotic distribution: this is the Kolmogorov distribution) and for small $N, N'$ (exact distribution). This distribution is then used to apply the test as follows.
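Both distributions mentioned above can be sketched in plain Python. The hypothetical `kolmogorov_cdf` evaluates the asymptotic Kolmogorov distribution through two standard equivalent series, one converging quickly for small arguments and one for large arguments. The hypothetical `exact_null_distribution` obtains the exact null distribution by brute-force enumeration, using the fact that under $\mathcal{H}_0$ with a continuous distribution all $\binom{N+N'}{N}$ interleavings of the two samples are equally likely. This enumeration is only practical for very small $N, N'$ and is not necessarily how production libraries compute the exact distribution.

```python
import itertools
import math


def kolmogorov_cdf(t):
    """Asymptotic (Kolmogorov) CDF P(K <= t) of D_{N,N'} under H0."""
    if t <= 0.0:
        return 0.0
    if t < 1.0:
        # Equivalent series that converges quickly for small t.
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * t * t))
                for k in range(1, 50))
        return math.sqrt(2.0 * math.pi) / t * s
    # Alternating series that converges quickly for t >= 1.
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, 50))
    return 1.0 - 2.0 * s


def exact_null_distribution(n1, n2):
    """Exact null distribution of D_{N,N'} for small samples (brute force).

    Returns a dict {value of D_{N,N'}: probability}.
    """
    counts = {}
    # Each combination lists the pooled ranks that belong to sample 1.
    for ranks in itertools.combinations(range(n1 + n2), n1):
        in_first = set(ranks)
        i = j = 0  # points of sample 1 / sample 2 seen so far
        best = 0   # max over x of |i*n2 - j*n1| = n1*n2*|F_N(x) - F'_N'(x)|
        for r in range(n1 + n2):
            if r in in_first:
                i += 1
            else:
                j += 1
            best = max(best, abs(i * n2 - j * n1))
        counts[best] = counts.get(best, 0) + 1
    total = math.comb(n1 + n2, n1)
    weight = math.sqrt(n1 * n2 / (n1 + n2))
    return {weight * key / (n1 * n2): c / total for key, c in sorted(counts.items())}


print(kolmogorov_cdf(1.0))
print(exact_null_distribution(2, 2))
```

For $N = N' = 2$, two of the six interleavings (one sample entirely before the other) give the maximal value $D_{2,2} = 1$, so that value has probability $1/3$; the other four give $0.5$.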
We fix a risk $\alpha$ (type I error) and evaluate the associated critical value $d_\alpha$, which is the quantile of order $1 - \alpha$ of $D_{N,N'}$.
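Using the asymptotic Kolmogorov distribution only (so the result is an approximation for finite $N, N'$), the critical value $d_\alpha$ can be obtained by numerically inverting its CDF. The sketch below uses plain bisection; `kolmogorov_cdf` and `critical_value` are hypothetical helper names.

```python
import math


def kolmogorov_cdf(t):
    """Asymptotic (Kolmogorov) CDF P(K <= t)."""
    if t <= 0.0:
        return 0.0
    if t < 1.0:
        # Equivalent series that converges quickly for small t.
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * t * t))
                for k in range(1, 50))
        return math.sqrt(2.0 * math.pi) / t * s
    # Alternating series that converges quickly for t >= 1.
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, 50))
    return 1.0 - 2.0 * s


def critical_value(alpha, lo=0.0, hi=10.0, tol=1e-10):
    """Quantile of order 1 - alpha of the Kolmogorov distribution, by bisection."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The CDF is increasing, so keep the half-interval containing the target.
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_value(alpha), 4))
```

For $\alpha = 0.05$ this gives a value close to the familiar $1.36$; smaller risks give larger thresholds.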
A decision is then made, either by comparing the test statistic to the theoretical threshold $d_\alpha$, or equivalently by evaluating the p-value of the sample, defined as $\mathbb{P}\left(D_{N,N'} > d_{N,N'}\right)$, and comparing it to $\alpha$:
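- if $d_{N,N'} > d_\alpha$ (or equivalently $\mathbb{P}\left(D_{N,N'} > d_{N,N'}\right) < \alpha$), then we reject the null hypothesis according to which both samples are drawn from the same distribution;
- if $d_{N,N'} \le d_\alpha$ (or equivalently $\mathbb{P}\left(D_{N,N'} > d_{N,N'}\right) \ge \alpha$), then the null hypothesis is considered acceptable.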
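The decision rule above can be sketched end to end in plain Python, assuming the p-value $\mathbb{P}\left(D_{N,N'} > d_{N,N'}\right)$ is taken from the asymptotic Kolmogorov distribution (a large-sample approximation). `kolmogorov_sf` and `ks_two_sample_test` are hypothetical helpers, not library functions. Rejecting when the p-value is below $\alpha$ is equivalent to rejecting when $d_{N,N'} > d_\alpha$.

```python
import bisect
import math


def kolmogorov_sf(t):
    """Asymptotic tail probability P(K > t) of the Kolmogorov distribution."""
    if t <= 0.0:
        return 1.0
    if t < 1.0:
        # 1 - CDF, using the series that converges quickly for small t.
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * t * t))
                for k in range(1, 50))
        return 1.0 - math.sqrt(2.0 * math.pi) / t * s
    # Alternating series that converges quickly for t >= 1.
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                     for k in range(1, 50))


def ks_two_sample_test(sample1, sample2, alpha=0.05):
    """Return (d, p_value, reject) for the two-sample KS test (asymptotic p-value)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    # Supremum of |F_N - F'_N'| is attained at one of the pooled observations.
    sup_dist = max(abs(bisect.bisect_right(s1, x) / n1 - bisect.bisect_right(s2, x) / n2)
                   for x in s1 + s2)
    d = math.sqrt(n1 * n2 / (n1 + n2)) * sup_dist
    p_value = kolmogorov_sf(d)
    # Reject H0 when p < alpha, i.e. when d exceeds the critical value d_alpha.
    return d, p_value, p_value < alpha


# Interleaved samples versus fully separated samples.
close = ks_two_sample_test([i / 50 for i in range(50)], [(i + 0.5) / 50 for i in range(50)])
apart = ks_two_sample_test([i / 100 for i in range(50)], [1 + i / 100 for i in range(50)])
print(close, apart)
```

With these illustrative inputs, the interleaved samples give $d_{50,50} \approx 0.1$ and a p-value close to 1, so $\mathcal{H}_0$ is considered acceptable, while the separated samples give $d_{50,50} = 5$ and a negligible p-value, so $\mathcal{H}_0$ is rejected.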