Two-sample Kolmogorov-Smirnov test
The two-sample Kolmogorov-Smirnov test is a statistical test that can be used to check whether two given samples of data are drawn from the same continuous distribution of dimension 1.
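Let :math:`\{x_1, \ldots, x_{n_1}\}` and :math:`\{y_1, \ldots, y_{n_2}\}` be two samples of dimension 1, drawn respectively from the (unknown) distribution functions :math:`F_X` and :math:`F_Y`.
We want to test whether both samples are drawn from the same distribution, i.e. whether :math:`F_X = F_Y`.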
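This test involves the calculation of the test statistic, which is the weighted maximum distance between the two empirical cumulative distribution functions :math:`\hat{F}_{n_1, X}` and :math:`\hat{F}_{n_2, Y}`, where :math:`n_1` and :math:`n_2` are the sample sizes.
Letting :math:`X_1, \ldots, X_{n_1}` and :math:`Y_1, \ldots, Y_{n_2}` be independent random variables respectively distributed according to :math:`F_X` and :math:`F_Y`, both empirical cumulative distribution functions are defined by:

.. math::

    \hat{F}_{n_1, X}(x) = \frac{1}{n_1} \sum_{i=1}^{n_1} 1_{\{X_i \leq x\}}, \qquad
    \hat{F}_{n_2, Y}(x) = \frac{1}{n_2} \sum_{j=1}^{n_2} 1_{\{Y_j \leq x\}}

for all :math:`x \in \mathbb{R}`. The test statistic is defined by:

.. math::

    \hat{D}_{n_1, n_2} = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \, \sup_{x \in \mathbb{R}} \left| \hat{F}_{n_1, X}(x) - \hat{F}_{n_2, Y}(x) \right|

The empirical value of the test statistic is denoted by :math:`d`, using the realization of :math:`\hat{F}_{n_1, X}` and :math:`\hat{F}_{n_2, Y}` on the samples :math:`\{x_1, \ldots, x_{n_1}\}` and :math:`\{y_1, \ldots, y_{n_2}\}`:

.. math::

    d = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \, \sup_{x \in \mathbb{R}} \left| \frac{1}{n_1} \sum_{i=1}^{n_1} 1_{\{x_i \leq x\}} - \frac{1}{n_2} \sum_{j=1}^{n_2} 1_{\{y_j \leq x\}} \right|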
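The empirical value of the statistic can be computed directly from its definition. The following is a minimal sketch in plain Python, not the library's implementation; the helper names ``ecdf`` and ``ks_two_sample_statistic`` and the sample values are made up for illustration.

```python
from bisect import bisect_right
from math import sqrt


def ecdf(sample):
    """Return the empirical CDF of a one-dimensional sample as a function."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts the observations that are <= x.
    return lambda x: bisect_right(ordered, x) / n


def ks_two_sample_statistic(xs, ys):
    """Weighted maximum distance between the two empirical CDFs."""
    n1, n2 = len(xs), len(ys)
    f_x, f_y = ecdf(xs), ecdf(ys)
    # Both empirical CDFs are right-continuous step functions that only jump
    # at observed values, so the supremum is attained at a pooled sample point.
    gap = max(abs(f_x(t) - f_y(t)) for t in list(xs) + list(ys))
    return sqrt(n1 * n2 / (n1 + n2)) * gap


# Illustrative data only.
xs = [0.1, 0.4, 0.7, 1.2]
ys = [0.5, 0.9, 1.5, 2.0, 2.4]
d = ks_two_sample_statistic(xs, ys)
print(d)
```

Evaluating the difference only at the pooled observations is enough because both step functions are constant between consecutive observations.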
Under the null hypothesis :math:`\mathcal{H}_0 = \{F_X = F_Y\}`, the distribution of the test statistic :math:`\hat{D}_{n_1, n_2}` is known: algorithms are available to compute the distribution of :math:`\hat{D}_{n_1, n_2}` both for large sample sizes (asymptotic distribution: this is the Kolmogorov distribution) and for small sample sizes (exact distribution). We can then use that distribution to apply the test as follows.
We fix a risk :math:`\alpha` (type I error) and evaluate the associated critical value :math:`d_\alpha`, which is the quantile of order :math:`1 - \alpha` of :math:`\hat{D}_{n_1, n_2}`.
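For large samples, the critical value can be approximated from the Kolmogorov distribution, whose survival function is given by the classical series :math:`2 \sum_{k \geq 1} (-1)^{k-1} e^{-2 k^2 x^2}`. The sketch below covers the asymptotic case only (the exact small-sample distribution is not handled) and finds the quantile by bisection; the function names, the cutoff at 0.1 and the bracketing interval are assumptions made for this illustration.

```python
from math import exp


def kolmogorov_sf(x):
    """Asymptotic P(D > x) under the null hypothesis (Kolmogorov series)."""
    if x < 0.1:
        # Below this point the probability equals 1 to machine precision,
        # and the alternating series converges too slowly to be useful.
        return 1.0
    total, k = 0.0, 1
    while True:
        term = exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:
            break
        k += 1
    return min(1.0, max(0.0, 2.0 * total))


def kolmogorov_critical_value(alpha, lo=0.0, hi=5.0, tol=1e-10):
    """Quantile of order 1 - alpha: the x such that P(D > x) = alpha."""
    # The survival function is decreasing, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_alpha = kolmogorov_critical_value(0.05)
print(d_alpha)
```

With a risk of 0.05 this gives a threshold close to 1.358, the value usually tabulated for the Kolmogorov distribution.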
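Then a decision is made, either by comparing the observed test statistic :math:`d` to the theoretical threshold :math:`d_\alpha` (or equivalently by evaluating the p-value of the sample, defined as :math:`\mathbb{P}(\hat{D}_{n_1, n_2} > d)`, and by comparing it to :math:`\alpha`):

- if :math:`d > d_\alpha` (or equivalently :math:`\mathbb{P}(\hat{D}_{n_1, n_2} > d) < \alpha`), then we reject the null hypothesis according to which both samples are drawn from the same distribution,
- if :math:`d \leq d_\alpha` (or equivalently :math:`\mathbb{P}(\hat{D}_{n_1, n_2} > d) \geq \alpha`), then the null hypothesis is considered acceptable.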
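The decision rule can be sketched end to end using the p-value form. This is a self-contained illustration in plain Python that relies on the asymptotic Kolmogorov distribution, so it is only indicative for small samples; all function names and the test data are hypothetical and do not reflect the library's API.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_statistic(xs, ys):
    """Weighted maximum distance between the two empirical CDFs."""
    sx, sy = sorted(xs), sorted(ys)
    n1, n2 = len(sx), len(sy)
    gap = max(
        abs(bisect_right(sx, t) / n1 - bisect_right(sy, t) / n2)
        for t in sx + sy
    )
    return sqrt(n1 * n2 / (n1 + n2)) * gap


def asymptotic_p_value(d):
    """Asymptotic P(D > d) under the null hypothesis (Kolmogorov series)."""
    if d < 0.1:
        return 1.0  # equal to 1 up to machine precision
    total, k = 0.0, 1
    while True:
        term = exp(-2.0 * k * k * d * d)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:
            break
        k += 1
    return min(1.0, max(0.0, 2.0 * total))


def two_sample_ks_decision(xs, ys, alpha=0.05):
    """Return (d, p_value, reject) for the risk alpha."""
    d = ks_statistic(xs, ys)
    p_value = asymptotic_p_value(d)
    # Reject the null hypothesis when the p-value is below the risk.
    return d, p_value, p_value < alpha


# Illustrative data: the second sample is the first one shifted by 50.
xs = list(range(100))
ys = [v + 50 for v in xs]
d, p_value, reject = two_sample_ks_decision(xs, ys)
print(d, p_value, reject)
```

Comparing the p-value to the risk avoids computing the critical value explicitly; both forms of the rule lead to the same decision.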
OpenTURNS