Kolmogorov-Smirnov two samples test
Let $X$ be a scalar uncertain variable modeled as a random
variable. This method deals with the construction of a dataset prior to
the choice of a probability distribution for $X$. This statistical
test is used to compare two samples $\{x_1, \ldots, x_N\}$
and $\{x'_1, \ldots, x'_M\}$; the goal is to determine
whether these two samples come from the same probability distribution or
not. If this is the case, the two samples should be aggregated in order
to increase the robustness of further statistical analysis.
The test relies on the maximum distance between the cumulative distribution
functions $\widehat{F}_N$ and $\widehat{F}'_M$ of the samples
$\{x_1, \ldots, x_N\}$ and $\{x'_1, \ldots, x'_M\}$.
This distance is expressed as follows:

$$\widehat{D}_{M,N} = \sup_x \left| \widehat{F}_N(x) - \widehat{F}'_M(x) \right|$$
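Since both empirical CDFs are step functions that only change at the observed points, the supremum is attained on the pooled set of sample values. A minimal NumPy sketch of this computation (for illustration only, not the library's implementation) is:

```python
# Minimal sketch: two-sample Kolmogorov-Smirnov distance D_{M,N}.
# Both empirical CDFs are step functions, so the supremum is attained
# on the pooled set of observed points.
import numpy as np

def ks_distance(x, y):
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    # Right-continuous empirical CDFs evaluated on the pooled points
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))
```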
The probability distribution of the distance $\widehat{D}_{M,N}$
is asymptotically known (i.e. as the size of the samples tends to
infinity). If $M$ and $N$ are sufficiently large, this means
that for a probability $\alpha$, one can calculate the threshold /
critical value $d_\alpha$ such that:
- if $\widehat{D}_{M,N} > d_\alpha$, we conclude that the two samples are not identically distributed, with a risk of error $\alpha$,
- if $\widehat{D}_{M,N} \leq d_\alpha$, it is reasonable to say that both samples arise from the same distribution (see the sketch after this list).
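For illustration, a common large-sample approximation of the critical value is $d_\alpha \approx c(\alpha)\sqrt{(N+M)/(NM)}$ with $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$. The sketch below uses this approximation; it is not necessarily the exact computation performed by the library.

```python
# Illustrative decision rule based on the usual large-sample approximation
# of the critical value d_alpha (not necessarily what the library computes).
import numpy as np

def ks_critical_value(n, m, alpha=0.05):
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    return c_alpha * np.sqrt((n + m) / (n * m))

def same_distribution(d, n, m, alpha=0.05):
    # Accept "identically distributed" when the observed D_{M,N} <= d_alpha.
    return d <= ks_critical_value(n, m, alpha)
```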
An important notion is the so-called “p-value” of the test. This
quantity is equal to the limit error probability
$\alpha_\text{lim}$ under which the “identically-distributed”
hypothesis is rejected. Thus, the two samples will be considered
identically distributed if and only if $\alpha_\text{lim}$ is
greater than the value $\alpha$ desired by the user. Note that the
larger $\alpha_\text{lim} - \alpha$, the more robust the
decision.
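For large samples, $\sqrt{NM/(N+M)}\,\widehat{D}_{M,N}$ follows the Kolmogorov limit distribution under the null hypothesis, which yields an asymptotic approximation of this p-value. The sketch below uses the classical series; it is an approximation for illustration, not the library's exact computation.

```python
# Asymptotic p-value (alpha_lim) of the two-sample test, from the Kolmogorov
# limiting distribution: P(K > lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2).
import numpy as np

def ks_asymptotic_pvalue(d, n, m, terms=100):
    lam = np.sqrt(n * m / (n + m)) * d
    if lam < 1e-3:
        return 1.0  # series converges poorly near zero; H0 clearly not rejected
    k = np.arange(1, terms + 1)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * (k * lam) ** 2))
    return float(min(max(p, 0.0), 1.0))

# The samples are considered identically distributed when this p-value
# exceeds the user-chosen risk alpha.
```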
This test is also referred to as the two-sample Kolmogorov-Smirnov test.
API:
See
HypothesisTest_TwoSamplesKolmogorov()
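A hedged usage sketch of the function referenced above, assuming the OpenTURNS Python bindings expose it as ot.HypothesisTest.TwoSamplesKolmogorov returning a TestResult:

```python
import openturns as ot

# Illustrative sketch (assumed Python-binding names): compare two samples
# drawn from the same distribution, at risk level alpha = 0.05.
ot.RandomGenerator.SetSeed(0)
sample1 = ot.Normal().getSample(100)
sample2 = ot.Normal().getSample(80)

result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample2, 0.05)
print(result.getPValue())                # alpha_lim
print(result.getBinaryQualityMeasure())  # True if "identically distributed" is accepted
```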
Examples:
References: