Test identical distributionsΒΆ

In this example we are going to estimate whether two samples follow the same distribution using the two samples Kolmogorov-Smirnov test and the graphical QQ-plot test.

The Smirnov test relies on the maximum distance between the cumulative distribution function. If F_{n_1}^{*} and F_{n_2}^{*} are the empirical cumulative density functions of both samples of size n_1 and n_2, the Smirnov test evaluates the decision variable:

D^2 = \displaystyle \sqrt{\frac{n_1n_2}{n_1+n_2}} \sup_{x}|F_{n_1}^{*}(x) - F_{n_2}^{*}(x)|

which tends towards the Kolmogorov distribution. The hypothesis of same distribution is rejected if D^2 is too high (depending on the p-value threshold).

The QQ-plot graph plots empirical quantiles levels from two samples. If both samples correspond to the same probability distribution the curve should be close to the diagonal.

import openturns as ot
import openturns.viewer as viewer

ot.Log.Show(ot.Log.NONE)

Generate 3 samples, sample1 and sample2 arise from the same distribution

distribution1 = ot.Gumbel(0.2, 0.5)
distribution2 = ot.Uniform()

ot.RandomGenerator.SetSeed(5)
sample1 = distribution1.getSample(100)
sample2 = distribution1.getSample(100)
sample3 = distribution2.getSample(100)

Visually compare sample1 and sample2 using QQ-plot

graph = ot.VisualTest.DrawQQplot(sample1, sample2)
view = viewer.View(graph)
Two sample QQ-plot

Visually compare sample1 and sample3 using QQ-plot

graph = ot.VisualTest.DrawQQplot(sample1, sample3)
view = viewer.View(graph)
Two sample QQ-plot

Numerically test sample1 against sample2

test_result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample2)
print(
    "Samples follow the same distribution?",
    test_result.getBinaryQualityMeasure(),
    "p-value=%.6g" % test_result.getPValue(),
    "threshold=%.6g" % test_result.getThreshold(),
)
Samples follow the same distribution? True p-value=0.190264 threshold=0.05

Numerically test sample1 against sample3

test_result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample3)
print(
    "Samples follow the same distribution?",
    test_result.getBinaryQualityMeasure(),
    "p-value=%.6g" % test_result.getPValue(),
    "threshold=%.6g" % test_result.getThreshold(),
)
Samples follow the same distribution? False p-value=9.86999e-15 threshold=0.05

Total running time of the script: ( 0 minutes 0.144 seconds)