Compare samples using Kolmogorov-Smirnov test, QQ-plot
In this example we are going to assess whether two samples follow the same distribution, using the two-sample Kolmogorov-Smirnov test and the graphical QQ-plot.
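The Smirnov test relies on the maximum distance between the two empirical cumulative distribution functions. If $F_{n_1}$ and $F_{n_2}$ are the empirical cumulative distribution functions of the two samples, of sizes $n_1$ and $n_2$, the Smirnov test evaluates the decision variable:

$$D = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \, \sup_x \left| F_{n_1}(x) - F_{n_2}(x) \right|$$

whose distribution tends towards the Kolmogorov distribution when both samples share the same distribution. The hypothesis of same distribution is rejected if $D$ is too high, that is, if the p-value falls below the chosen threshold.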
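To make the decision variable concrete, here is a minimal pure-Python sketch that computes the largest gap between two empirical cumulative distribution functions and then scales it by sqrt(n1*n2/(n1+n2)), the usual two-sample scaling (assumed here). The helper names `ecdf` and `two_sample_ks_statistic` and the demo data are hypothetical; this is an illustration, not the implementation used by OpenTURNS.

```python
import math
import random
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF at x: fraction of sample points <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def two_sample_ks_statistic(sample_a, sample_b):
    """Return (largest ECDF gap, gap scaled by sqrt(n1*n2/(n1+n2)))."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions that only jump at observed points,
    # so the largest gap is reached at one of those points.
    gap = max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
    n1, n2 = len(a), len(b)
    return gap, math.sqrt(n1 * n2 / (n1 + n2)) * gap


# Hypothetical data: two Gaussian samples and one uniform sample.
rng = random.Random(5)
same_a = [rng.gauss(0.0, 1.0) for _ in range(100)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(100)]
other = [rng.uniform(-1.0, 1.0) for _ in range(100)]

print("same distribution:     ", two_sample_ks_statistic(same_a, same_b))
print("different distribution:", two_sample_ks_statistic(same_a, other))
```

A larger scaled value is stronger evidence against the hypothesis that both samples share one distribution.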
The QQ-plot plots the empirical quantiles of one sample against those of the other. If both samples come from the same probability distribution, the points should lie close to the diagonal.
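The points of such a plot can be sketched in a few lines of pure Python. The quantile rule below (linear interpolation between order statistics, evaluated at mid-point levels) is one common convention chosen for illustration; `empirical_quantile` and `qq_points` are hypothetical helpers, and `DrawQQplot` may use a different rule.

```python
def empirical_quantile(sorted_sample, p):
    """Quantile of order p in [0, 1], by linear interpolation."""
    position = p * (len(sorted_sample) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_sample) - 1)
    weight = position - lower
    return (1 - weight) * sorted_sample[lower] + weight * sorted_sample[upper]


def qq_points(sample_a, sample_b, n_points=20):
    """Pairs (quantile of sample_a, quantile of sample_b) at shared levels."""
    a, b = sorted(sample_a), sorted(sample_b)
    levels = [(i + 0.5) / n_points for i in range(n_points)]
    return [(empirical_quantile(a, p), empirical_quantile(b, p)) for p in levels]


# Hypothetical data: the second sample is the first one shifted by 10,
# so every point sits 10 units above the diagonal.
base = [0.5, 1.0, 2.0, 3.5, 7.0]
shifted = [value + 10.0 for value in base]
for x, y in qq_points(base, shifted, n_points=5):
    print(x, y)
```

Points that stay near the line y = x suggest a common distribution; a systematic offset or curvature suggests a difference in location, scale, or shape.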
from __future__ import print_function
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
Generate three samples; sample1 and sample2 are drawn from the same distribution, while sample3 is drawn from a different one
distribution1 = ot.Gumbel(0.2, 0.5)
distribution2 = ot.Uniform()
ot.RandomGenerator.SetSeed(5)
sample1 = distribution1.getSample(100)
sample2 = distribution1.getSample(100)
sample3 = distribution2.getSample(100)
Visually compare sample1 and sample2 using a QQ-plot
graph = ot.VisualTest.DrawQQplot(sample1, sample2)
view = viewer.View(graph)
Visually compare sample1 and sample3 using a QQ-plot
graph = ot.VisualTest.DrawQQplot(sample1, sample3)
view = viewer.View(graph)
Numerically test sample1 against sample2
test_result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample2)
print('Samples follow the same distribution?', test_result.getBinaryQualityMeasure(),
      'p-value=%.6g' % test_result.getPValue(),
      'threshold=%.6g' % test_result.getThreshold())
Out:
Samples follow the same distribution? True p-value=0.190264 threshold=0.05
Numerically test sample1 against sample3
test_result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample3)
print('Samples follow the same distribution?', test_result.getBinaryQualityMeasure(),
      'p-value=%.6g' % test_result.getPValue(),
      'threshold=%.6g' % test_result.getThreshold())
Out:
Samples follow the same distribution? False p-value=9.86999e-15 threshold=0.05
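The two verdicts above come from comparing the p-value with the threshold. As a rough illustration of where such a p-value can come from, the sketch below evaluates the survival function of the limiting Kolmogorov distribution with its classical alternating series, then applies the decision rule. The names `kolmogorov_survival` and `same_distribution` are hypothetical, and the strict comparison is an assumption; OpenTURNS may compute its p-value differently (for example with an exact, non-asymptotic method), so these numbers need not match the outputs above.

```python
import math


def kolmogorov_survival(value, terms=100):
    """Approximate P(K > value) for the Kolmogorov distribution,
    using 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * value**2)."""
    if value < 0.2:
        # The alternating series is numerically delicate near zero,
        # where the probability equals 1 to many decimal places.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * value * value)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def same_distribution(p_value, threshold=0.05):
    """Assumed decision rule: keep the hypothesis when p-value > threshold."""
    return p_value > threshold


# A scaled statistic of about 1.36 sits right at the 5% level.
print(kolmogorov_survival(1.36))
print(same_distribution(0.190264), same_distribution(9.86999e-15))
```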