Test identical distributions
In this example we test whether two samples follow the same distribution, using the two-sample Kolmogorov-Smirnov test and the graphical QQ-plot test.
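The Smirnov test relies on the maximum distance between the two empirical cumulative distribution functions. If $F_{n_1}$ and $F_{n_2}$ are the empirical cumulative distribution functions of the two samples, of sizes $n_1$ and $n_2$, the Smirnov test evaluates the decision variable:
$$D = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \, \sup_x \left| F_{n_1}(x) - F_{n_2}(x) \right|$$
which tends towards the Kolmogorov distribution. The hypothesis that both samples follow the same distribution is rejected if $D$ is too high, that is, if the associated p-value falls below the chosen threshold.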
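To make the decision variable concrete, here is a minimal pure-Python sketch that computes the largest gap between two empirical cumulative distribution functions and then applies the usual sqrt(n1 * n2 / (n1 + n2)) scaling. The helper names `ecdf` and `smirnov_statistic` and the toy data are illustrative assumptions, not part of the OpenTURNS API, and the library's own implementation may differ in its details.

```python
import math
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Fraction of sample points less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def smirnov_statistic(sample_a, sample_b):
    """Return (sup distance, scaled decision variable) for two 1-D samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions that only jump at observed points,
    # so the supremum of their gap is reached at one of those points.
    sup_distance = max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
    n1, n2 = len(a), len(b)
    return sup_distance, math.sqrt(n1 * n2 / (n1 + n2)) * sup_distance


# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
sup_distance, decision = smirnov_statistic(
    [0.1, 0.4, 0.7, 1.2], [0.5, 0.9, 1.5, 2.0]
)
print('sup distance=%.6g' % sup_distance, 'decision variable=%.6g' % decision)
```

Evaluating the gap only at the pooled sample points keeps the sketch short; a production implementation would typically merge the two sorted samples in a single pass.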
The QQ-plot plots the empirical quantiles of one sample against those of the other. If both samples come from the same probability distribution, the points should lie close to the diagonal.
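The points of such a plot can be sketched without any plotting library. The snippet below is an illustration under stated assumptions: the quantile convention (smallest sample value whose empirical CDF reaches the level) and the evenly spaced mid-point levels are choices made for this sketch, and `ot.VisualTest.DrawQQplot` may use different ones. The names `empirical_quantile` and `qq_points` are hypothetical.

```python
import math


def empirical_quantile(sorted_values, level):
    """Smallest sample value whose empirical CDF is at least `level`."""
    n = len(sorted_values)
    index = max(math.ceil(level * n) - 1, 0)
    return sorted_values[index]


def qq_points(sample_a, sample_b, n_points=5):
    """Pairs of matching empirical quantiles, one pair per probability level."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Mid-point levels avoid the extremes 0 and 1 (an assumption of this sketch).
    levels = [(i + 0.5) / n_points for i in range(n_points)]
    return [(empirical_quantile(a, p), empirical_quantile(b, p)) for p in levels]


reference = [1.0, 2.0, 3.0, 4.0, 5.0]
same_points = qq_points(reference, [5.0, 4.0, 3.0, 2.0, 1.0])
shifted_points = qq_points(reference, [3.0, 4.0, 5.0, 6.0, 7.0])

# Distance to the diagonal: zero for identical samples, nonzero after a shift.
print(max(abs(x - y) for x, y in same_points))
print(max(abs(x - y) for x, y in shifted_points))
```

Points that sit on the diagonal indicate matching quantiles; a constant offset, as in the shifted sample, shows up as a line parallel to the diagonal.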
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
Generate three samples: sample1 and sample2 are drawn from the same distribution, while sample3 is drawn from a different one.
distribution1 = ot.Gumbel(0.2, 0.5)
distribution2 = ot.Uniform()
ot.RandomGenerator.SetSeed(5)
sample1 = distribution1.getSample(100)
sample2 = distribution1.getSample(100)
sample3 = distribution2.getSample(100)
Visually compare sample1 and sample2 using a QQ-plot
graph = ot.VisualTest.DrawQQplot(sample1, sample2)
view = viewer.View(graph)
Visually compare sample1 and sample3 using a QQ-plot
graph = ot.VisualTest.DrawQQplot(sample1, sample3)
view = viewer.View(graph)
Numerically test sample1 against sample2
test_result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample2)
print('Samples follow the same distribution?', test_result.getBinaryQualityMeasure(),
      'p-value=%.6g' % test_result.getPValue(),
      'threshold=%.6g' % test_result.getThreshold())
Out:
Samples follow the same distribution? True p-value=0.190264 threshold=0.05
Numerically test sample1 against sample3
test_result = ot.HypothesisTest.TwoSamplesKolmogorov(sample1, sample3)
print('Samples follow the same distribution?', test_result.getBinaryQualityMeasure(),
      'p-value=%.6g' % test_result.getPValue(),
      'threshold=%.6g' % test_result.getThreshold())
Out:
Samples follow the same distribution? False p-value=9.86999e-15 threshold=0.05
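In both outputs above, the answer is True when the p-value exceeds the threshold and False otherwise. The sketch below illustrates that decision rule with the asymptotic Kolmogorov distribution. It assumes the decision variable has already been scaled, and it uses the classical alternating-series approximation. OpenTURNS may compute its p-values differently (for instance with finite-sample corrections), so the numbers are not expected to match the library's output. The names `kolmogorov_survival` and `same_distribution` are hypothetical.

```python
import math


def kolmogorov_survival(d, max_terms=100):
    """Asymptotic P(K > d) for the Kolmogorov distribution.

    Uses the series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * d^2).
    The truncated series is only reliable when d is not very close to zero.
    """
    if d <= 0.0:
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * (k * d) ** 2)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:  # remaining terms no longer change the sum
            break
    # Clamp to [0, 1] to absorb rounding and truncation error.
    return min(max(2.0 * total, 0.0), 1.0)


def same_distribution(decision_variable, threshold=0.05):
    """Return (not rejected?, p-value) under the asymptotic approximation."""
    p_value = kolmogorov_survival(decision_variable)
    return p_value > threshold, p_value


# Hypothetical decision variables: a moderate one and one just past the 5% level.
accepted_small, p_small = same_distribution(0.7071067811865476)
accepted_large, p_large = same_distribution(1.36)
print(accepted_small, 'p-value=%.6g' % p_small)
print(accepted_large, 'p-value=%.6g' % p_large)
```

A value near 1.36 sits just beyond the 5% critical value of the asymptotic distribution, which is why the second case is rejected while the first is not.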