.. rst-class:: sphx-glr-example-title
.. _sphx_glr_auto_data_analysis_statistical_hypothesis_testing_plot_kolmogorov_distribution.py:
The Kolmogorov-Smirnov distribution
===================================
In this example, we draw the Kolmogorov-Smirnov distribution for a sample of size 10. We want to test the hypothesis that this sample follows the `Uniform(0,1)` distribution. The K.S. distribution is first plotted in the case where the parameters of the Uniform distribution are known. Then we plot the distribution when the parameters of the Uniform distribution are estimated from the sample.
*Reference* : Hovhannes Keutelian, "The Kolmogorov-Smirnov test when parameters are estimated from data", 30 April 1991, Fermilab
There is a sign error in the paper; the equation:
```
D[i]=max(abs(S+step),D[i])
```
must be replaced with
```
D[i]=max(abs(S-step),D[i])
```
.. code-block:: default
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
.. code-block:: default
x=[0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
sample=ot.Sample([[xi] for xi in x])
.. code-block:: default
samplesize = sample.getSize()
samplesize
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
10
Plot the empirical distribution function.
.. code-block:: default
graph = ot.UserDefined(sample).drawCDF()
graph.setLegends(["Sample"])
curve = ot.Curve([0,1],[0,1])
curve.setLegend("Uniform")
graph.add(curve)
graph.setXTitle("X")
graph.setTitle("Cumulative distribution function")
view = viewer.View(graph)
.. image:: /auto_data_analysis/statistical_hypothesis_testing/images/sphx_glr_plot_kolmogorov_distribution_001.png
:alt: Cumulative distribution function
:class: sphx-glr-single-img
The `computeKSStatistics` function computes the Kolmogorov-Smirnov distance between the sample and the distribution. The following function is for teaching purposes only: use `FittingTest` for real applications.
.. code-block:: default

    def computeKSStatistics(sample, distribution):
        # Sort the observations so that sample[i] is the (i+1)-th order statistic.
        sample = sample.sort()
        n = sample.getSize()
        D = 0.0
        for i in range(n):
            F = distribution.computeCDF(sample[i])
            # Gaps between the tested CDF and the empirical CDF,
            # just before and just after the jump at sample[i].
            Fminus = F - float(i) / n
            Fplus = float(i + 1) / n - F
            D = max(Fminus, Fplus, D)
        return D

.. code-block:: default
dist = ot.Uniform(0,1)
dist
.. raw:: html
Uniform(a = 0, b = 1)
.. code-block:: default
computeKSStatistics(sample,dist)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
0.17710000000000004
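
For reference, the quantity computed by the loop above can be written in terms of the sorted sample :math:`x_{(1)} \le \dots \le x_{(n)}` and the tested CDF :math:`F`. The notation below is introduced here for illustration only and does not appear in the script:

.. math::

    D_n = \max_{1 \le i \le n} \max\left( F(x_{(i)}) - \frac{i-1}{n}, \; \frac{i}{n} - F(x_{(i)}) \right)

For the sample above and :math:`F(x) = x`, the maximum is reached at the fourth sorted value :math:`x_{(4)} = 0.4771`, which gives :math:`D_{10} = 0.4771 - 3/10 = 0.1771`, in agreement with the printed output up to floating-point rounding.
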
The following function generates a sample of K.S. distances when the tested distribution is the `Uniform(0,1)` distribution.
.. code-block:: default

    def generateKSSampleKnownParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        dist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = dist.getSample(samplesize)
            D[i, 0] = computeKSStatistics(sample, dist)
        return D

Generate a sample of KS distances.
.. code-block:: default
nrepeat = 10000 # Size of the KS distances sample
sampleD = generateKSSampleKnownParameters(nrepeat,samplesize)
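
The simulation above can also be sketched with the Python standard library only, which may help to see what is being repeated at each iteration. This is a minimal sketch, not the OpenTURNS implementation: the names ``ks_distance_uniform01`` and ``simulate_ks_distances`` are hypothetical, and the fixed seed is an assumption made for reproducibility.

```python
import random


def ks_distance_uniform01(values):
    """KS distance between a sample and the Uniform(0,1) distribution."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The Uniform(0,1) CDF is the identity on [0, 1].
        f = min(max(x, 0.0), 1.0)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def simulate_ks_distances(nrepeat, samplesize, seed=0):
    """Draw nrepeat Uniform(0,1) samples and return their KS distances."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    distances = []
    for _ in range(nrepeat):
        sample = [rng.random() for _ in range(samplesize)]
        distances.append(ks_distance_uniform01(sample))
    return distances


distances = simulate_ks_distances(1000, 10)
print(min(distances), max(distances))
```

By construction, each distance is at least ``1 / (2 * samplesize)`` and at most 1, which gives a quick sanity check on the simulated values.
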
Compute the exact Kolmogorov CDF.
.. code-block:: default

    def pKolmogorovPy(x):
        y = ot.DistFunc_pKolmogorov(samplesize, x[0])
        return [y]

.. code-block:: default
pKolmogorov = ot.PythonFunction(1,1,pKolmogorovPy)
.. code-block:: default

    def dKolmogorov(x, samplesize):
        """
        Compute Kolmogorov PDF for given x.
        x : an array, the points where the PDF must be evaluated
        samplesize : the size of the sample
        Reference
        Numerical Derivatives in Scilab, Michael Baudin, May 2009
        """
        n = x.getSize()
        y = ot.Sample(n, 1)
        for i in range(n):
            y[i, 0] = pKolmogorov.gradient(x[i])[0, 0]
        return y

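
The function above obtains the PDF by differentiating the CDF numerically through ``pKolmogorov.gradient``. The sketch below illustrates the idea with a centered finite difference, which is one common scheme; that it matches the scheme used internally by the library is an assumption. The names ``central_difference`` and ``ks_cdf_n1`` are hypothetical. The test case is the KS distance for a single Uniform(0,1) observation, whose CDF is known in closed form.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a centered finite difference of step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def ks_cdf_n1(x):
    """Exact CDF of the KS distance for one Uniform(0,1) observation."""
    # For n = 1, D = max(U, 1 - U), so P(D <= x) = 2x - 1 on [0.5, 1].
    if x < 0.5:
        return 0.0
    if x > 1.0:
        return 1.0
    return 2.0 * x - 1.0


# The PDF is the derivative of the CDF; on (0.5, 1) it is constant.
pdf_at_075 = central_difference(ks_cdf_n1, 0.75)
print(pdf_at_075)
```

The step ``h`` trades truncation error against rounding error; the default used here is only a reasonable starting point for smooth functions of order one.
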
.. code-block:: default

    def linearSample(xmin, xmax, npoints):
        '''Returns a sample created from a regular grid
        from xmin to xmax with npoints points.'''
        step = (xmax - xmin) / (npoints - 1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices

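
The step is divided by ``npoints - 1`` rather than ``npoints`` so that both end points belong to the grid. A plain-Python equivalent, given here as a sketch with the hypothetical name ``linear_grid``, makes this explicit:

```python
def linear_grid(xmin, xmax, npoints):
    """Return npoints equally spaced values from xmin to xmax, both included."""
    if npoints < 2:
        raise ValueError("npoints must be at least 2")
    # npoints values delimit npoints - 1 intervals.
    step = (xmax - xmin) / (npoints - 1)
    return [xmin + i * step for i in range(npoints)]


grid = linear_grid(0.001, 0.999, 1000)
print(len(grid), grid[0], grid[-1])
```
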
.. code-block:: default
n = 1000 # Number of points in the plot
s = linearSample(0.001,0.999,n)
y = dKolmogorov(s,samplesize)
.. code-block:: default
curve = ot.Curve(s,y)
curve.setLegend("Exact distribution")
graph = ot.HistogramFactory().build(sampleD).drawPDF()
graph.setLegends(["Empirical distribution"])
graph.add(curve)
graph.setTitle("Kolmogorov-Smirnov distribution (known parameters)")
graph.setXTitle("KS-Statistics")
view = viewer.View(graph)
.. image:: /auto_data_analysis/statistical_hypothesis_testing/images/sphx_glr_plot_kolmogorov_distribution_002.png
:alt: Kolmogorov-Smirnov distribution (known parameters)
:class: sphx-glr-single-img
Known parameters versus estimated parameters
--------------------------------------------
The following function generates a sample of K.S. distances when the tested distribution is the `Uniform(a,b)` distribution, where the `a` and `b` parameters are estimated from the sample.
.. code-block:: default

    def generateKSSampleEstimatedParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        distfactory = ot.UniformFactory()
        refdist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = refdist.getSample(samplesize)
            trialdist = distfactory.build(sample)
            D[i, 0] = computeKSStatistics(sample, trialdist)
        return D

Generate a sample of KS distances.
.. code-block:: default
sampleDP = generateKSSampleEstimatedParameters(nrepeat,samplesize)
.. code-block:: default
graph = ot.KernelSmoothing().build(sampleD).drawPDF()
graph.setLegends(["Known parameters"])
graphP = ot.KernelSmoothing().build(sampleDP).drawPDF()
graphP.setLegends(["Estimated parameters"])
graphP.setColors(["blue"])
graph.add(graphP)
graph.setTitle("Kolmogorov-Smirnov distribution")
graph.setXTitle("KS-Statistics")
view = viewer.View(graph)
plt.show()
.. image:: /auto_data_analysis/statistical_hypothesis_testing/images/sphx_glr_plot_kolmogorov_distribution_003.png
:alt: Kolmogorov-Smirnov distribution
:class: sphx-glr-single-img
We see that the distribution of the KS distances when the parameters are estimated is shifted towards the left: smaller distances occur more often. This is a consequence of the fact that the estimated parameters tend to make the estimated distribution closer to the empirical sample.
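
The same effect can be checked on another family with the standard library only. The sketch below is not the experiment above: it swaps the Uniform distribution for a standard Normal one and estimates the mean and standard deviation with the usual sample estimators, both of which are assumptions of this illustration. The names ``ks_distance``, ``normal_cdf`` and ``mean_ks_distances`` are hypothetical.

```python
import math
import random
import statistics


def ks_distance(sorted_cdf_values):
    """KS distance given the tested CDF evaluated at the sorted sample."""
    n = len(sorted_cdf_values)
    d = 0.0
    for i, f in enumerate(sorted_cdf_values):
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def normal_cdf(x, mu, sigma):
    """CDF of the Normal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mean_ks_distances(nrepeat, samplesize, seed=0):
    """Average KS distance with known and with estimated parameters."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    total_known = 0.0
    total_estimated = 0.0
    for _ in range(nrepeat):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(samplesize))
        # Known parameters: test against the true Normal(0, 1).
        total_known += ks_distance([normal_cdf(x, 0.0, 1.0) for x in xs])
        # Estimated parameters: test against the fitted Normal.
        mu = statistics.mean(xs)
        sigma = statistics.stdev(xs)
        total_estimated += ks_distance([normal_cdf(x, mu, sigma) for x in xs])
    return total_known / nrepeat, total_estimated / nrepeat


mean_known, mean_estimated = mean_ks_distances(2000, 10)
print(mean_known, mean_estimated)
```

In this setting the average distance with estimated parameters comes out smaller than with known parameters, which is why critical values computed for known parameters make the test too lenient when the parameters are fitted on the same sample.
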