.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_kolmogorov_distribution.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:

Kolmogorov-Smirnov : get the statistics distribution
====================================================

.. GENERATED FROM PYTHON SOURCE LINES 8-20

In this example, we draw the Kolmogorov-Smirnov distribution for a sample of size 10.
We want to test the hypothesis that this sample has the `Uniform(0, 1)` distribution.
The K.S. distribution is first plotted in the case where the parameters of the uniform distribution are known.
Then we plot the distribution when the parameters of the uniform distribution are estimated from the sample.

*Reference* : Hovhannes Keutelian, "The Kolmogorov-Smirnov test when parameters are estimated from data", 30 April 1991, Fermilab

Note: There is a sign error in the paper; the equation `D[i]=max(abs(S+step),D[i])` must be replaced with `D[i]=max(abs(S-step),D[i])`.

.. GENERATED FROM PYTHON SOURCE LINES 22-27

.. code-block:: default

    import openturns as ot
    import openturns.viewer as viewer
    from matplotlib import pylab as plt
    ot.Log.Show(ot.Log.NONE)

.. GENERATED FROM PYTHON SOURCE LINES 28-32

.. code-block:: default

    x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701,
         0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
    sample = ot.Sample([[xi] for xi in x])

.. GENERATED FROM PYTHON SOURCE LINES 33-36

.. code-block:: default

    samplesize = sample.getSize()
    samplesize

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    10

.. GENERATED FROM PYTHON SOURCE LINES 37-38

Plot the empirical distribution function.

.. GENERATED FROM PYTHON SOURCE LINES 40-50

.. code-block:: default

    graph = ot.UserDefined(sample).drawCDF()
    graph.setLegends(["Sample"])
    curve = ot.Curve([0, 1], [0, 1])
    curve.setLegend("Uniform")
    graph.add(curve)
    graph.setXTitle("X")
    graph.setTitle("Cumulated distribution function")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.png
   :alt: Cumulated distribution function
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 51-52

The `computeKSStatistics` function computes the Kolmogorov-Smirnov distance between the sample and the distribution.
The following function is for teaching purposes only: use `FittingTest` for real applications.

.. GENERATED FROM PYTHON SOURCE LINES 54-71

.. code-block:: default

    def computeKSStatistics(sample, distribution):
        sample = sample.sort()
        n = sample.getSize()
        D = 0.
        index = -1  # index of the point where the maximum is reached (not returned)
        D_previous = 0.
        for i in range(n):
            F = distribution.computeCDF(sample[i])
            # Gaps between the CDF and the empirical CDF on each side of its jump
            Fminus = F - float(i)/n
            Fplus = float(i+1)/n - F
            D = max(Fminus, Fplus, D)
            if (D > D_previous):
                index = i
                D_previous = D
        return D

.. GENERATED FROM PYTHON SOURCE LINES 72-75

.. code-block:: default

    dist = ot.Uniform(0, 1)
    dist

.. raw:: html

    Uniform(a = 0, b = 1)
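
The distance computed by `computeKSStatistics` can also be checked without OpenTURNS. The block below is a minimal sketch in plain Python, assuming the same ten data points and the `Uniform(0, 1)` CDF; the names `ks_distance` and `uniform01_cdf` are illustrative and are not part of the OpenTURNS API.

```python
def ks_distance(values, cdf):
    """Two-sided Kolmogorov-Smirnov distance between a sample and a CDF."""
    ordered = sorted(values)
    n = len(ordered)
    distance = 0.0
    for i, value in enumerate(ordered):
        f = cdf(value)
        # The empirical CDF jumps from i/n to (i+1)/n at the i-th ordered
        # point, so the theoretical CDF is compared with both sides of the jump.
        distance = max(distance, f - i / n, (i + 1) / n - f)
    return distance


def uniform01_cdf(x):
    """CDF of the Uniform(0, 1) distribution (hypothetical helper)."""
    return min(max(x, 0.0), 1.0)


x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701,
     0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
d = ks_distance(x, uniform01_cdf)
print(d)
```

For this sample the maximum gap is reached at the fourth ordered point, 0.4771, where the empirical CDF still equals 3/10 just before the jump, which gives a distance of about 0.1771. This should agree with what `computeKSStatistics(sample, dist)` returns for the same data.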



.. GENERATED FROM PYTHON SOURCE LINES 76-79

.. code-block:: default

    computeKSStatistics(sample, dist)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    0.17710000000000004

.. GENERATED FROM PYTHON SOURCE LINES 80-81

The following function generates a sample of K.S. distances when the tested distribution is the `Uniform(0,1)` distribution.

.. GENERATED FROM PYTHON SOURCE LINES 83-96

.. code-block:: default

    def generateKSSampleKnownParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        dist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = dist.getSample(samplesize)
            D[i, 0] = computeKSStatistics(sample, dist)
        return D

.. GENERATED FROM PYTHON SOURCE LINES 97-98

Generate a sample of KS distances.

.. GENERATED FROM PYTHON SOURCE LINES 100-104

.. code-block:: default

    nrepeat = 10000  # Size of the KS distances sample
    sampleD = generateKSSampleKnownParameters(nrepeat, samplesize)

.. GENERATED FROM PYTHON SOURCE LINES 105-106

Compute the exact Kolmogorov CDF.

.. GENERATED FROM PYTHON SOURCE LINES 108-113

.. code-block:: default

    def pKolmogorovPy(x):
        y = ot.DistFunc.pKolmogorov(samplesize, x[0])
        return [y]

.. GENERATED FROM PYTHON SOURCE LINES 114-117

.. code-block:: default

    pKolmogorov = ot.PythonFunction(1, 1, pKolmogorovPy)

.. GENERATED FROM PYTHON SOURCE LINES 118-133

.. code-block:: default

    def dKolmogorov(x, samplesize):
        """
        Compute Kolmogorov PDF for given x.
        x : an array, the points where the PDF must be evaluated
        samplesize : the size of the sample
        Reference
        Numerical Derivatives in Scilab, Michael Baudin, May 2009
        """
        # Note: pKolmogorov reads the global samplesize, not this argument.
        n = x.getSize()
        y = ot.Sample(n, 1)
        for i in range(n):
            y[i, 0] = pKolmogorov.gradient(x[i])[0, 0]
        return y

.. GENERATED FROM PYTHON SOURCE LINES 134-143

.. code-block:: default

    def linearSample(xmin, xmax, npoints):
        '''Returns a sample created from a regular grid
        from xmin to xmax with npoints points.'''
        step = (xmax-xmin)/(npoints-1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices

.. GENERATED FROM PYTHON SOURCE LINES 144-148

.. code-block:: default

    n = 1000  # Number of points in the plot
    s = linearSample(0.001, 0.999, n)
    y = dKolmogorov(s, samplesize)

.. GENERATED FROM PYTHON SOURCE LINES 149-159

.. code-block:: default

    curve = ot.Curve(s, y)
    curve.setLegend("Exact distribution")
    graph = ot.HistogramFactory().build(sampleD).drawPDF()
    graph.setLegends(["Empirical distribution"])
    graph.add(curve)
    graph.setTitle("Kolmogorov-Smirnov distribution (known parameters)")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.png
   :alt: Kolmogorov-Smirnov distribution (known parameters)
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 160-162

Known parameters versus estimated parameters
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 164-165

The following function generates a sample of K.S. distances when the tested distribution is the `Uniform(a,b)` distribution, where the `a` and `b` parameters are estimated from the sample.

.. GENERATED FROM PYTHON SOURCE LINES 167-182

.. code-block:: default

    def generateKSSampleEstimatedParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        distfactory = ot.UniformFactory()
        refdist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = refdist.getSample(samplesize)
            trialdist = distfactory.build(sample)
            D[i, 0] = computeKSStatistics(sample, trialdist)
        return D

.. GENERATED FROM PYTHON SOURCE LINES 183-184

Generate a sample of KS distances.

.. GENERATED FROM PYTHON SOURCE LINES 186-188

.. code-block:: default

    sampleDP = generateKSSampleEstimatedParameters(nrepeat, samplesize)

.. GENERATED FROM PYTHON SOURCE LINES 189-200

.. code-block:: default

    graph = ot.KernelSmoothing().build(sampleD).drawPDF()
    graph.setLegends(["Known parameters"])
    graphP = ot.KernelSmoothing().build(sampleDP).drawPDF()
    graphP.setLegends(["Estimated parameters"])
    graphP.setColors(["blue"])
    graph.add(graphP)
    graph.setTitle("Kolmogorov-Smirnov distribution")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)
    plt.show()

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.png
   :alt: Kolmogorov-Smirnov distribution
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 201-202

We see that the distribution of the KS distances when the parameters are estimated is shifted towards the left: smaller distances occur more often.
This is because fitting the parameters to the sample tends to bring the estimated distribution closer to the empirical distribution of that sample.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.010 seconds)

.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:

.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_kolmogorov_distribution.py `

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_kolmogorov_distribution.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_