.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_kolmogorov_distribution.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:


Kolmogorov-Smirnov: get the distribution of the statistic
=========================================================

.. GENERATED FROM PYTHON SOURCE LINES 8-21

In this example, we draw the Kolmogorov-Smirnov distribution for a sample of size 10.
We want to test the hypothesis that this sample follows the `Uniform(0, 1)` distribution.
The KS distribution is first plotted in the case where the parameters of the uniform distribution are known.
Then we plot the distribution obtained when the parameters of the uniform distribution are estimated from the sample.

*Reference*: Hovhannes Keutelian, "The Kolmogorov-Smirnov test when parameters are estimated from data", 30 April 1991, Fermilab

Note: There is a sign error in the paper; the equation
`D[i]=max(abs(S+step),D[i])`
must be replaced with
`D[i]=max(abs(S-step),D[i])`.

.. GENERATED FROM PYTHON SOURCE LINES 23-29

.. code-block:: default

    import openturns as ot
    import openturns.viewer as viewer
    from matplotlib import pylab as plt

    ot.Log.Show(ot.Log.NONE)


.. GENERATED FROM PYTHON SOURCE LINES 30-33

.. code-block:: default

    x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
    sample = ot.Sample([[xi] for xi in x])


.. GENERATED FROM PYTHON SOURCE LINES 34-37

.. code-block:: default

    samplesize = sample.getSize()
    samplesize


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    10


.. GENERATED FROM PYTHON SOURCE LINES 38-39

Plot the empirical distribution function.

.. GENERATED FROM PYTHON SOURCE LINES 41-51

.. code-block:: default

    graph = ot.UserDefined(sample).drawCDF()
    graph.setLegends(["Sample"])
    # The CDF of Uniform(0, 1) is the straight line from (0, 0) to (1, 1).
    curve = ot.Curve([0, 1], [0, 1])
    curve.setLegend("Uniform")
    graph.add(curve)
    graph.setXTitle("X")
    graph.setTitle("Cumulative distribution function")
    view = viewer.View(graph)


.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.png
   :alt: Cumulative distribution function
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 52-56

The `computeKSStatistics` function computes the Kolmogorov-Smirnov distance between the sample and the distribution.
The following function is for teaching purposes only: use `FittingTest` for real applications.

.. GENERATED FROM PYTHON SOURCE LINES 58-73

.. code-block:: default

    def computeKSStatistics(sample, distribution):
        # Sort the sample so that sample[i] is the (i + 1)-th order statistic.
        sample = sample.sort()
        n = sample.getSize()
        D = 0.0
        for i in range(n):
            F = distribution.computeCDF(sample[i])
            # Gap to the empirical CDF just before its jump at sample[i].
            Fminus = F - float(i) / n
            # Gap to the empirical CDF just after its jump at sample[i].
            Fplus = float(i + 1) / n - F
            D = max(Fminus, Fplus, D)
        return D


.. GENERATED FROM PYTHON SOURCE LINES 74-77

.. code-block:: default

    dist = ot.Uniform(0, 1)
    dist


.. raw:: html

    Uniform(a = 0, b = 1)
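
The distance defined by `computeKSStatistics` can also be cross-checked without OpenTURNS. The following is only a minimal plain-Python sketch of the same formula: the names `ks_statistic` and `uniform01_cdf` are hypothetical helpers introduced here for illustration and are not part of the OpenTURNS API. It reuses the ten values of the sample and the CDF of the `Uniform(0, 1)` distribution.

```python
def ks_statistic(values, cdf):
    """Two-sided KS distance between a list of floats and a CDF (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered):
        f = cdf(v)
        # Gap to the empirical CDF just before its jump at v.
        d_below = f - i / n
        # Gap to the empirical CDF just after its jump at v.
        d_above = (i + 1) / n - f
        d = max(d, d_below, d_above)
    return d


def uniform01_cdf(v):
    """CDF of the Uniform(0, 1) distribution (illustrative helper)."""
    return min(max(v, 0.0), 1.0)


# Same ten values as the sample used in this example.
x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
d_check = ks_statistic(x, uniform01_cdf)
print(round(d_check, 4))  # 0.1771
```

The largest gap is reached at the fourth order statistic, 0.4771, where the empirical CDF just before the jump is 3/10, which gives 0.4771 - 0.3 = 0.1771.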



.. GENERATED FROM PYTHON SOURCE LINES 78-81

.. code-block:: default

    computeKSStatistics(sample, dist)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.17710000000000004


.. GENERATED FROM PYTHON SOURCE LINES 82-83

The following function generates a sample of KS distances when the tested distribution is the `Uniform(0,1)` distribution.

.. GENERATED FROM PYTHON SOURCE LINES 85-98

.. code-block:: default

    def generateKSSampleKnownParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        dist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            # Draw a fresh sample and measure its distance to the known distribution.
            sample = dist.getSample(samplesize)
            D[i, 0] = computeKSStatistics(sample, dist)
        return D


.. GENERATED FROM PYTHON SOURCE LINES 99-100

Generate a sample of KS distances.

.. GENERATED FROM PYTHON SOURCE LINES 102-106

.. code-block:: default

    nrepeat = 10000  # Size of the KS distances sample
    sampleD = generateKSSampleKnownParameters(nrepeat, samplesize)


.. GENERATED FROM PYTHON SOURCE LINES 107-108

Compute the exact Kolmogorov CDF.

.. GENERATED FROM PYTHON SOURCE LINES 110-115

.. code-block:: default

    def pKolmogorovPy(x):
        # Uses the global variable samplesize defined above.
        y = ot.DistFunc.pKolmogorov(samplesize, x[0])
        return [y]


.. GENERATED FROM PYTHON SOURCE LINES 116-119

.. code-block:: default

    pKolmogorov = ot.PythonFunction(1, 1, pKolmogorovPy)


.. GENERATED FROM PYTHON SOURCE LINES 120-135

.. code-block:: default

    def dKolmogorov(x, samplesize):
        """
        Compute Kolmogorov PDF for given x.
        x : an array, the points where the PDF must be evaluated
        samplesize : the size of the sample
        Reference
        Numerical Derivatives in Scilab, Michael Baudin, May 2009
        """
        # Note: samplesize is not used in the body; pKolmogorov reads the
        # global samplesize through pKolmogorovPy.
        n = x.getSize()
        y = ot.Sample(n, 1)
        for i in range(n):
            # The PDF is the derivative of the CDF.
            y[i, 0] = pKolmogorov.gradient(x[i])[0, 0]
        return y


.. GENERATED FROM PYTHON SOURCE LINES 136-145

.. code-block:: default

    def linearSample(xmin, xmax, npoints):
        """Returns a sample created from a regular grid
        from xmin to xmax with npoints points."""
        step = (xmax - xmin) / (npoints - 1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices


.. GENERATED FROM PYTHON SOURCE LINES 146-150

.. code-block:: default

    n = 1000  # Number of points in the plot
    s = linearSample(0.001, 0.999, n)
    y = dKolmogorov(s, samplesize)


.. GENERATED FROM PYTHON SOURCE LINES 151-161

.. code-block:: default

    curve = ot.Curve(s, y)
    curve.setLegend("Exact distribution")
    graph = ot.HistogramFactory().build(sampleD).drawPDF()
    graph.setLegends(["Empirical distribution"])
    graph.add(curve)
    graph.setTitle("Kolmogorov-Smirnov distribution (known parameters)")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)


.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.png
   :alt: Kolmogorov-Smirnov distribution (known parameters)
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 162-164

Known parameters versus estimated parameters
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 166-169

The following function generates a sample of KS distances when the tested distribution is the `Uniform(a,b)` distribution, where the `a` and `b` parameters are estimated from the sample.

.. GENERATED FROM PYTHON SOURCE LINES 171-186

.. code-block:: default

    def generateKSSampleEstimatedParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        distfactory = ot.UniformFactory()
        refdist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = refdist.getSample(samplesize)
            # Fit a and b on the sample, then measure the distance to the fitted distribution.
            trialdist = distfactory.build(sample)
            D[i, 0] = computeKSStatistics(sample, trialdist)
        return D


.. GENERATED FROM PYTHON SOURCE LINES 187-188

Generate a sample of KS distances.

.. GENERATED FROM PYTHON SOURCE LINES 190-192

.. code-block:: default

    sampleDP = generateKSSampleEstimatedParameters(nrepeat, samplesize)


.. GENERATED FROM PYTHON SOURCE LINES 193-204

.. code-block:: default

    graph = ot.KernelSmoothing().build(sampleD).drawPDF()
    graph.setLegends(["Known parameters"])
    graphP = ot.KernelSmoothing().build(sampleDP).drawPDF()
    graphP.setLegends(["Estimated parameters"])
    graphP.setColors(["blue"])
    graph.add(graphP)
    graph.setTitle("Kolmogorov-Smirnov distribution")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)
    plt.show()


.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.png
   :alt: Kolmogorov-Smirnov distribution
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 205-209

When the parameters are estimated, the distribution of the KS distances is shifted to the left: smaller distances occur more often.
This happens because the parameters are fitted to the very sample being tested, which tends to bring the estimated distribution closer to the empirical distribution of that sample.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.433 seconds)


.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_kolmogorov_distribution.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_kolmogorov_distribution.ipynb `