.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_kolmogorov_distribution.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:


Kolmogorov-Smirnov : get the statistics distribution
====================================================

.. GENERATED FROM PYTHON SOURCE LINES 9-22

In this example, we draw the Kolmogorov-Smirnov (KS) distribution for a sample of size 10.
We want to test the hypothesis that this sample follows the `Uniform(0, 1)` distribution.
The KS distribution is first plotted in the case where the parameters of the uniform distribution are known.
Then we plot the distribution when the parameters of the uniform distribution are estimated from the sample.

*Reference* : Hovhannes Keutelian, "The Kolmogorov-Smirnov test when parameters are estimated from data", 30 April 1991, Fermilab

Note: There is a sign error in the paper; the equation:
`D[i]=max(abs(S+step),D[i])` must be replaced with `D[i]=max(abs(S-step),D[i])`.

.. GENERATED FROM PYTHON SOURCE LINES 24-28

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer


.. GENERATED FROM PYTHON SOURCE LINES 29-32

.. code-block:: Python

    x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
    sample = ot.Sample([[xi] for xi in x])


.. GENERATED FROM PYTHON SOURCE LINES 33-36

.. code-block:: Python

    samplesize = sample.getSize()
    samplesize


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    10


.. GENERATED FROM PYTHON SOURCE LINES 37-38

Plot the empirical distribution function.

.. GENERATED FROM PYTHON SOURCE LINES 40-50

.. code-block:: Python

    graph = ot.UserDefined(sample).drawCDF()
    graph.setLegends(["Sample"])
    curve = ot.Curve([0, 1], [0, 1])
    curve.setLegend("Uniform")
    graph.add(curve)
    graph.setXTitle("X")
    graph.setTitle("Cumulated distribution function")
    view = viewer.View(graph)


.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.svg
   :alt: Cumulated distribution function
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.svg
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 51-55

The `computeKSStatistics` function computes the Kolmogorov-Smirnov distance between the sample and the distribution.
The following function is for teaching purposes only: use `FittingTest` for real applications.

.. GENERATED FROM PYTHON SOURCE LINES 58-73

.. code-block:: Python

    def computeKSStatistics(sample, distribution):
        sample = sample.sort()
        n = sample.getSize()
        D = 0.0
        D_previous = 0.0
        for i in range(n):
            F = distribution.computeCDF(sample[i])
            # Gap between the CDF and the empirical CDF just before the jump at sample[i]
            Fminus = F - float(i) / n
            # Gap between the empirical CDF just after the jump and the CDF
            Fplus = float(i + 1) / n - F
            D = max(Fminus, Fplus, D)
            if D > D_previous:
                D_previous = D
        return D


.. GENERATED FROM PYTHON SOURCE LINES 74-77

.. code-block:: Python

    dist = ot.Uniform(0, 1)
    dist


.. raw:: html

    Uniform


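As a cross-check that does not depend on OpenTURNS, the following minimal sketch recomputes the KS distance between the sample above and the `Uniform(0, 1)` distribution with the Python standard library only. The helper name `ks_distance_uniform01` is hypothetical and is not part of the library; it assumes the tested CDF is the identity on the interval [0, 1].

```python
# Cross-check of the KS distance against Uniform(0, 1), standard library only.
# `ks_distance_uniform01` is a hypothetical helper, not an OpenTURNS function.


def ks_distance_uniform01(values):
    """Return the KS distance between `values` and the Uniform(0, 1) CDF."""
    ordered = sorted(values)
    n = len(ordered)
    distance = 0.0
    for i, v in enumerate(ordered):
        # CDF of Uniform(0, 1): the identity, clipped to [0, 1]
        cdf = min(max(v, 0.0), 1.0)
        # The empirical CDF jumps from i / n to (i + 1) / n at v,
        # so both sides of the jump are compared with the CDF.
        distance = max(distance, cdf - i / n, (i + 1) / n - cdf)
    return distance


x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
d = ks_distance_uniform01(x)
print(d)  # approximately 0.1771
```

The largest gap is reached at the fourth order statistic, 0.4771, where the empirical CDF just before the jump equals 3/10.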
.. GENERATED FROM PYTHON SOURCE LINES 78-81

.. code-block:: Python

    computeKSStatistics(sample, dist)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.17710000000000004


.. GENERATED FROM PYTHON SOURCE LINES 82-83

The following function generates a sample of KS distances when the tested distribution is the `Uniform(0, 1)` distribution.

.. GENERATED FROM PYTHON SOURCE LINES 86-99

.. code-block:: Python

    def generateKSSampleKnownParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        dist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = dist.getSample(samplesize)
            D[i, 0] = computeKSStatistics(sample, dist)
        return D


.. GENERATED FROM PYTHON SOURCE LINES 100-101

Generate a sample of KS distances.

.. GENERATED FROM PYTHON SOURCE LINES 103-107

.. code-block:: Python

    nrepeat = 10000  # Size of the KS distances sample
    sampleD = generateKSSampleKnownParameters(nrepeat, samplesize)


.. GENERATED FROM PYTHON SOURCE LINES 108-109

Compute the exact Kolmogorov CDF.

.. GENERATED FROM PYTHON SOURCE LINES 112-117

.. code-block:: Python

    def pKolmogorovPy(x):
        y = ot.DistFunc.pKolmogorov(samplesize, x[0])
        return [y]


.. GENERATED FROM PYTHON SOURCE LINES 118-121

.. code-block:: Python

    pKolmogorov = ot.PythonFunction(1, 1, pKolmogorovPy)


.. GENERATED FROM PYTHON SOURCE LINES 122-137

.. code-block:: Python

    def dKolmogorov(x, samplesize):
        """
        Compute Kolmogorov PDF for given x.
        x : an array, the points where the PDF must be evaluated
        samplesize : the size of the sample
        Reference
        Numerical Derivatives in Scilab, Michael Baudin, May 2009
        """
        n = x.getSize()
        y = ot.Sample(n, 1)
        for i in range(n):
            # The PDF is evaluated as the gradient of the CDF function defined above
            y[i, 0] = pKolmogorov.gradient(x[i])[0, 0]
        return y


.. GENERATED FROM PYTHON SOURCE LINES 138-147

.. code-block:: Python

    def linearSample(xmin, xmax, npoints):
        """Returns a sample created from a regular grid
        from xmin to xmax with npoints points."""
        step = (xmax - xmin) / (npoints - 1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices


.. GENERATED FROM PYTHON SOURCE LINES 148-152

.. code-block:: Python

    n = 1000  # Number of points in the plot
    s = linearSample(0.001, 0.999, n)
    y = dKolmogorov(s, samplesize)


.. GENERATED FROM PYTHON SOURCE LINES 153-163

.. code-block:: Python

    curve = ot.Curve(s, y)
    curve.setLegend("Exact distribution")
    graph = ot.HistogramFactory().build(sampleD).drawPDF()
    graph.setLegends(["Empirical distribution"])
    graph.add(curve)
    graph.setTitle("Kolmogorov-Smirnov distribution (known parameters)")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)


.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.svg
   :alt: Kolmogorov-Smirnov distribution (known parameters)
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.svg
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 164-166

Known parameters versus estimated parameters
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 168-171

The following function generates a sample of KS distances when the tested distribution is the `Uniform(a, b)` distribution,
where the `a` and `b` parameters are estimated from the sample.

.. GENERATED FROM PYTHON SOURCE LINES 174-189

.. code-block:: Python

    def generateKSSampleEstimatedParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        distfactory = ot.UniformFactory()
        refdist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = refdist.getSample(samplesize)
            trialdist = distfactory.build(sample)
            D[i, 0] = computeKSStatistics(sample, trialdist)
        return D


.. GENERATED FROM PYTHON SOURCE LINES 190-191

Generate a sample of KS distances.

.. GENERATED FROM PYTHON SOURCE LINES 193-195

.. code-block:: Python

    sampleDP = generateKSSampleEstimatedParameters(nrepeat, samplesize)


.. GENERATED FROM PYTHON SOURCE LINES 196-205

.. code-block:: Python

    graph = ot.KernelSmoothing().build(sampleD).drawPDF()
    graph.setLegends(["Known parameters"])
    graphP = ot.KernelSmoothing().build(sampleDP).drawPDF()
    graphP.setLegends(["Estimated parameters"])
    graph.add(graphP)
    graph.setTitle("Kolmogorov-Smirnov distribution")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)


.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.svg
   :alt: Kolmogorov-Smirnov distribution
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.svg
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 206-207

Display the graphs.

.. GENERATED FROM PYTHON SOURCE LINES 207-210

.. code-block:: Python

    view.ShowAll()


.. GENERATED FROM PYTHON SOURCE LINES 211-215

We see that the distribution of the KS distances when the parameters are estimated is shifted towards the left:
smaller distances occur more often.
This is because the estimated parameters tend to bring the fitted distribution closer to the empirical distribution of the sample.

.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_kolmogorov_distribution.ipynb <plot_kolmogorov_distribution.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_kolmogorov_distribution.py <plot_kolmogorov_distribution.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_kolmogorov_distribution.zip <plot_kolmogorov_distribution.zip>`