.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_kolmogorov_distribution.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:

Kolmogorov-Smirnov : get the statistics distribution
====================================================

.. GENERATED FROM PYTHON SOURCE LINES 8-21

In this example, we draw the distribution of the Kolmogorov-Smirnov (K.S.) statistic for a sample of size 10.
We want to test the hypothesis that this sample follows the `Uniform(0, 1)` distribution.
We first plot the K.S. distribution when the parameters of the uniform distribution are known.
We then plot it when these parameters are estimated from the sample.

*Reference* : Hovhannes Keutelian, "The Kolmogorov-Smirnov test when parameters are estimated from data", 30 April 1991, Fermilab

Note: There is a sign error in the paper; the equation:

`D[i]=max(abs(S+step),D[i])` must be replaced with `D[i]=max(abs(S-step),D[i])`.

.. GENERATED FROM PYTHON SOURCE LINES 23-29

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer
    from matplotlib import pylab as plt

    ot.Log.Show(ot.Log.NONE)

.. GENERATED FROM PYTHON SOURCE LINES 30-33

.. code-block:: Python

    x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
    sample = ot.Sample([[xi] for xi in x])

.. GENERATED FROM PYTHON SOURCE LINES 34-37

.. code-block:: Python

    samplesize = sample.getSize()
    samplesize

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    10

.. GENERATED FROM PYTHON SOURCE LINES 38-39

Plot the empirical distribution function.

.. GENERATED FROM PYTHON SOURCE LINES 41-51

.. code-block:: Python

    graph = ot.UserDefined(sample).drawCDF()
    graph.setLegends(["Sample"])
    curve = ot.Curve([0, 1], [0, 1])
    curve.setLegend("Uniform")
    graph.add(curve)
    graph.setXTitle("X")
    graph.setTitle("Cumulated distribution function")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.png
   :alt: Cumulated distribution function
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 52-56

The `computeKSStatistics` function computes the Kolmogorov-Smirnov distance
between the sample and the distribution.
It is given for teaching purposes only: use `FittingTest` for real applications.

.. GENERATED FROM PYTHON SOURCE LINES 59-74

.. code-block:: Python

    def computeKSStatistics(sample, distribution):
        """Compute the K.S. distance between the sample and the distribution."""
        sample = sample.sort()  # Sort the observations in ascending order
        n = sample.getSize()
        D = 0.0
        for i in range(n):
            F = distribution.computeCDF(sample[i])
            Fminus = F - float(i) / n  # Gap just before the jump of the empirical CDF
            Fplus = float(i + 1) / n - F  # Gap just after the jump
            D = max(Fminus, Fplus, D)
        return D

.. GENERATED FROM PYTHON SOURCE LINES 75-78

.. code-block:: Python

    dist = ot.Uniform(0, 1)
    dist

.. raw:: html

    Uniform


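As a cross-check of the distance defined by `computeKSStatistics`, the same computation can be sketched in plain Python, without OpenTURNS. This is only an illustration under stated assumptions: the helper name `ks_distance_uniform01` is hypothetical (it is not part of OpenTURNS), and it hard-codes the CDF of the `Uniform(0, 1)` distribution, which is F(x) = x for x in [0, 1].

```python
def ks_distance_uniform01(values):
    """K.S. distance between a list of floats and Uniform(0, 1).

    Hypothetical helper for illustration; assumes all values lie in [0, 1].
    """
    ordered = sorted(values)
    n = len(ordered)
    distance = 0.0
    for i, value in enumerate(ordered):
        cdf = value  # CDF of Uniform(0, 1) at a point of [0, 1]
        gap_before = cdf - i / n  # the empirical CDF equals i / n just before the jump
        gap_after = (i + 1) / n - cdf  # and (i + 1) / n just after it
        distance = max(distance, gap_before, gap_after)
    return distance


# Same ten observations as the sample used in this example
x = [0.9374, 0.7629, 0.4771, 0.5111, 0.8701, 0.0684, 0.7375, 0.5615, 0.2835, 0.2508]
print(ks_distance_uniform01(x))
```

For this sample, the largest gap is reached at the observation 0.4771, where the empirical CDF equals 0.3 just before the jump, which gives a distance of about 0.1771.
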
.. GENERATED FROM PYTHON SOURCE LINES 79-82

.. code-block:: Python

    computeKSStatistics(sample, dist)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.17710000000000004

.. GENERATED FROM PYTHON SOURCE LINES 83-84

The following function generates a sample of K.S. distances when the tested distribution is the `Uniform(0,1)` distribution.

.. GENERATED FROM PYTHON SOURCE LINES 87-100

.. code-block:: Python

    def generateKSSampleKnownParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        dist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = dist.getSample(samplesize)
            # Distance to the known distribution: no parameter is fitted
            D[i, 0] = computeKSStatistics(sample, dist)
        return D

.. GENERATED FROM PYTHON SOURCE LINES 101-102

Generate a sample of K.S. distances.

.. GENERATED FROM PYTHON SOURCE LINES 104-108

.. code-block:: Python

    nrepeat = 10000  # Size of the KS distances sample
    sampleD = generateKSSampleKnownParameters(nrepeat, samplesize)

.. GENERATED FROM PYTHON SOURCE LINES 109-110

Compute the exact Kolmogorov CDF.

.. GENERATED FROM PYTHON SOURCE LINES 113-118

.. code-block:: Python

    def pKolmogorovPy(x):
        # Uses the global variable samplesize defined above
        y = ot.DistFunc.pKolmogorov(samplesize, x[0])
        return [y]

.. GENERATED FROM PYTHON SOURCE LINES 119-122

.. code-block:: Python

    pKolmogorov = ot.PythonFunction(1, 1, pKolmogorovPy)

.. GENERATED FROM PYTHON SOURCE LINES 123-138

.. code-block:: Python

    def dKolmogorov(x, samplesize):
        """
        Compute Kolmogorov PDF for given x, as the gradient of the Kolmogorov CDF.
        x : an array, the points where the PDF must be evaluated
        samplesize : the size of the sample
            (not used in the body: pKolmogorov reads the global samplesize)
        Reference
        Numerical Derivatives in Scilab, Michael Baudin, May 2009
        """
        n = x.getSize()
        y = ot.Sample(n, 1)
        for i in range(n):
            y[i, 0] = pKolmogorov.gradient(x[i])[0, 0]
        return y

.. GENERATED FROM PYTHON SOURCE LINES 139-148

.. code-block:: Python

    def linearSample(xmin, xmax, npoints):
        """Returns a sample created from a regular grid
        from xmin to xmax with npoints points."""
        step = (xmax - xmin) / (npoints - 1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices

.. GENERATED FROM PYTHON SOURCE LINES 149-153

.. code-block:: Python

    n = 1000  # Number of points in the plot
    s = linearSample(0.001, 0.999, n)
    y = dKolmogorov(s, samplesize)

.. GENERATED FROM PYTHON SOURCE LINES 154-164

.. code-block:: Python

    curve = ot.Curve(s, y)
    curve.setLegend("Exact distribution")
    graph = ot.HistogramFactory().build(sampleD).drawPDF()
    graph.setLegends(["Empirical distribution"])
    graph.add(curve)
    graph.setTitle("Kolmogorov-Smirnov distribution (known parameters)")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.png
   :alt: Kolmogorov-Smirnov distribution (known parameters)
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 165-167

Known parameters versus estimated parameters
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 169-172

The following function generates a sample of K.S. distances when the tested distribution is the `Uniform(a,b)`
distribution, where the `a` and `b` parameters are estimated from the sample.

.. GENERATED FROM PYTHON SOURCE LINES 175-190

.. code-block:: Python

    def generateKSSampleEstimatedParameters(nrepeat, samplesize):
        """
        nrepeat : Number of repetitions, size of the table
        samplesize : the size of each sample to generate from the Uniform distribution
        """
        distfactory = ot.UniformFactory()
        refdist = ot.Uniform(0, 1)
        D = ot.Sample(nrepeat, 1)
        for i in range(nrepeat):
            sample = refdist.getSample(samplesize)
            # Fit a and b on the sample, then measure the distance to the fitted distribution
            trialdist = distfactory.build(sample)
            D[i, 0] = computeKSStatistics(sample, trialdist)
        return D

.. GENERATED FROM PYTHON SOURCE LINES 191-192

Generate a sample of K.S. distances.

.. GENERATED FROM PYTHON SOURCE LINES 194-196

.. code-block:: Python

    sampleDP = generateKSSampleEstimatedParameters(nrepeat, samplesize)

.. GENERATED FROM PYTHON SOURCE LINES 197-208

.. code-block:: Python

    graph = ot.KernelSmoothing().build(sampleD).drawPDF()
    graph.setLegends(["Known parameters"])
    graphP = ot.KernelSmoothing().build(sampleDP).drawPDF()
    graphP.setLegends(["Estimated parameters"])
    graphP.setColors(["blue"])
    graph.add(graphP)
    graph.setTitle("Kolmogorov-Smirnov distribution")
    graph.setXTitle("KS-Statistics")
    view = viewer.View(graph)
    plt.show()

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.png
   :alt: Kolmogorov-Smirnov distribution
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_distribution_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 209-213

We see that the distribution of the K.S. distances is shifted towards the left
when the parameters are estimated: smaller distances occur more often.
This is because fitting the parameters to the sample brings the estimated
distribution closer to the empirical distribution of that sample.

.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_distribution.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_kolmogorov_distribution.ipynb <plot_kolmogorov_distribution.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_kolmogorov_distribution.py <plot_kolmogorov_distribution.py>`