.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_kolmogorov_statistics.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_statistics.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_kolmogorov_statistics.py:

Kolmogorov-Smirnov: understand the statistic
============================================

.. GENERATED FROM PYTHON SOURCE LINES 8-13

In this example, we illustrate how the Kolmogorov-Smirnov statistic is computed.

* We generate a sample from a normal distribution.
* We create a uniform distribution and estimate its parameters from the sample.
* We compute the Kolmogorov-Smirnov statistic and plot it on top of the
  empirical cumulative distribution function.

.. GENERATED FROM PYTHON SOURCE LINES 15-21

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer
    from matplotlib import pylab as plt

    ot.Log.Show(ot.Log.NONE)

.. GENERATED FROM PYTHON SOURCE LINES 22-28

The `computeKSStatisticsIndex()` function computes the Kolmogorov-Smirnov
distance between the sample and the distribution.
Furthermore, it returns the index which achieves the maximum distance
in the sorted sample.
The following function is for teaching purposes only: use `FittingTest`
for real applications.

.. GENERATED FROM PYTHON SOURCE LINES 30-57
.. code-block:: Python

    def computeKSStatisticsIndex(sample, distribution):
        sample = ot.Sample(sample.sort())
        print("Sorted")
        print(sample)
        n = sample.getSize()
        D = 0.0
        index = -1
        D_previous = 0.0
        for i in range(n):
            F = distribution.computeCDF(sample[i])
            S1 = abs(F - float(i) / n)
            S2 = abs(float(i + 1) / n - F)
            print(
                "i=%d, x[i]=%.4f, F(x[i])=%.4f, S1=%.4f, S2=%.4f"
                % (i, sample[i, 0], F, S1, S2)
            )
            D = max(S1, S2, D)
            if D > D_previous:
                print("D max!")
                index = i
                D_previous = D
        observation = sample[index]
        return D, index, observation

.. GENERATED FROM PYTHON SOURCE LINES 58-70

The `drawKSDistance()` function plots the empirical distribution function
of the sample and the Kolmogorov-Smirnov distance at point x.
The empirical CDF is a staircase function and is discontinuous at each
observation.
Denote by :math:`\hat{F}` the empirical CDF.
For a given observation :math:`x` which achieves the maximum distance to the
candidate distribution CDF, let us denote
:math:`\hat{F}^- = \lim_{x \rightarrow x^-} \hat{F}(x)` and
:math:`\hat{F}^+ = \lim_{x \rightarrow x^+} \hat{F}(x)`.
The maximum distance can be achieved either by :math:`\hat{F}^-` or
:math:`\hat{F}^+`.
The `computeEmpiricalCDF(x)` method computes :math:`\hat{F}^+ = \mathbb{P}(X \leq x)`.
We compute :math:`\hat{F}^-` with the equation :math:`\hat{F}^- = \hat{F}^+ - 1/n`,
where :math:`n` is the sample size.

.. GENERATED FROM PYTHON SOURCE LINES 73-100
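The two staircase values :math:`\hat{F}^+` and :math:`\hat{F}^-` can be
illustrated with a short pure-Python sketch; the helper, the sample, and the
evaluation point below are made up for illustration and are not part of the
example code.

.. code-block:: Python

    # Illustrative sketch of the empirical CDF on both sides of an
    # observation x: F_plus = P(X <= x) and F_minus = F_plus - 1/n.
    def empirical_cdf_plus(sample, x):
        """P(X <= x) under the empirical distribution of the sample."""
        return sum(1 for v in sample if v <= x) / len(sample)

    sample = [-1.0, -0.5, 0.1, 0.4, 1.3]
    x = 0.4  # one observation of the sample
    F_plus = empirical_cdf_plus(sample, x)  # 4 of the 5 values are <= 0.4
    F_minus = F_plus - 1.0 / len(sample)
    print("F+ =", F_plus, "F- =", F_minus)

At an observation, the empirical CDF jumps by exactly :math:`1/n`, which is
why a single subtraction recovers the left limit.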
.. code-block:: Python

    def drawKSDistance(sample, distribution, observation, D, distFactory):
        graph = ot.Graph("KS Distance = %.4f" % (D), "X", "CDF", True, "upper left")
        # Thick vertical line at point x
        ECDF_x_plus = sample.computeEmpiricalCDF(observation)
        ECDF_x_minus = ECDF_x_plus - 1.0 / sample.getSize()
        CDF_index = distribution.computeCDF(observation)
        curve = ot.Curve(
            [observation[0], observation[0], observation[0]],
            [ECDF_x_plus, ECDF_x_minus, CDF_index],
        )
        curve.setLegend("KS Statistics")
        curve.setLineWidth(4.0 * curve.getLineWidth())
        graph.add(curve)
        # Empirical CDF
        empiricalCDF = ot.UserDefined(sample).drawCDF()
        empiricalCDF.setLegends(["Empirical DF"])
        graph.add(empiricalCDF)
        # Candidate distribution fitted from the sample
        distname = distFactory.getClassName()
        distribution = distFactory.build(sample)
        cdf = distribution.drawCDF()
        cdf.setLegends([distname])
        graph.add(cdf)
        graph.setColors(ot.Drawable.BuildDefaultPalette(3))
        return graph

.. GENERATED FROM PYTHON SOURCE LINES 101-102

We generate a sample from a standard normal distribution.

.. GENERATED FROM PYTHON SOURCE LINES 104-108

.. code-block:: Python

    N = ot.Normal()
    n = 10
    sample = N.getSample(n)

.. GENERATED FROM PYTHON SOURCE LINES 112-115

We then create a uniform distribution whose parameters are estimated from
the sample. This way, the K.S. distance is large enough to be graphically
significant.

.. GENERATED FROM PYTHON SOURCE LINES 117-121

.. code-block:: Python

    distFactory = ot.UniformFactory()
    distribution = distFactory.build(sample)
    distribution
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Uniform
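Before running `computeKSStatisticsIndex()` on the fitted distribution, the
loop it implements can be cross-checked against a compact pure-Python version
of the two-sided distance; the candidate CDF and the tiny sorted sample below
are illustrative only.

.. code-block:: Python

    # Compact sketch of the two-sided Kolmogorov-Smirnov distance on a
    # sorted sample: D = max_i max(|F(x_(i)) - i/n|, |(i+1)/n - F(x_(i))|).
    # The Uniform(0, 1) CDF and the sample values are illustrative only.
    def ks_statistic(sorted_sample, cdf):
        n = len(sorted_sample)
        D = 0.0
        for i, x in enumerate(sorted_sample):
            F = cdf(x)
            # Distances below and above the jump of the empirical CDF at x
            D = max(D, abs(F - i / n), abs((i + 1) / n - F))
        return D

    def uniform_cdf(x):
        """CDF of the Uniform(0, 1) distribution."""
        return min(max(x, 0.0), 1.0)

    print(ks_statistic([0.1, 0.2, 0.5, 0.6, 0.8], uniform_cdf))  # about 0.2

The only difference with the teaching function above is the bookkeeping of the
index which achieves the maximum.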
.. GENERATED FROM PYTHON SOURCE LINES 122-123

Compute the index which achieves the maximum Kolmogorov-Smirnov distance.

.. GENERATED FROM PYTHON SOURCE LINES 125-128

.. code-block:: Python

    D, index, observation = computeKSStatisticsIndex(sample, distribution)
    print("D=", D, ", Index=", index, ", Obs.=", observation)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Sorted
    0 : [ -1.04687 ]
    1 : [ -1.01682 ]
    2 : [ -0.631196 ]
    3 : [ -0.579942 ]
    4 : [ -0.466802 ]
    5 : [ -0.0365189 ]
    6 : [ 0.226093 ]
    7 : [ 0.379717 ]
    8 : [ 1.31265 ]
    9 : [ 1.92482 ]
    i=0, x[i]=-1.0469, F(x[i])=0.0714, S1=0.0714, S2=0.0286
    D max!
    i=1, x[i]=-1.0168, F(x[i])=0.0801, S1=0.0199, S2=0.1199
    D max!
    i=2, x[i]=-0.6312, F(x[i])=0.1913, S1=0.0087, S2=0.1087
    i=3, x[i]=-0.5799, F(x[i])=0.2061, S1=0.0939, S2=0.1939
    D max!
    i=4, x[i]=-0.4668, F(x[i])=0.2387, S1=0.1613, S2=0.2613
    D max!
    i=5, x[i]=-0.0365, F(x[i])=0.3628, S1=0.1372, S2=0.2372
    i=6, x[i]=0.2261, F(x[i])=0.4386, S1=0.1614, S2=0.2614
    D max!
    i=7, x[i]=0.3797, F(x[i])=0.4829, S1=0.2171, S2=0.3171
    D max!
    i=8, x[i]=1.3127, F(x[i])=0.7520, S1=0.0480, S2=0.1480
    i=9, x[i]=1.9248, F(x[i])=0.9286, S1=0.0286, S2=0.0714
    D= 0.3170926408731421 , Index= 7 , Obs.= [0.379717]

.. GENERATED FROM PYTHON SOURCE LINES 129-133

.. code-block:: Python

    graph = drawKSDistance(sample, distribution, observation, D, distFactory)
    view = viewer.View(graph)
    plt.show()

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_statistics_001.png
   :alt: KS Distance = 0.3171
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_kolmogorov_statistics_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 134-137

We see that the K.S. statistic is achieved at the observation where the
distance between the empirical distribution function of the sample and
the candidate distribution is the largest.

.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_kolmogorov_statistics.py:
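As a sanity check, the reported maximum can be reproduced by hand from the
rounded values printed in the output above, at the maximizing index
:math:`i = 7`:

.. code-block:: Python

    # Recompute the two distances at the maximizing row of the output above,
    # using the rounded values it prints: i = 7, F(x[i]) = 0.4829, n = 10.
    n, i, F = 10, 7, 0.4829
    S1 = abs(F - i / n)        # distance below the jump: |0.4829 - 0.7|
    S2 = abs((i + 1) / n - F)  # distance above the jump: |0.8 - 0.4829|
    print(round(S1, 4), round(S2, 4))

The larger of the two, :math:`S2 = 0.3171`, matches the reported KS distance
:math:`D` up to rounding.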
.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_kolmogorov_statistics.ipynb <plot_kolmogorov_statistics.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_kolmogorov_statistics.py <plot_kolmogorov_statistics.py>`