.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/distribution_fitting/plot_estimate_non_parametric_distribution.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_distribution_fitting_plot_estimate_non_parametric_distribution.py:


Fit a non-parametric distribution
=================================

.. GENERATED FROM PYTHON SOURCE LINES 7-14

In this example we estimate a non-parametric distribution using the kernel
smoothing method. After a short introductory example, we focus on a few
features of the API:

- kernel selection
- bandwidth estimation
- advanced features such as boundary corrections and the log-transform treatment

.. GENERATED FROM PYTHON SOURCE LINES 16-19

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer

.. GENERATED FROM PYTHON SOURCE LINES 20-23

An introductory example
-----------------------

.. GENERATED FROM PYTHON SOURCE LINES 25-26

We create the data from a :class:`~openturns.Gamma` distribution:

.. GENERATED FROM PYTHON SOURCE LINES 26-29

.. code-block:: Python

    distribution = ot.Gamma(6.0, 1.0)
    sample = distribution.getSample(800)

.. GENERATED FROM PYTHON SOURCE LINES 30-31

We define the kernel smoother and build the smoothed estimate.

.. GENERATED FROM PYTHON SOURCE LINES 31-34

.. code-block:: Python

    kernel = ot.KernelSmoothing()
    estimated = kernel.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 35-36

We can draw the original distribution and the kernel smoothing estimate on the same graph.

.. GENERATED FROM PYTHON SOURCE LINES 36-44
.. code-block:: Python

    graph = distribution.drawPDF()
    graph.setTitle("Kernel smoothing vs original")
    kernel_plot = estimated.drawPDF().getDrawable(0)
    graph.add(kernel_plot)
    graph.setLegends(["original", "KS"])
    graph.setLegendPosition("upper right")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_001.svg
   :alt: Kernel smoothing vs original
   :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_001.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 45-46

We can retrieve the bandwidth used to build the estimate:

.. GENERATED FROM PYTHON SOURCE LINES 46-48

.. code-block:: Python

    print(kernel.getBandwidth())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0.529581]

.. GENERATED FROM PYTHON SOURCE LINES 49-50

We now compute a bandwidth with the Silverman rule of thumb instead.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: Python

    bandwidth = kernel.computeSilvermanBandwidth(sample)
    print(bandwidth)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0.639633]

.. GENERATED FROM PYTHON SOURCE LINES 54-55

We use this new bandwidth to build another fitted distribution:

.. GENERATED FROM PYTHON SOURCE LINES 55-57

.. code-block:: Python

    estimated = kernel.build(sample, bandwidth)

.. GENERATED FROM PYTHON SOURCE LINES 58-66

.. code-block:: Python

    graph = distribution.drawPDF()
    graph.setTitle("Kernel smoothing vs original")
    kernel_plot = estimated.drawPDF().getDrawable(0)
    graph.add(kernel_plot)
    graph.setLegends(["original", "KS-Silverman"])
    graph.setLegendPosition("upper right")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_002.svg
   :alt: Kernel smoothing vs original
   :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_002.svg
   :class: sphx-glr-single-img
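
For intuition, the sketch below reimplements a one-dimensional Gaussian kernel density
estimate in pure Python, with the classical rule-of-thumb bandwidth
:math:`h = (4/3)^{1/5}\,\hat{\sigma}\,n^{-1/5}` for a Normal kernel. It is an illustration
under that assumption, not the OpenTURNS implementation: ``computeSilvermanBandwidth`` may
use a different scale estimate, and the sample here is drawn with the standard ``random``
module, so the printed bandwidth is not expected to match the OpenTURNS value. The helper
names ``silverman_bandwidth`` and ``gaussian_kde_pdf`` are hypothetical.

.. code-block:: python

    import math
    import random
    import statistics


    def silverman_bandwidth(data):
        """Assumed Normal-kernel rule of thumb: (4/3)^(1/5) * sigma * n^(-1/5)."""
        return (4.0 / 3.0) ** 0.2 * statistics.stdev(data) * len(data) ** (-0.2)


    def gaussian_kde_pdf(data, h, x):
        """Average of Normal densities with standard deviation h centred at each data point."""
        norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


    # Hypothetical data: 800 draws from a Gamma distribution with shape 6 and scale 1.
    rng = random.Random(0)
    data = [rng.gammavariate(6.0, 1.0) for _ in range(800)]

    h = silverman_bandwidth(data)
    print(f"bandwidth: {h:.3f}")
    for x in (2.0, 5.0, 8.0, 12.0):
        print(f"estimated density at {x}: {gaussian_kde_pdf(data, h, x):.4f}")

Evaluating the estimate at a single point costs one kernel evaluation per data point, so
the cost grows linearly with the sample size.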
.. GENERATED FROM PYTHON SOURCE LINES 67-68

For this Gamma sample, the Silverman rule of thumb yields a better estimate of the
distribution than the default bandwidth. We can also study the impact of the kernel choice.

.. GENERATED FROM PYTHON SOURCE LINES 70-80

Choosing a kernel
-----------------

We experiment with several kernels to perform the smoothing:

- the standard Normal kernel
- the triangular kernel
- the Epanechnikov kernel
- the uniform kernel

.. GENERATED FROM PYTHON SOURCE LINES 82-83

We first create the data from a Gamma distribution:

.. GENERATED FROM PYTHON SOURCE LINES 85-88

.. code-block:: Python

    distribution = ot.Gamma(6.0, 1.0)
    sample = distribution.getSample(800)

.. GENERATED FROM PYTHON SOURCE LINES 89-90

We build a smoother with the Normal kernel:

.. GENERATED FROM PYTHON SOURCE LINES 90-93

.. code-block:: Python

    kernelNormal = ot.KernelSmoothing(ot.Normal())
    estimatedNormal = kernelNormal.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 94-95

We build a smoother with the Triangular kernel:

.. GENERATED FROM PYTHON SOURCE LINES 95-98

.. code-block:: Python

    kernelTriangular = ot.KernelSmoothing(ot.Triangular())
    estimatedTriangular = kernelTriangular.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 99-100

We build a smoother with the Epanechnikov kernel:

.. GENERATED FROM PYTHON SOURCE LINES 100-103

.. code-block:: Python

    kernelEpanechnikov = ot.KernelSmoothing(ot.Epanechnikov())
    estimatedEpanechnikov = kernelEpanechnikov.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 104-105

We build a smoother with the Uniform kernel:

.. GENERATED FROM PYTHON SOURCE LINES 105-109

.. code-block:: Python

    kernelUniform = ot.KernelSmoothing(ot.Uniform())
    estimatedUniform = kernelUniform.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 110-112

Finally, we compare all the estimated distributions with the original one:

.. GENERATED FROM PYTHON SOURCE LINES 112-135
.. code-block:: Python

    graph = distribution.drawPDF()
    graph.setTitle("Different kernel smoothings vs original distribution")
    graph.setGrid(True)
    kernel_estimatedNormal_plot = estimatedNormal.drawPDF().getDrawable(0)
    graph.add(kernel_estimatedNormal_plot)
    kernel_estimatedTriangular_plot = estimatedTriangular.drawPDF().getDrawable(0)
    graph.add(kernel_estimatedTriangular_plot)
    kernel_estimatedEpanechnikov_plot = estimatedEpanechnikov.drawPDF().getDrawable(0)
    graph.add(kernel_estimatedEpanechnikov_plot)
    kernel_estimatedUniform_plot = estimatedUniform.drawPDF().getDrawable(0)
    kernel_estimatedUniform_plot.setLineStyle("dashed")
    graph.add(kernel_estimatedUniform_plot)
    graph.setLegends(
        ["original", "KS-Normal", "KS-Triangular", "KS-Epanechnikov", "KS-Uniform"]
    )
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_003.svg
   :alt: Different kernel smoothings vs original distribution
   :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_003.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 136-139

We observe that all the kernels produce very similar results in practice.
The Uniform kernel may be seen as the worst of them, while the Epanechnikov kernel
is often cited as a good theoretical choice, since it minimizes the asymptotic mean
integrated squared error among non-negative kernels.
In practice, the standard Normal kernel is a fine choice.
The most important aspect of kernel smoothing is the choice of the bandwidth.

.. GENERATED FROM PYTHON SOURCE LINES 142-152

Bandwidth selection
-------------------

We reproduce a classical example from the literature: fitting a bimodal distribution.
We show the result of kernel smoothing with three bandwidth selection rules:

- the Silverman rule
- the Plugin bandwidth
- the Mixed bandwidth

.. GENERATED FROM PYTHON SOURCE LINES 154-155

We define the bimodal distribution and generate a sample from it.

.. GENERATED FROM PYTHON SOURCE LINES 155-160
.. code-block:: Python

    X1 = ot.Normal(10.0, 1.0)
    X2 = ot.Normal(-10.0, 1.0)
    myDist = ot.Mixture([X1, X2])
    sample = myDist.getSample(2000)

.. GENERATED FROM PYTHON SOURCE LINES 161-162

We now compare the fitted distributions with the original one:

.. GENERATED FROM PYTHON SOURCE LINES 162-165

.. code-block:: Python

    graph = myDist.drawPDF()
    graph.setTitle("Kernel smoothing vs original")

.. GENERATED FROM PYTHON SOURCE LINES 166-167

With the Silverman rule:

.. GENERATED FROM PYTHON SOURCE LINES 167-173

.. code-block:: Python

    kernelSB = ot.KernelSmoothing()
    bandwidthSB = kernelSB.computeSilvermanBandwidth(sample)
    estimatedSB = kernelSB.build(sample, bandwidthSB)
    kernelSB_plot = estimatedSB.drawPDF().getDrawable(0)
    graph.add(kernelSB_plot)

.. GENERATED FROM PYTHON SOURCE LINES 174-175

With the Plugin bandwidth:

.. GENERATED FROM PYTHON SOURCE LINES 175-181

.. code-block:: Python

    kernelPB = ot.KernelSmoothing()
    bandwidthPB = kernelPB.computePluginBandwidth(sample)
    estimatedPB = kernelPB.build(sample, bandwidthPB)
    kernelPB_plot = estimatedPB.drawPDF().getDrawable(0)
    graph.add(kernelPB_plot)

.. GENERATED FROM PYTHON SOURCE LINES 182-183

With the Mixed bandwidth:

.. GENERATED FROM PYTHON SOURCE LINES 183-190

.. code-block:: Python

    kernelMB = ot.KernelSmoothing()
    bandwidthMB = kernelMB.computeMixedBandwidth(sample)
    estimatedMB = kernelMB.build(sample, bandwidthMB)
    kernelMB_plot = estimatedMB.drawPDF().getDrawable(0)
    kernelMB_plot.setLineStyle("dashed")
    graph.add(kernelMB_plot)

.. GENERATED FROM PYTHON SOURCE LINES 191-195

.. code-block:: Python

    graph.setLegends(["original", "KS-Silverman", "KS-Plugin", "KS-Mixed"])
    graph.setLegendPosition("upper right")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_004.svg
   :alt: Kernel smoothing vs original
   :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_004.svg
   :class: sphx-glr-single-img
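
A back-of-the-envelope computation suggests why a single global rule of thumb struggles on
this example. The pure-Python sketch below draws a sample from an equal-weight mixture of
:math:`\mathcal{N}(10, 1)` and :math:`\mathcal{N}(-10, 1)` and applies the classical
Normal-kernel rule :math:`h = (4/3)^{1/5}\,\hat{\sigma}\,n^{-1/5}`, used here as an assumed
stand-in for the OpenTURNS Silverman rule. The standard deviation of the whole sample is
about 10, ten times that of each mode, and the bandwidth inherits it. The helper name
``silverman_bandwidth`` is hypothetical.

.. code-block:: python

    import math
    import random
    import statistics


    def silverman_bandwidth(data):
        # Assumed Normal-kernel rule of thumb: (4/3)^(1/5) * sigma * n^(-1/5).
        return (4.0 / 3.0) ** 0.2 * statistics.stdev(data) * len(data) ** (-0.2)


    # Equal-weight mixture of Normal(10, 1) and Normal(-10, 1), 2000 points.
    rng = random.Random(1)
    data = [rng.gauss(10.0 if rng.random() < 0.5 else -10.0, 1.0) for _ in range(2000)]

    overall_sigma = statistics.stdev(data)  # dominated by the gap between the two modes
    h = silverman_bandwidth(data)
    within_mode_sigma = 1.0

    # Smoothing a Normal(mu, 1) mode with a Normal kernel of standard deviation h
    # gives, for a large sample, a peak of standard deviation sqrt(1 + h^2).
    smoothed_mode_sigma = math.sqrt(within_mode_sigma**2 + h**2)

    print(f"overall standard deviation: {overall_sigma:.2f}")
    print(f"rule-of-thumb bandwidth:    {h:.2f}")
    print(f"width of a smoothed mode:   {smoothed_mode_sigma:.2f} (true value 1.0)")

With a bandwidth above 2, each estimated peak comes out more than twice as wide as the true
one. Plug-in selectors, by contrast, estimate the roughness of the density from the data
rather than assuming a Normal reference shape, which generally helps on multimodal samples.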
.. GENERATED FROM PYTHON SOURCE LINES 196-197

As expected, the Silverman rule seriously oversmooths the data: its bandwidth is driven by
the overall spread of the sample, which is large here, so the two modes are flattened and
widened. The Plugin and Mixed rules are more accurate.

.. GENERATED FROM PYTHON SOURCE LINES 200-205

Boundary corrections
--------------------

We now detail an advanced feature of kernel smoothing: boundary correction.
When the distribution has a bounded support, a plain kernel estimate puts some
probability mass outside the support, which biases the estimate near the boundary.

.. GENERATED FROM PYTHON SOURCE LINES 207-208

We consider a Weibull distribution whose support is bounded below:

.. GENERATED FROM PYTHON SOURCE LINES 208-210

.. code-block:: Python

    myDist = ot.WeibullMin(2.0, 1.5, 1.0)

.. GENERATED FROM PYTHON SOURCE LINES 211-212

We generate a sample from this distribution:

.. GENERATED FROM PYTHON SOURCE LINES 212-214

.. code-block:: Python

    sample = myDist.getSample(2000)

.. GENERATED FROM PYTHON SOURCE LINES 215-216

We draw the PDF of the exact Weibull distribution:

.. GENERATED FROM PYTHON SOURCE LINES 216-219

.. code-block:: Python

    graph = myDist.drawPDF()

.. GENERATED FROM PYTHON SOURCE LINES 220-225

We use two kernel smoothers:

- a standard Normal kernel
- the same kernel with a boundary correction

.. GENERATED FROM PYTHON SOURCE LINES 227-228

The first smoother, without boundary correction:

.. GENERATED FROM PYTHON SOURCE LINES 228-231

.. code-block:: Python

    kernel1 = ot.KernelSmoothing()
    estimated1 = kernel1.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 232-233

The second smoother, with boundary correction:

.. GENERATED FROM PYTHON SOURCE LINES 233-238

.. code-block:: Python

    kernel2 = ot.KernelSmoothing()
    kernel2.setBoundaryCorrection(True)
    estimated2 = kernel2.build(sample)

.. GENERATED FROM PYTHON SOURCE LINES 239-240

We compare the estimated PDFs:

.. GENERATED FROM PYTHON SOURCE LINES 240-253
.. code-block:: Python

    graph.setTitle("Kernel smoothing vs original")
    kernel1_plot = estimated1.drawPDF().getDrawable(0)
    graph.add(kernel1_plot)
    kernel2_plot = estimated2.drawPDF().getDrawable(0)
    graph.add(kernel2_plot)
    graph.setLegends(["original", "KS", "KS with boundary correction"])
    graph.setLegendPosition("upper right")
    view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_005.svg
   :alt: Kernel smoothing vs original
   :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_005.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 254-255

The boundary correction has a remarkable impact on the quality of the estimate for small
values, near the lower bound of the support.

.. GENERATED FROM PYTHON SOURCE LINES 257-263

Log-transform treatment
-----------------------

We finish this example with another advanced feature of kernel smoothing:
the log-transform treatment. This treatment is well suited to skewed distributions,
which are challenging for kernel smoothing.

.. GENERATED FROM PYTHON SOURCE LINES 265-266

We consider several distributions with significant skewness:

.. GENERATED FROM PYTHON SOURCE LINES 266-277

.. code-block:: Python

    distCollection = [
        ot.LogNormal(0.0, 2.5),
        ot.Beta(20000.5, 2.5, 0.0, 1.0),
        ot.Exponential(),
        ot.WeibullMax(1.0, 0.9, 0.0),
        ot.Mixture([ot.Normal(-1.0, 0.5), ot.Normal(1.0, 1.0)], [0.4, 0.6]),
        ot.Mixture(
            [ot.LogNormal(-1.0, 1.0, -1.0), ot.LogNormal(1.0, 1.0, 1.0)], [0.2, 0.8]
        ),
    ]

.. GENERATED FROM PYTHON SOURCE LINES 278-288

For each distribution, we perform the following steps:

- we generate a sample of size 5000,
- we fit a kernel smoothing distribution without the log-transform treatment,
- we fit a kernel smoothing distribution with the log-transform treatment,
- we plot the real distribution and both non-parametric estimates.
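
The principle of the log-transform treatment can be sketched with a change of variables:
smooth :math:`Y = \log X` instead of :math:`X`, then map the estimate back with
:math:`f_X(x) = f_Y(\log x) / x`. The pure-Python sketch below assumes strictly positive
data and a plain logarithm; ``setUseLogTransform`` also has to handle samples that are not
positive, such as the Normal mixture above, so the OpenTURNS transformation necessarily
differs in its details. The bandwidth rule :math:`h = (4/3)^{1/5}\,\hat{\sigma}\,n^{-1/5}`
is also an assumption, and all helper names are hypothetical. Both estimates are compared
on a LogNormal(0, 2.5) sample, mirroring the first distribution of the collection.

.. code-block:: python

    import math
    import random
    import statistics


    def silverman_bandwidth(data):
        # Assumed Normal-kernel rule of thumb, a stand-in for the OpenTURNS rules.
        return (4.0 / 3.0) ** 0.2 * statistics.stdev(data) * len(data) ** (-0.2)


    def gaussian_kde_pdf(data, h, x):
        # Average of Normal densities with standard deviation h centred at the data points.
        norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


    def normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


    # Heavily skewed, strictly positive data: LogNormal(0, 2.5).
    rng = random.Random(2)
    data = [rng.lognormvariate(0.0, 2.5) for _ in range(2000)]

    # Plain kernel smoothing directly on X.
    h_plain = silverman_bandwidth(data)


    def plain_pdf(x):
        return gaussian_kde_pdf(data, h_plain, x)


    # Log-transform treatment (assumed plain log): smooth Y = log(X), then
    # map back with f_X(x) = f_Y(log x) / x.
    logs = [math.log(xi) for xi in data]
    h_log = silverman_bandwidth(logs)


    def log_pdf(x):
        if x <= 0.0:
            return 0.0  # no mass outside the support: automatic boundary handling
        return gaussian_kde_pdf(logs, h_log, math.log(x)) / x


    # Probability mass that the plain estimate puts below 0, outside the support.
    leaked_mass = statistics.fmean(normal_cdf(-xi / h_plain) for xi in data)

    exact_at_1 = 1.0 / (2.5 * math.sqrt(2.0 * math.pi))  # LogNormal(0, 2.5) density at x = 1
    print(f"exact f(1)     = {exact_at_1:.4f}")
    print(f"plain KDE f(1) = {plain_pdf(1.0):.4f} (bandwidth {h_plain:.1f})")
    print(f"log-KDE f(1)   = {log_pdf(1.0):.4f} (bandwidth {h_log:.3f} in log scale)")
    print(f"mass below 0 in the plain KDE: {leaked_mass:.2f}")

On such data the plain estimate uses a very large bandwidth, because the sample standard
deviation is dominated by a few huge values: it flattens the density near the origin and
puts a sizeable share of its mass below 0. The log-scale estimate puts no mass there by
construction, which illustrates how the transformation also acts as a boundary correction.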
Other transformations could be used, but the log-transform is quite effective.
If the skewness is moderate, there is almost no change with respect to simple kernel
smoothing. If the skewness is large, the transformation performs very well.
Note that this transformation also performs an automatic boundary correction.

.. GENERATED FROM PYTHON SOURCE LINES 288-318

.. code-block:: Python

    grid = ot.GridLayout(2, 3)
    ot.RandomGenerator.SetSeed(0)
    for i, distribution in enumerate(distCollection):
        sample = distribution.getSample(5000)

        # We draw the real distribution
        graph = distribution.drawPDF()
        graph.setLegends([distribution.getClassName()])
        # We choose the default kernel
        kernel = ot.KernelSmoothing()

        # We fit without any particular treatment
        fitted = kernel.build(sample)
        curve = fitted.drawPDF()
        curve.setLegends(["Fitted"])
        graph.add(curve)

        # We activate the log-transform treatment
        kernel.setUseLogTransform(True)
        fitted = kernel.build(sample)
        curve = fitted.drawPDF()
        curve.setLegends(["Fitted LogTransform"])
        curve = curve.getDrawable(0)
        curve.setLineStyle("dashed")

        graph.add(curve)
        grid.setGraph(i // 3, i % 3, graph)

    view = viewer.View(grid)

.. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_006.svg
   :alt: plot estimate non parametric distribution
   :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_006.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 319-320

.. code-block:: Python

    viewer.View.ShowAll()

.. _sphx_glr_download_auto_data_analysis_distribution_fitting_plot_estimate_non_parametric_distribution.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_estimate_non_parametric_distribution.ipynb `
    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_estimate_non_parametric_distribution.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_estimate_non_parametric_distribution.zip `