.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_data_analysis/distribution_fitting/plot_estimate_non_parametric_distribution.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_data_analysis_distribution_fitting_plot_estimate_non_parametric_distribution.py: Fit a non parametric distribution ================================= .. GENERATED FROM PYTHON SOURCE LINES 6-12 In this example we are going to estimate a non parametric distribution using the kernel smoothing method. After a short introductory example we focus on a few basic features of the API : - kernel selection - bandwidth estimation - an advanced feature such as boundary corrections .. GENERATED FROM PYTHON SOURCE LINES 14-21 .. code-block:: default import openturns as ot import openturns.viewer as viewer from matplotlib import pylab as plt ot.Log.Show(ot.Log.NONE) .. GENERATED FROM PYTHON SOURCE LINES 22-25 An introductory example ----------------------- .. GENERATED FROM PYTHON SOURCE LINES 27-28 We create the data from a Gamma distribution : .. GENERATED FROM PYTHON SOURCE LINES 28-32 .. code-block:: default ot.RandomGenerator.SetSeed(0) distribution = ot.Gamma(6.0, 1.0) sample = distribution.getSample(800) .. GENERATED FROM PYTHON SOURCE LINES 33-34 We define the kernel smoother and build the smoothed estimate. .. GENERATED FROM PYTHON SOURCE LINES 34-37 .. code-block:: default kernel = ot.KernelSmoothing() estimated = kernel.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 38-39 We can draw the original distribution vs the kernel smoothing. .. GENERATED FROM PYTHON SOURCE LINES 39-48 .. code-block:: default graph = distribution.drawPDF() graph.setTitle("Kernel smoothing vs original") kernel_plot = estimated.drawPDF().getDrawable(0) kernel_plot.setColor("blue") graph.add(kernel_plot) graph.setLegends(["original", "KS"]) graph.setLegendPosition("topright") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_001.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 49-50 We can obtain the bandwdth parameter : .. GENERATED FROM PYTHON SOURCE LINES 50-52 .. code-block:: default print(kernel.getBandwidth()) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none [0.529581] .. GENERATED FROM PYTHON SOURCE LINES 53-54 We now compute a better bandwitdh with the Silverman rule. .. GENERATED FROM PYTHON SOURCE LINES 54-57 .. code-block:: default bandwidth = kernel.computeSilvermanBandwidth(sample) print(bandwidth) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none [0.639633] .. GENERATED FROM PYTHON SOURCE LINES 58-59 The new bandwidth is used to regenerate another fitted distribution : .. GENERATED FROM PYTHON SOURCE LINES 59-61 .. code-block:: default estimated = kernel.build(sample, bandwidth) .. GENERATED FROM PYTHON SOURCE LINES 62-71 .. code-block:: default graph = distribution.drawPDF() graph.setTitle("Kernel smoothing vs original") kernel_plot = estimated.drawPDF().getDrawable(0) kernel_plot.setColor("blue") graph.add(kernel_plot) graph.setLegends(["original", "KS-Silverman"]) graph.setLegendPosition("topright") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_002.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 72-73 The Silverman rule of thumb to estimate the bandwidth provides a better estimate for the distribution. We can also study the impact of the kernel selection. .. GENERATED FROM PYTHON SOURCE LINES 75-85 Choosing a kernel ----------------- We experiment with several kernels to perform the smoothing : - the standard normal kernel - the triangular kernel - the Epanechnikov kernel - the uniform kernel .. GENERATED FROM PYTHON SOURCE LINES 87-88 We first create the data from a Gamma distribution : .. GENERATED FROM PYTHON SOURCE LINES 90-93 .. code-block:: default distribution = ot.Gamma(6.0, 1.0) sample = distribution.getSample(800) .. GENERATED FROM PYTHON SOURCE LINES 94-95 The definition of the Normal kernel : .. GENERATED FROM PYTHON SOURCE LINES 95-98 .. code-block:: default kernelNormal = ot.KernelSmoothing(ot.Normal()) estimatedNormal = kernelNormal.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 99-100 The definition of the Triangular kernel : .. GENERATED FROM PYTHON SOURCE LINES 100-103 .. code-block:: default kernelTriangular = ot.KernelSmoothing(ot.Triangular()) estimatedTriangular = kernelTriangular.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 104-105 The definition of the Epanechnikov kernel : .. GENERATED FROM PYTHON SOURCE LINES 105-108 .. code-block:: default kernelEpanechnikov = ot.KernelSmoothing(ot.Epanechnikov()) estimatedEpanechnikov = kernelEpanechnikov.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 109-110 The definition of the Uniform kernel : .. GENERATED FROM PYTHON SOURCE LINES 110-114 .. code-block:: default kernelUniform = ot.KernelSmoothing(ot.Uniform()) estimatedUniform = kernelUniform.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 115-117 We finally compare all the distributions : .. GENERATED FROM PYTHON SOURCE LINES 117-144 .. code-block:: default graph = distribution.drawPDF() graph.setTitle("Different kernel smoothings vs original distribution") graph.setGrid(True) kernel_estimatedNormal_plot = estimatedNormal.drawPDF().getDrawable(0) kernel_estimatedNormal_plot.setColor("blue") graph.add(kernel_estimatedNormal_plot) kernel_estimatedTriangular_plot = estimatedTriangular.drawPDF().getDrawable(0) kernel_estimatedTriangular_plot.setColor("green") graph.add(kernel_estimatedTriangular_plot) kernel_estimatedEpanechnikov_plot = estimatedEpanechnikov.drawPDF().getDrawable(0) kernel_estimatedEpanechnikov_plot.setColor("orange") graph.add(kernel_estimatedEpanechnikov_plot) kernel_estimatedUniform_plot = estimatedUniform.drawPDF().getDrawable(0) kernel_estimatedUniform_plot.setColor("black") kernel_estimatedUniform_plot.setLineStyle("dashed") graph.add(kernel_estimatedUniform_plot) graph.setLegends( ["original", "KS-Normal", "KS-Triangular", "KS-Epanechnikov", "KS-Uniform"] ) view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_003.png :alt: Different kernel smoothings vs original distribution :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 145-147 We observe that all the kernels produce very similar results in practice. The Uniform kernel may be seen as the worst of them all while the Epanechnikov one is said to be a good theoritical choice. In practice the standard normal kernel is a fine choice. The most important aspect of kernel smoothing is the choice of the bandwidth. .. GENERATED FROM PYTHON SOURCE LINES 150-160 Bandwidth selection ------------------- We reproduce a classical example of the literature : the fitting of a bimodal distribution. We will show the result of a kernel smoothing with different bandwidth computation : - the Silverman rule - the Plugin bandwidth - the Mixed bandwidth .. GENERATED FROM PYTHON SOURCE LINES 162-163 We define the bimodal distribution and generate a sample out of it. .. GENERATED FROM PYTHON SOURCE LINES 163-168 .. code-block:: default X1 = ot.Normal(10.0, 1.0) X2 = ot.Normal(-10.0, 1.0) myDist = ot.Mixture([X1, X2]) sample = myDist.getSample(2000) .. GENERATED FROM PYTHON SOURCE LINES 169-170 We now compare the fitted distribution : .. GENERATED FROM PYTHON SOURCE LINES 170-173 .. code-block:: default graph = myDist.drawPDF() graph.setTitle("Kernel smoothing vs original") .. GENERATED FROM PYTHON SOURCE LINES 174-175 With the Silverman rule : .. GENERATED FROM PYTHON SOURCE LINES 175-181 .. code-block:: default kernelSB = ot.KernelSmoothing() bandwidthSB = kernelSB.computeSilvermanBandwidth(sample) estimatedSB = kernelSB.build(sample, bandwidthSB) kernelSB_plot = estimatedSB.drawPDF().getDrawable(0) graph.add(kernelSB_plot) .. GENERATED FROM PYTHON SOURCE LINES 182-183 With the Plugin bandwidth : .. GENERATED FROM PYTHON SOURCE LINES 183-189 .. code-block:: default kernelPB = ot.KernelSmoothing() bandwidthPB = kernelPB.computePluginBandwidth(sample) estimatedPB = kernelPB.build(sample, bandwidthPB) kernelPB_plot = estimatedPB.drawPDF().getDrawable(0) graph.add(kernelPB_plot) .. GENERATED FROM PYTHON SOURCE LINES 190-191 With the Mixed bandwidth : .. GENERATED FROM PYTHON SOURCE LINES 191-198 .. code-block:: default kernelMB = ot.KernelSmoothing() bandwidthMB = kernelMB.computeMixedBandwidth(sample) estimatedMB = kernelMB.build(sample, bandwidthMB) kernelMB_plot = estimatedMB.drawPDF().getDrawable(0) kernelMB_plot.setLineStyle("dashed") graph.add(kernelMB_plot) .. GENERATED FROM PYTHON SOURCE LINES 199-204 .. code-block:: default graph.setLegends(["original", "KS-Silverman", "KS-Plugin", "KS-Mixed"]) graph.setColors(["red", "blue", "green", "black"]) graph.setLegendPosition("topright") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_004.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 205-206 As expected the Silverman seriously overfit the data and the other rules are more to the point. .. GENERATED FROM PYTHON SOURCE LINES 209-214 Boundary corrections -------------------- We finish this example on an advanced feature of the kernel smoothing, the boundary corrections. .. GENERATED FROM PYTHON SOURCE LINES 216-217 We consider a Weibull distribution : .. GENERATED FROM PYTHON SOURCE LINES 217-219 .. code-block:: default myDist = ot.WeibullMin(2.0, 1.5, 1.0) .. GENERATED FROM PYTHON SOURCE LINES 220-221 We generate a sample from the defined distribution : .. GENERATED FROM PYTHON SOURCE LINES 221-223 .. code-block:: default sample = myDist.getSample(2000) .. GENERATED FROM PYTHON SOURCE LINES 224-225 We draw the exact Weibull distribution : .. GENERATED FROM PYTHON SOURCE LINES 225-228 .. code-block:: default graph = myDist.drawPDF() .. GENERATED FROM PYTHON SOURCE LINES 229-234 We use two different kernels : - a standard normal kernel - the same kernel with a boundary correction .. GENERATED FROM PYTHON SOURCE LINES 236-237 The first kernel without the boundary corrections. .. GENERATED FROM PYTHON SOURCE LINES 237-240 .. code-block:: default kernel1 = ot.KernelSmoothing() estimated1 = kernel1.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 241-242 The second kernel with the boundary corrections. .. GENERATED FROM PYTHON SOURCE LINES 242-247 .. code-block:: default kernel2 = ot.KernelSmoothing() kernel2.setBoundaryCorrection(True) estimated2 = kernel2.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 248-249 We compare the estimated PDFs : .. GENERATED FROM PYTHON SOURCE LINES 249-264 .. code-block:: default graph.setTitle("Kernel smoothing vs original") kernel1_plot = estimated1.drawPDF().getDrawable(0) kernel1_plot.setColor("blue") graph.add(kernel1_plot) kernel2_plot = estimated2.drawPDF().getDrawable(0) kernel2_plot.setColor("green") graph.add(kernel2_plot) graph.setLegends(["original", "KS", "KS with boundary correction"]) graph.setLegendPosition("topright") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_005.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 265-266 The boundary correction made has a remarkable impact on the quality of the estimate for the small values. .. GENERATED FROM PYTHON SOURCE LINES 266-269 .. code-block:: default plt.show() .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 1.057 seconds) .. _sphx_glr_download_auto_data_analysis_distribution_fitting_plot_estimate_non_parametric_distribution.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_estimate_non_parametric_distribution.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_estimate_non_parametric_distribution.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_