.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_data_analysis/distribution_fitting/plot_estimate_non_parametric_distribution.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_data_analysis_distribution_fitting_plot_estimate_non_parametric_distribution.py: Fit a non parametric distribution ================================= .. GENERATED FROM PYTHON SOURCE LINES 6-13 In this example we are going to estimate a non parametric distribution using the kernel smoothing method. After a short introductory example we focus on a few basic features of the API: - kernel selection - bandwidth estimation - an advanced feature such as boundary corrections .. GENERATED FROM PYTHON SOURCE LINES 15-22 .. code-block:: Python import openturns as ot import openturns.viewer as viewer from matplotlib import pylab as plt ot.Log.Show(ot.Log.NONE) .. GENERATED FROM PYTHON SOURCE LINES 23-26 An introductory example ----------------------- .. GENERATED FROM PYTHON SOURCE LINES 28-29 We create the data from a Gamma distribution : .. GENERATED FROM PYTHON SOURCE LINES 29-33 .. code-block:: Python ot.RandomGenerator.SetSeed(0) distribution = ot.Gamma(6.0, 1.0) sample = distribution.getSample(800) .. GENERATED FROM PYTHON SOURCE LINES 34-35 We define the kernel smoother and build the smoothed estimate. .. GENERATED FROM PYTHON SOURCE LINES 35-38 .. code-block:: Python kernel = ot.KernelSmoothing() estimated = kernel.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 39-40 We can draw the original distribution vs the kernel smoothing. .. GENERATED FROM PYTHON SOURCE LINES 40-49 .. code-block:: Python graph = distribution.drawPDF() graph.setTitle("Kernel smoothing vs original") kernel_plot = estimated.drawPDF().getDrawable(0) kernel_plot.setColor("blue") graph.add(kernel_plot) graph.setLegends(["original", "KS"]) graph.setLegendPosition("upper right") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_001.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 50-51 We can obtain the bandwdth parameter : .. GENERATED FROM PYTHON SOURCE LINES 51-53 .. code-block:: Python print(kernel.getBandwidth()) .. rst-class:: sphx-glr-script-out .. code-block:: none [0.529581] .. GENERATED FROM PYTHON SOURCE LINES 54-55 We now compute a better bandwitdh with the Silverman rule. .. GENERATED FROM PYTHON SOURCE LINES 55-58 .. code-block:: Python bandwidth = kernel.computeSilvermanBandwidth(sample) print(bandwidth) .. rst-class:: sphx-glr-script-out .. code-block:: none [0.639633] .. GENERATED FROM PYTHON SOURCE LINES 59-60 The new bandwidth is used to regenerate another fitted distribution : .. GENERATED FROM PYTHON SOURCE LINES 60-62 .. code-block:: Python estimated = kernel.build(sample, bandwidth) .. GENERATED FROM PYTHON SOURCE LINES 63-72 .. code-block:: Python graph = distribution.drawPDF() graph.setTitle("Kernel smoothing vs original") kernel_plot = estimated.drawPDF().getDrawable(0) kernel_plot.setColor("blue") graph.add(kernel_plot) graph.setLegends(["original", "KS-Silverman"]) graph.setLegendPosition("upper right") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_002.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 73-74 The Silverman rule of thumb to estimate the bandwidth provides a better estimate for the distribution. We can also study the impact of the kernel selection. .. GENERATED FROM PYTHON SOURCE LINES 76-86 Choosing a kernel ----------------- We experiment with several kernels to perform the smoothing : - the standard normal kernel - the triangular kernel - the Epanechnikov kernel - the uniform kernel .. GENERATED FROM PYTHON SOURCE LINES 88-89 We first create the data from a Gamma distribution : .. GENERATED FROM PYTHON SOURCE LINES 91-94 .. code-block:: Python distribution = ot.Gamma(6.0, 1.0) sample = distribution.getSample(800) .. GENERATED FROM PYTHON SOURCE LINES 95-96 The definition of the Normal kernel : .. GENERATED FROM PYTHON SOURCE LINES 96-99 .. code-block:: Python kernelNormal = ot.KernelSmoothing(ot.Normal()) estimatedNormal = kernelNormal.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 100-101 The definition of the Triangular kernel : .. GENERATED FROM PYTHON SOURCE LINES 101-104 .. code-block:: Python kernelTriangular = ot.KernelSmoothing(ot.Triangular()) estimatedTriangular = kernelTriangular.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 105-106 The definition of the Epanechnikov kernel : .. GENERATED FROM PYTHON SOURCE LINES 106-109 .. code-block:: Python kernelEpanechnikov = ot.KernelSmoothing(ot.Epanechnikov()) estimatedEpanechnikov = kernelEpanechnikov.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 110-111 The definition of the Uniform kernel : .. GENERATED FROM PYTHON SOURCE LINES 111-115 .. code-block:: Python kernelUniform = ot.KernelSmoothing(ot.Uniform()) estimatedUniform = kernelUniform.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 116-118 We finally compare all the distributions : .. GENERATED FROM PYTHON SOURCE LINES 118-145 .. code-block:: Python graph = distribution.drawPDF() graph.setTitle("Different kernel smoothings vs original distribution") graph.setGrid(True) kernel_estimatedNormal_plot = estimatedNormal.drawPDF().getDrawable(0) kernel_estimatedNormal_plot.setColor("blue") graph.add(kernel_estimatedNormal_plot) kernel_estimatedTriangular_plot = estimatedTriangular.drawPDF().getDrawable(0) kernel_estimatedTriangular_plot.setColor("green") graph.add(kernel_estimatedTriangular_plot) kernel_estimatedEpanechnikov_plot = estimatedEpanechnikov.drawPDF().getDrawable(0) kernel_estimatedEpanechnikov_plot.setColor("orange") graph.add(kernel_estimatedEpanechnikov_plot) kernel_estimatedUniform_plot = estimatedUniform.drawPDF().getDrawable(0) kernel_estimatedUniform_plot.setColor("black") kernel_estimatedUniform_plot.setLineStyle("dashed") graph.add(kernel_estimatedUniform_plot) graph.setLegends( ["original", "KS-Normal", "KS-Triangular", "KS-Epanechnikov", "KS-Uniform"] ) view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_003.png :alt: Different kernel smoothings vs original distribution :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 146-149 We observe that all the kernels produce very similar results in practice. The Uniform kernel may be seen as the worst of them all while the Epanechnikov one is said to be a good theoritical choice. In practice the standard normal kernel is a fine choice. The most important aspect of kernel smoothing is the choice of the bandwidth. .. GENERATED FROM PYTHON SOURCE LINES 152-162 Bandwidth selection ------------------- We reproduce a classical example of the literature : the fitting of a bimodal distribution. We will show the result of a kernel smoothing with different bandwidth computation : - the Silverman rule - the Plugin bandwidth - the Mixed bandwidth .. GENERATED FROM PYTHON SOURCE LINES 164-165 We define the bimodal distribution and generate a sample out of it. .. GENERATED FROM PYTHON SOURCE LINES 165-170 .. code-block:: Python X1 = ot.Normal(10.0, 1.0) X2 = ot.Normal(-10.0, 1.0) myDist = ot.Mixture([X1, X2]) sample = myDist.getSample(2000) .. GENERATED FROM PYTHON SOURCE LINES 171-172 We now compare the fitted distribution : .. GENERATED FROM PYTHON SOURCE LINES 172-175 .. code-block:: Python graph = myDist.drawPDF() graph.setTitle("Kernel smoothing vs original") .. GENERATED FROM PYTHON SOURCE LINES 176-177 With the Silverman rule : .. GENERATED FROM PYTHON SOURCE LINES 177-183 .. code-block:: Python kernelSB = ot.KernelSmoothing() bandwidthSB = kernelSB.computeSilvermanBandwidth(sample) estimatedSB = kernelSB.build(sample, bandwidthSB) kernelSB_plot = estimatedSB.drawPDF().getDrawable(0) graph.add(kernelSB_plot) .. GENERATED FROM PYTHON SOURCE LINES 184-185 With the Plugin bandwidth : .. GENERATED FROM PYTHON SOURCE LINES 185-191 .. code-block:: Python kernelPB = ot.KernelSmoothing() bandwidthPB = kernelPB.computePluginBandwidth(sample) estimatedPB = kernelPB.build(sample, bandwidthPB) kernelPB_plot = estimatedPB.drawPDF().getDrawable(0) graph.add(kernelPB_plot) .. GENERATED FROM PYTHON SOURCE LINES 192-193 With the Mixed bandwidth : .. GENERATED FROM PYTHON SOURCE LINES 193-200 .. code-block:: Python kernelMB = ot.KernelSmoothing() bandwidthMB = kernelMB.computeMixedBandwidth(sample) estimatedMB = kernelMB.build(sample, bandwidthMB) kernelMB_plot = estimatedMB.drawPDF().getDrawable(0) kernelMB_plot.setLineStyle("dashed") graph.add(kernelMB_plot) .. GENERATED FROM PYTHON SOURCE LINES 201-206 .. code-block:: Python graph.setLegends(["original", "KS-Silverman", "KS-Plugin", "KS-Mixed"]) graph.setColors(["red", "blue", "green", "black"]) graph.setLegendPosition("upper right") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_004.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 207-208 As expected the Silverman seriously overfit the data and the other rules are more to the point. .. GENERATED FROM PYTHON SOURCE LINES 211-216 Boundary corrections -------------------- We finish this example on an advanced feature of the kernel smoothing, the boundary corrections. .. GENERATED FROM PYTHON SOURCE LINES 218-219 We consider a Weibull distribution : .. GENERATED FROM PYTHON SOURCE LINES 219-221 .. code-block:: Python myDist = ot.WeibullMin(2.0, 1.5, 1.0) .. GENERATED FROM PYTHON SOURCE LINES 222-223 We generate a sample from the defined distribution : .. GENERATED FROM PYTHON SOURCE LINES 223-225 .. code-block:: Python sample = myDist.getSample(2000) .. GENERATED FROM PYTHON SOURCE LINES 226-227 We draw the exact Weibull distribution : .. GENERATED FROM PYTHON SOURCE LINES 227-230 .. code-block:: Python graph = myDist.drawPDF() .. GENERATED FROM PYTHON SOURCE LINES 231-236 We use two different kernels : - a standard normal kernel - the same kernel with a boundary correction .. GENERATED FROM PYTHON SOURCE LINES 238-239 The first kernel without the boundary corrections. .. GENERATED FROM PYTHON SOURCE LINES 239-242 .. code-block:: Python kernel1 = ot.KernelSmoothing() estimated1 = kernel1.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 243-244 The second kernel with the boundary corrections. .. GENERATED FROM PYTHON SOURCE LINES 244-249 .. code-block:: Python kernel2 = ot.KernelSmoothing() kernel2.setBoundaryCorrection(True) estimated2 = kernel2.build(sample) .. GENERATED FROM PYTHON SOURCE LINES 250-251 We compare the estimated PDFs : .. GENERATED FROM PYTHON SOURCE LINES 251-266 .. code-block:: Python graph.setTitle("Kernel smoothing vs original") kernel1_plot = estimated1.drawPDF().getDrawable(0) kernel1_plot.setColor("blue") graph.add(kernel1_plot) kernel2_plot = estimated2.drawPDF().getDrawable(0) kernel2_plot.setColor("green") graph.add(kernel2_plot) graph.setLegends(["original", "KS", "KS with boundary correction"]) graph.setLegendPosition("upper right") view = viewer.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_005.png :alt: Kernel smoothing vs original :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_non_parametric_distribution_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 267-268 The boundary correction made has a remarkable impact on the quality of the estimate for the small values. .. GENERATED FROM PYTHON SOURCE LINES 268-271 .. code-block:: Python plt.show() .. _sphx_glr_download_auto_data_analysis_distribution_fitting_plot_estimate_non_parametric_distribution.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_estimate_non_parametric_distribution.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_estimate_non_parametric_distribution.py `