Select fitted distributions
===========================

This example shows how to choose between several distributions fitted to a sample.

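Several methods can be used:

- the ranking by the Kolmogorov p-values (for continuous distributions with known parameters),
- the ranking by the Lilliefors p-values (for continuous distributions whose parameters are estimated from the sample),
- the ranking by the ChiSquared p-values (for discrete distributions),
- the ranking by the BIC, AIC or AICc values.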

.. code-block:: default

   import openturns as ot
   import openturns.viewer as viewer
   from matplotlib import pylab as plt
   ot.Log.Show(ot.Log.NONE)

Create a sample from a continuous distribution

.. code-block:: default

   distribution = ot.Beta(2.0, 2.0, 0.0, 1.0)
   sample = distribution.getSample(1000)
   graph = ot.UserDefined(sample).drawCDF()
   view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_fitted_distribution_ranking_001.png
   :alt: X0 CDF
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_fitted_distribution_ranking_001.png
   :class: sphx-glr-single-img

**1. Specify the model only**

Create the list of distribution estimators

.. code-block:: default

   factories = [ot.BetaFactory(), ot.TriangularFactory()]
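Each factory estimates the parameters of its family from the sample. The estimator actually used by ``ot.BetaFactory`` is not described here, and ``BetaFactory`` also estimates the bounds ``a`` and ``b``. As an illustration only, the sketch below uses the method of moments (a swapped-in technique) for a Beta distribution whose support is assumed to be known and equal to [0, 1]. The helper name ``beta_moments_estimate`` is hypothetical.

.. code-block:: python

   import random
   import statistics


   def beta_moments_estimate(sample):
       """Method-of-moments estimate of (alpha, beta) for a Beta law on [0, 1].

       Hypothetical helper: the support [0, 1] is assumed to be known.
       """
       mean = statistics.fmean(sample)
       var = statistics.variance(sample)  # unbiased sample variance
       # For Beta(alpha, beta) on [0, 1]:
       #   mean = alpha / (alpha + beta)
       #   var = mean * (1 - mean) / (alpha + beta + 1)
       total = mean * (1.0 - mean) / var - 1.0  # estimate of alpha + beta
       return mean * total, (1.0 - mean) * total


   random.seed(0)
   sample = [random.betavariate(2.0, 2.0) for _ in range(10000)]
   alpha, beta = beta_moments_estimate(sample)
   print(alpha, beta)  # both should be close to 2

A moment-based estimator is used here only because it has a closed form; maximum likelihood estimators generally require a numerical optimization.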

Rank the continuous models by the Lilliefors p-values:

.. code-block:: default

   estimated_distribution, test_result = ot.FittingTest.BestModelLilliefors(
       sample, factories
   )
   test_result

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    class=TestResult name=Unnamed type=Lilliefors Beta binaryQualityMeasure=false p-value threshold=0.5 p-value=0.006 statistic=0.0327766 description=[Beta(alpha = 1.72649, beta = 1.66568, a = 0.00526109, b = 0.970313) vs sample Beta]
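The Lilliefors test uses the Kolmogorov statistic, but it accounts for the parameters having been estimated from the same sample: the classical Kolmogorov distribution of the statistic no longer applies, so the p-value is typically estimated by Monte Carlo simulation. The sketch below illustrates this idea with the standard library only. It is not the OpenTURNS implementation: it uses a normal model (swapped in because its parameters are easy to estimate) instead of the Beta and Triangular families above, and the names ``ks_statistic``, ``fitted_normal_statistic`` and ``lilliefors_pvalue`` as well as the number of replications are assumptions.

.. code-block:: python

   import random
   import statistics


   def ks_statistic(sample, cdf):
       """Kolmogorov statistic sup_x |F_n(x) - F(x)| for a continuous CDF."""
       xs = sorted(sample)
       n = len(xs)
       d = 0.0
       for i, x in enumerate(xs):
           f = cdf(x)
           # The empirical CDF jumps from i / n to (i + 1) / n at x.
           d = max(d, (i + 1) / n - f, f - i / n)
       return d


   def fitted_normal_statistic(sample):
       """Fit a normal law to the sample, then compute the Kolmogorov statistic."""
       mu = statistics.fmean(sample)
       sigma = statistics.stdev(sample)
       return ks_statistic(sample, statistics.NormalDist(mu, sigma).cdf)


   def lilliefors_pvalue(sample, replications=500, seed=1):
       """Monte Carlo p-value of the hypothesis 'the sample is normal'."""
       rng = random.Random(seed)
       n = len(sample)
       d_obs = fitted_normal_statistic(sample)
       # Under the null hypothesis the distribution of the statistic does not
       # depend on the true mean and standard deviation, so standard normal
       # samples are simulated and the parameters are re-estimated on each one.
       exceed = 0
       for _ in range(replications):
           simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
           if fitted_normal_statistic(simulated) >= d_obs:
               exceed += 1
       return exceed / replications


   rng = random.Random(0)
   normal_sample = [rng.gauss(5.0, 2.0) for _ in range(200)]
   skewed_sample = [rng.expovariate(1.0) for _ in range(200)]
   p_normal = lilliefors_pvalue(normal_sample)  # usually not small
   p_skewed = lilliefors_pvalue(skewed_sample)  # close to 0
   print(p_normal, p_skewed)

Re-estimating the parameters on every simulated sample is the key step: skipping it would reproduce the plain Kolmogorov test and overestimate the p-value.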

Rank the continuous models with respect to the BIC criterion. No statistical test is performed here: the method returns the best model together with its BIC value (the lower, the better):

.. code-block:: default

   ot.FittingTest.BestModelBIC(sample, factories)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [class=Beta name=Beta dimension=1 alpha=1.72649 beta=1.66568 a=0.00526109 b=0.970313, -0.19254944819710879]

Rank the continuous models with respect to the AIC criterion (no test result):

.. code-block:: default

   ot.FittingTest.BestModelAIC(sample, factories)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [class=Beta name=Beta dimension=1 alpha=1.72649 beta=1.66568 a=0.00526109 b=0.970313, -0.21218046931303733]

Rank the continuous models with respect to the AICc criterion (no test result):

.. code-block:: default

   ot.FittingTest.BestModelAICC(sample, factories)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [class=Beta name=Beta dimension=1 alpha=1.72649 beta=1.66568 a=0.00526109 b=0.970313, -0.2121402683080122]

**2. Specify the model and its parameters**

Create a collection of the distributions to be tested:

.. code-block:: default

   distributions = [ot.Beta(2.0, 2.0, 0.0, 1.0), ot.Triangular(0.0, 0.5, 1.0)]

Rank the continuous models by the Kolmogorov p-values:

.. code-block:: default

   estimated_distribution, test_result = ot.FittingTest.BestModelKolmogorov(
       sample, distributions
   )
   test_result

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    class=TestResult name=Unnamed type=Kolmogorov Beta binaryQualityMeasure=true p-value threshold=0.05 p-value=0.127302 statistic=0.0369407 description=[Beta(alpha = 2, beta = 2, a = 0, b = 1) vs sample Beta]
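When the parameters are fully specified, the p-value of the Kolmogorov test depends only on the statistic and on the sample size. The sketch below uses the classical asymptotic approximation ``p = Q_KS((sqrt(n) + 0.12 + 0.11 / sqrt(n)) * D)`` with ``Q_KS(x) = 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 k^2 x^2)``, applied to the statistic ``D = 0.0369407`` and ``n = 1000`` printed above. OpenTURNS may compute the Kolmogorov distribution more accurately, so only a value close to the reported p-value should be expected; the function names are hypothetical.

.. code-block:: python

   import math


   def kolmogorov_survival(lam, terms=100):
       """P(K > lam) for the Kolmogorov distribution K (asymptotic series)."""
       if lam < 0.2:
           # The series converges slowly here and P(K > lam) is 1 to ~1e-12.
           return 1.0
       total = 0.0
       for k in range(1, terms + 1):
           total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
       return min(1.0, max(0.0, 2.0 * total))


   def kolmogorov_pvalue(statistic, n):
       """Approximate p-value of the Kolmogorov test with known parameters."""
       sqrt_n = math.sqrt(n)
       # Finite-sample adjustment of the sqrt(n) scaling factor.
       return kolmogorov_survival((sqrt_n + 0.12 + 0.11 / sqrt_n) * statistic)


   # Statistic and sample size of the Beta(2, 2) test above.
   p_value = kolmogorov_pvalue(0.0369407, 1000)
   print(p_value)  # close to the p-value=0.127302 reported by OpenTURNS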

Rank the continuous models with respect to the BIC criterion:

.. code-block:: default

   ot.FittingTest.BestModelBIC(sample, distributions)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [class=Beta name=Beta dimension=1 alpha=2 beta=2 a=0 b=1, -0.21804827501286062]

Rank the continuous models with respect to the AIC criterion:

.. code-block:: default

   ot.FittingTest.BestModelAIC(sample, distributions)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [class=Beta name=Beta dimension=1 alpha=2 beta=2 a=0 b=1, -0.21804827501286062]

Rank the continuous models with respect to the AICc criterion:

.. code-block:: default

   ot.FittingTest.BestModelAICC(sample, distributions)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [class=Beta name=Beta dimension=1 alpha=2 beta=2 a=0 b=1, -0.21804827501286062]

Since no parameter is estimated from the sample here, the penalty terms of the three criteria vanish and the BIC, AIC and AICc values coincide.

**Discrete distributions**

Create a sample from a discrete distribution:

.. code-block:: default

   distribution = ot.Poisson(2.0)
   sample = distribution.getSample(1000)
   graph = ot.UserDefined(sample).drawCDF()
   view = viewer.View(graph)

.. image-sg:: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_fitted_distribution_ranking_002.png
   :alt: X0 CDF
   :srcset: /auto_data_analysis/statistical_tests/images/sphx_glr_plot_fitted_distribution_ranking_002.png
   :class: sphx-glr-single-img

Create the list of candidate distributions:

.. code-block:: default

   distributions = [ot.Poisson(2.0), ot.Geometric(0.1)]

Rank the discrete models with respect to the ChiSquared p-values:

.. code-block:: default

   estimated_distribution, test_result = ot.FittingTest.BestModelChiSquared(
       sample, distributions
   )
   test_result

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    class=TestResult name=Unnamed type=ChiSquared Poisson binaryQualityMeasure=true p-value threshold=0.05 p-value=0.184085 statistic=8.81784 description=[Poisson(lambda = 2) vs sample Poisson]
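For discrete models, the ChiSquared statistic compares the observed count of each value with the count expected under the candidate distribution. The sketch below computes Pearson's statistic for a Poisson candidate with the standard library only. The way OpenTURNS groups values into classes is not described here: the grouping below (one class per value from 0 to ``max_value - 1`` plus a tail class) and the names ``poisson_pmf``, ``chi_squared_statistic`` and ``poisson_sample`` are assumptions for illustration.

.. code-block:: python

   import math
   import random


   def poisson_pmf(k, lam):
       """P(X = k) for a Poisson distribution with mean lam."""
       return math.exp(-lam) * lam**k / math.factorial(k)


   def chi_squared_statistic(sample, lam, max_value=6):
       """Pearson statistic of an integer sample against Poisson(lam).

       Hypothetical grouping: classes {0}, {1}, ..., {max_value - 1}
       and a tail class {max_value, max_value + 1, ...}.
       """
       n = len(sample)
       observed = [0] * (max_value + 1)
       for value in sample:
           observed[min(value, max_value)] += 1
       probabilities = [poisson_pmf(k, lam) for k in range(max_value)]
       probabilities.append(1.0 - sum(probabilities))  # tail class
       return sum(
           (obs - n * p) ** 2 / (n * p) for obs, p in zip(observed, probabilities)
       )


   def poisson_sample(lam, size, rng):
       """Draw Poisson values with Knuth's multiplication method (small lam)."""
       threshold = math.exp(-lam)
       values = []
       for _ in range(size):
           k, product = 0, rng.random()
           while product > threshold:
               k += 1
               product *= rng.random()
           values.append(k)
       return values


   rng = random.Random(0)
   sample = poisson_sample(2.0, 1000, rng)
   stat_good = chi_squared_statistic(sample, 2.0)  # small: the model matches
   stat_bad = chi_squared_statistic(sample, 4.0)   # much larger: wrong mean
   print(stat_good, stat_bad)

The p-value is then read from a chi-squared distribution whose number of degrees of freedom depends on the number of classes.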
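Going back to the information criteria printed for the estimated Beta model in part 1, the three values are consistent with criteria divided by the sample size ``n``: ``AIC = (-2 logL + 2 p) / n``, ``BIC = (-2 logL + p log(n)) / n`` and ``AICc = AIC + 2 p (p + 1) / (n (n - p - 1))``, where ``logL`` is the maximized log-likelihood and ``p`` the number of estimated parameters. This normalization is inferred from the printed numbers, not quoted from the OpenTURNS documentation. The sketch below recovers ``-2 logL / n`` from the printed AIC (with ``p = 4`` and ``n = 1000``) and checks that it reproduces the printed BIC and AICc; the function name is hypothetical.

.. code-block:: python

   import math


   def normalized_criteria(m2ll_over_n, p, n):
       """Return (BIC, AIC, AICc) divided by n, given -2 logL / n.

       Assumed normalization, inferred from the values printed above.
       """
       aic = m2ll_over_n + 2.0 * p / n
       bic = m2ll_over_n + p * math.log(n) / n
       aicc = aic + 2.0 * p * (p + 1) / (n * (n - p - 1))
       return bic, aic, aicc


   n = 1000
   p = 4  # alpha, beta, a and b are estimated by BetaFactory
   printed_aic = -0.21218046931303733
   m2ll_over_n = printed_aic - 2.0 * p / n  # recover -2 logL / n from the AIC
   bic, aic, aicc = normalized_criteria(m2ll_over_n, p, n)
   print(bic)   # compare with -0.19254944819710879 printed above
   print(aicc)  # compare with -0.2121402683080122 printed above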