.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/distribution_fitting/plot_advanced_mle_estimator.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_data_analysis_distribution_fitting_plot_advanced_mle_estimator.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_distribution_fitting_plot_advanced_mle_estimator.py:


Fitting a distribution with customized maximum likelihood
=========================================================

.. GENERATED FROM PYTHON SOURCE LINES 7-26

Introduction
------------

When we perform distribution fitting using the
:class:`~openturns.MaximumLikelihoodFactory` class, the default optimization
solver sometimes fails to maximize the likelihood. This may happen because the
solver is not appropriate for the particular problem, or because the bounds of
the problem are not properly set. In this example, we configure the
optimization solver in two different ways.

- The first method selects a specific solver among the
  :class:`~openturns.NLopt` solvers. Moreover, we customize the bounds of the
  distribution parameters. This helps the solver avoid the part of the domain
  where the distribution cannot be built.
- The second method sets the :class:`~openturns.ResourceMap` key used by the
  :class:`~openturns.MaximumLikelihoodFactory` to select its default solver.

.. GENERATED FROM PYTHON SOURCE LINES 26-29

.. code-block:: Python

    import openturns as ot

.. GENERATED FROM PYTHON SOURCE LINES 30-32

Simulating a sample
-------------------

.. GENERATED FROM PYTHON SOURCE LINES 34-36

In this example, we consider the :class:`~openturns.TruncatedNormal`
distribution with default parameters and generate a large sample from it.

.. GENERATED FROM PYTHON SOURCE LINES 36-41

.. code-block:: Python

    distribution = ot.TruncatedNormal()
    print("Distribution=", distribution)
    size = 10000
    sample = distribution.getSample(size)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Distribution= TruncatedNormal(mu = 0, sigma = 1, a = -1, b = 1)

.. GENERATED FROM PYTHON SOURCE LINES 42-43

We can use the :class:`~openturns.TruncatedNormalFactory` class.

.. GENERATED FROM PYTHON SOURCE LINES 43-47

.. code-block:: Python

    factory = ot.TruncatedNormalFactory()
    fittedDistribution = factory.build(sample)
    print("Fitted distribution=", fittedDistribution)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Fitted distribution= TruncatedNormal(mu = -0.000687592, sigma = 1.07864, a = -0.999958, b = 1.00006)

.. GENERATED FROM PYTHON SOURCE LINES 48-53

But this method sometimes fails for specific distributions, for example when
the sample size is very small or when the distribution has specific properties
(e.g., some parameter is known beforehand). In these cases, general-purpose
methods cannot be used, and the methods presented below may be relevant.

.. GENERATED FROM PYTHON SOURCE LINES 55-57

Defining the estimation problem
-------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 59-73

The truncated Normal distribution has 4 parameters:

- the mean and standard deviation parameters of the underlying Normal
  distribution,
- the lower and upper truncation bounds of the distribution.

Estimating the parameters is easier if two conditions are met:

- the starting point of the optimization (i.e., the initial values of the
  parameters) is compatible with the sample,
- the bounds are compatible with the distribution.

First, we define the starting point of the optimization, using statistics
from the sample.

.. GENERATED FROM PYTHON SOURCE LINES 75-82

.. code-block:: Python

    sample_min = sample.getMin()[0]
    sample_max = sample.getMax()[0]
    sample_mean = sample.computeMean()[0]
    sigma_hat = sample.computeStandardDeviation()[0]
    startingPoint = [sample_mean, sigma_hat, sample_min, sample_max]
    print("startingPoint=", startingPoint)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    startingPoint= [-0.00013651384062640784, 0.5448019704116038, -0.9998576357393187, 0.9999620247714397]

.. GENERATED FROM PYTHON SOURCE LINES 83-108

Secondly, we set the bounds of the parameters. We assume that the minimum and
maximum cannot be lower than -1 or greater than 1. The parameters then have
the following bounds:

- :math:`\mu \in [-\infty, +\infty]` ;
- :math:`\sigma \in [0, +\infty]` ;
- :math:`a \in [-1, 1]` ;
- :math:`b \in [-1, 1]`.

But this may not give satisfactory results, because infinite parameters are
impossible and a zero standard deviation is infeasible. Furthermore, we must
have :math:`a < b`. We assume that we know that the mean parameter cannot be
greater than 10 or lower than -10. Similarly, the standard deviation cannot
be greater than 10. Hence, we use the following bounds:

- :math:`\mu \in [-10, +10]` ;
- :math:`\sigma \in [\epsilon, +10]` ;
- :math:`a \in [-1 - \epsilon, \bar{x} - \epsilon]` ;
- :math:`b \in [\bar{x} + \epsilon, 1 + \epsilon]`

where :math:`\bar{x}` is the sample mean and :math:`\epsilon` is a small
parameter.

.. GENERATED FROM PYTHON SOURCE LINES 110-116

.. code-block:: Python

    epsilon = 1.0e-3
    bounds_lower = [-10.0, epsilon, -1.0 - epsilon, sample_mean + epsilon]
    bounds_upper = [10.0, 10.0, sample_mean - epsilon, 1.0 + epsilon]
    print("bounds_lower=", bounds_lower)
    print("bounds_upper=", bounds_upper)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    bounds_lower= [-10.0, 0.001, -1.001, 0.0008634861593735921]
    bounds_upper= [10.0, 10.0, -0.001136513840626408, 1.001]

.. GENERATED FROM PYTHON SOURCE LINES 117-118

In the finite-bound flags below, `False` means unbounded and `True` means
bounded.

.. GENERATED FROM PYTHON SOURCE LINES 120-124

.. code-block:: Python

    finiteLowerBound = [True] * 4
    finiteUpperBound = [True] * 4
    interval = ot.Interval(bounds_lower, bounds_upper, finiteLowerBound, finiteUpperBound)

.. GENERATED FROM PYTHON SOURCE LINES 125-127

Setting the solver and bounds
-----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 129-136

The two methods that we suggest are based on the
:class:`~openturns.MaximumLikelihoodFactory` class, which uses the maximum
likelihood estimator to estimate the parameters of the distribution from the
sample.

The first method is to use the
:meth:`~openturns.MaximumLikelihoodFactory.setOptimizationAlgorithm` method.
This sets the optimization algorithm as a replacement for the default
optimization solver.

.. GENERATED FROM PYTHON SOURCE LINES 136-148

.. code-block:: Python

    def printTruncatedNormalParameters(truncatedNormalDistribution):
        mu, sigma, a, b = truncatedNormalDistribution.getParameter()
        print(" ", truncatedNormalDistribution.getImplementation().getClassName())
        print(" mu= %.3f" % (mu))
        print(" sigma= %.4f" % (sigma))
        print(" a= %.2f" % (a))
        print(" b= %.2f" % (b))
        return None

.. GENERATED FROM PYTHON SOURCE LINES 149-169

.. code-block:: Python

    factory = ot.MaximumLikelihoodFactory(distribution)
    factory.setOptimizationBounds(interval)
    distribution_MLE = factory.build(sample)
    print("Fitted distribution with default solver:")
    printTruncatedNormalParameters(distribution_MLE)
    default_solver = factory.getOptimizationAlgorithm()
    print("Default solver=", default_solver.getImplementation().getClassName())
    solver = ot.NLopt("LN_COBYLA")
    print("New solver=", solver.getClassName(), solver.getAlgorithmName())
    solver.setStartingPoint(startingPoint)
    factory.setOptimizationAlgorithm(solver)
    print(
        "Updated solver=",
        factory.getOptimizationAlgorithm().getImplementation().getClassName(),
    )
    distribution_MLE = factory.build(sample)
    print("Fitted distribution with new solver:")
    printTruncatedNormalParameters(distribution_MLE)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Fitted distribution with default solver:
      TruncatedNormal
     mu= -0.000
     sigma= 1.0001
     a= -1.00
     b= 1.00
    Default solver= TNC
    New solver= NLopt LN_COBYLA
    Updated solver= NLopt
    Fitted distribution with new solver:
      TruncatedNormal
     mu= -0.000
     sigma= 0.5448
     a= -1.00
     b= 1.00

.. GENERATED FROM PYTHON SOURCE LINES 170-172

Using ResourceMap to set the solver
-----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 174-177

Another method is to use a special key of the :class:`~openturns.ResourceMap`,
which defines the default solver used by
:class:`~openturns.MaximumLikelihoodFactory`. In this case, we do not define
the starting point.

.. GENERATED FROM PYTHON SOURCE LINES 179-195

.. code-block:: Python

    ot.ResourceMap.SetAsString(
        "MaximumLikelihoodFactory-DefaultOptimizationAlgorithm", "LN_COBYLA"
    )
    ot.ResourceMap.SetAsUnsignedInteger(
        "MaximumLikelihoodFactory-MaximumCallsNumber", 100000
    )
    factory = ot.MaximumLikelihoodFactory(distribution)
    factory.setOptimizationBounds(interval)
    print(
        "Default solver=",
        factory.getOptimizationAlgorithm().getImplementation().getClassName(),
    )
    distribution_MLE = factory.build(sample)
    print("Fitted distribution:")
    printTruncatedNormalParameters(distribution_MLE)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Default solver= NLopt
    Fitted distribution:
      TruncatedNormal
     mu= -0.001
     sigma= 1.0778
     a= -1.00
     b= 1.00

.. GENERATED FROM PYTHON SOURCE LINES 196-198

Fitting a LogNormal with zero location parameter
------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 200-207

We now consider the :class:`~openturns.LogNormal` distribution and present
two different topics.

- We show how to estimate the parameters of a LogNormal distribution which
  has a zero location parameter.
- We compute the list of solvers which are compatible with a specific MLE
  problem.

.. GENERATED FROM PYTHON SOURCE LINES 209-210

We generate a large sample from the standard LogNormal distribution.

.. GENERATED FROM PYTHON SOURCE LINES 212-215

.. code-block:: Python

    standardLogNormal = ot.LogNormal()
    sample = standardLogNormal.getSample(1000)

.. GENERATED FROM PYTHON SOURCE LINES 216-228

If we estimate the parameters using :class:`~openturns.LogNormalFactory`, the
algorithm does not know that the true value of the location parameter
:math:`\gamma` is zero. Hence, the estimator does not necessarily produce an
estimate exactly equal to zero. In this example, we assume that we know that
this parameter is zero. In this case, the simplest method is to use the
:meth:`~openturns.MaximumLikelihoodFactory.setKnownParameter` method.
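To make explicit what fixing a "known parameter" means for the optimization, here is a minimal pure-Python sketch (standard library only, independent of OpenTURNS) of the LogNormal log-likelihood with the location :math:`\gamma` held at zero: the likelihood is then maximized over :math:`(\mu_{\log}, \sigma_{\log})` only. The function name, toy sample, and grid search below are illustrative choices of ours, not part of the OpenTURNS API.

.. code-block:: python

    import math

    def lognormal_loglik(sample, mu_log, sigma_log, gamma=0.0):
        """Log-likelihood of a LogNormal(mu_log, sigma_log, gamma) sample.

        With gamma fixed (the "known parameter"), only mu_log and sigma_log
        remain as optimization variables.
        """
        total = 0.0
        for x in sample:
            z = x - gamma
            if z <= 0.0 or sigma_log <= 0.0:
                return -math.inf  # outside the distribution's support
            u = (math.log(z) - mu_log) / sigma_log
            total += -math.log(z * sigma_log * math.sqrt(2.0 * math.pi)) - 0.5 * u * u
        return total

    # A crude grid search over the two free parameters illustrates the idea;
    # a real solver (such as the NLopt algorithms used in this example)
    # replaces this loop.
    sample = [0.5, 1.2, 0.8, 2.0, 1.5]
    best = max(
        ((m / 10.0, s / 10.0) for m in range(-20, 21) for s in range(1, 31)),
        key=lambda p: lognormal_loglik(sample, p[0], p[1]),
    )
    print("grid-search estimate (muLog, sigmaLog):", best)

The important point is that ``gamma`` never appears among the search variables, which is exactly what :meth:`~openturns.MaximumLikelihoodFactory.setKnownParameter` achieves in the script below.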
Furthermore, we know that the absolute value of the mean of the underlying
Normal distribution cannot be greater than 5. Finally, we know that the
standard deviation of the underlying Normal distribution cannot be lower
than, say, :math:`\epsilon = 10^{-4}`.

.. GENERATED FROM PYTHON SOURCE LINES 230-241

.. code-block:: Python

    logNormalFactoryWithZeroLocation = ot.MaximumLikelihoodFactory(standardLogNormal)
    logNormalFactoryWithZeroLocation.setKnownParameter([0.0], [2])
    lowerBound = [-5.0, 1.0e-4]
    upperBound = [5.0, -1.0]
    finiteLowerBound = [True, True]
    finiteUpperBound = [True, False]
    bounds = ot.Interval(lowerBound, upperBound, finiteLowerBound, finiteUpperBound)
    logNormalFactoryWithZeroLocation.setOptimizationBounds(bounds)
    fittedDistribution = logNormalFactoryWithZeroLocation.build(sample)
    print(" fittedDistribution = ", fittedDistribution)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     fittedDistribution =  LogNormal(muLog = -0.045922, sigmaLog = 1.01483, gamma = 0)

.. GENERATED FROM PYTHON SOURCE LINES 242-244

See the list of solvers compatible with a given MLE problem
-----------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 246-253

In the next script, we loop over the algorithms of the
:class:`~openturns.NLopt` class and select the algorithms which can solve the
problem. This is done using `try/except` statements. Moreover, some solvers
claim to solve the problem but do not actually change the parameters. We do
not select these either.

.. GENERATED FROM PYTHON SOURCE LINES 255-277

.. code-block:: Python

    defaultParameter = standardLogNormal.getParameter()
    defaultSolver = logNormalFactoryWithZeroLocation.getOptimizationAlgorithm()
    startingPoint = defaultSolver.getStartingPoint()
    print("Default solver = ", defaultSolver.getImplementation().getClassName())
    availableSolverList = []
    differenceThreshold = 1.0e-8
    for algorithmName in ot.NLopt.GetAlgorithmNames():
        print("- ", algorithmName)
        solver = ot.NLopt(algorithmName)
        solver.setStartingPoint(startingPoint)
        logNormalFactoryWithZeroLocation.setOptimizationAlgorithm(solver)
        try:
            fittedDistribution = logNormalFactoryWithZeroLocation.build(sample)
            print(" fittedDistribution = ", fittedDistribution)
            parameter = fittedDistribution.getParameter()
            differenceOfParameters = parameter - defaultParameter
            if differenceOfParameters.norm() > differenceThreshold:
                # Consider only solvers which make a difference after optimization
                availableSolverList.append(algorithmName)
        except TypeError:
            print(" Fail : Impossible to build with ", algorithmName)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Default solver =  NLopt
    -  AUGLAG
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  AUGLAG_EQ
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  GD_MLSL
     Fail : Impossible to build with  GD_MLSL
    -  GD_MLSL_LDS
     Fail : Impossible to build with  GD_MLSL_LDS
    -  GN_CRS2_LM
     Fail : Impossible to build with  GN_CRS2_LM
    -  GN_DIRECT
     Fail : Impossible to build with  GN_DIRECT
    -  GN_DIRECT_L
     Fail : Impossible to build with  GN_DIRECT_L
    -  GN_DIRECT_L_NOSCAL
     Fail : Impossible to build with  GN_DIRECT_L_NOSCAL
    -  GN_DIRECT_L_RAND
     Fail : Impossible to build with  GN_DIRECT_L_RAND
    -  GN_DIRECT_L_RAND_NOSCAL
     Fail : Impossible to build with  GN_DIRECT_L_RAND_NOSCAL
    -  GN_DIRECT_NOSCAL
     Fail : Impossible to build with  GN_DIRECT_NOSCAL
    -  GN_ESCH
     Fail : Impossible to build with  GN_ESCH
    -  GN_ISRES
     Fail : Impossible to build with  GN_ISRES
    -  GN_MLSL
     Fail : Impossible to build with  GN_MLSL
    -  GN_MLSL_LDS
     Fail : Impossible to build with  GN_MLSL_LDS
    -  GN_ORIG_DIRECT
     Fail : Impossible to build with  GN_ORIG_DIRECT
    -  GN_ORIG_DIRECT_L
     Fail : Impossible to build with  GN_ORIG_DIRECT_L
    -  G_MLSL
     Fail : Impossible to build with  G_MLSL
    -  G_MLSL_LDS
     Fail : Impossible to build with  G_MLSL_LDS
    -  LD_AUGLAG
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  LD_AUGLAG_EQ
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  LD_CCSAQ
     fittedDistribution =  LogNormal(muLog = -0.0459469, sigmaLog = 1.01487, gamma = 0)
    -  LD_LBFGS
     fittedDistribution =  LogNormal(muLog = -0.0459271, sigmaLog = 1.01483, gamma = 0)
    -  LD_MMA
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  LD_SLSQP
     fittedDistribution =  LogNormal(muLog = -0.0459352, sigmaLog = 1.01484, gamma = 0)
    -  LD_TNEWTON
     fittedDistribution =  LogNormal(muLog = -0.0459245, sigmaLog = 1.01483, gamma = 0)
    -  LD_TNEWTON_PRECOND
     fittedDistribution =  LogNormal(muLog = -0.0459245, sigmaLog = 1.01483, gamma = 0)
    -  LD_TNEWTON_PRECOND_RESTART
     fittedDistribution =  LogNormal(muLog = -0.0459245, sigmaLog = 1.01483, gamma = 0)
    -  LD_TNEWTON_RESTART
     fittedDistribution =  LogNormal(muLog = -0.0459245, sigmaLog = 1.01483, gamma = 0)
    -  LD_VAR1
     fittedDistribution =  LogNormal(muLog = -0.0459263, sigmaLog = 1.01483, gamma = 0)
    -  LD_VAR2
     fittedDistribution =  LogNormal(muLog = -0.0459263, sigmaLog = 1.01483, gamma = 0)
    -  LN_AUGLAG
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  LN_AUGLAG_EQ
     fittedDistribution =  LogNormal(muLog = -0.0458403, sigmaLog = 1.01519, gamma = 0)
    -  LN_BOBYQA
     fittedDistribution =  LogNormal(muLog = -0.0463382, sigmaLog = 1.01415, gamma = 0)
    -  LN_COBYLA
     fittedDistribution =  LogNormal(muLog = -0.0455552, sigmaLog = 1.01475, gamma = 0)
    -  LN_NELDERMEAD
     fittedDistribution =  LogNormal(muLog = -0.047843, sigmaLog = 1.01593, gamma = 0)
    -  LN_NEWUOA
     Fail : Impossible to build with  LN_NEWUOA
    -  LN_SBPLX
     fittedDistribution =  LogNormal(muLog = -0.0465393, sigmaLog = 1.01508, gamma = 0)

.. GENERATED FROM PYTHON SOURCE LINES 278-282

.. code-block:: Python

    print("Available solvers = ", len(availableSolverList))
    for name in availableSolverList:
        print("- ", name)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Available solvers =  20
    -  AUGLAG
    -  AUGLAG_EQ
    -  LD_AUGLAG
    -  LD_AUGLAG_EQ
    -  LD_CCSAQ
    -  LD_LBFGS
    -  LD_MMA
    -  LD_SLSQP
    -  LD_TNEWTON
    -  LD_TNEWTON_PRECOND
    -  LD_TNEWTON_PRECOND_RESTART
    -  LD_TNEWTON_RESTART
    -  LD_VAR1
    -  LD_VAR2
    -  LN_AUGLAG
    -  LN_AUGLAG_EQ
    -  LN_BOBYQA
    -  LN_COBYLA
    -  LN_NELDERMEAD
    -  LN_SBPLX

.. GENERATED FROM PYTHON SOURCE LINES 283-284

Pick one solver and try it.

.. GENERATED FROM PYTHON SOURCE LINES 286-294

.. code-block:: Python

    algorithmName = "LD_SLSQP"
    solver = ot.NLopt(algorithmName)
    print(" New solver = ", solver.getClassName(), solver.getAlgorithmName())
    solver.setStartingPoint(startingPoint)
    logNormalFactoryWithZeroLocation.setOptimizationAlgorithm(solver)
    fittedDistribution = logNormalFactoryWithZeroLocation.build(sample)
    print(" fittedDistribution = ", fittedDistribution)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     New solver =  NLopt LD_SLSQP
     fittedDistribution =  LogNormal(muLog = -0.0459352, sigmaLog = 1.01484, gamma = 0)

.. GENERATED FROM PYTHON SOURCE LINES 295-296

Reset the altered :class:`~openturns.ResourceMap` entries.

.. GENERATED FROM PYTHON SOURCE LINES 296-297

.. code-block:: Python

    ot.ResourceMap.Reload()


.. _sphx_glr_download_auto_data_analysis_distribution_fitting_plot_advanced_mle_estimator.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_advanced_mle_estimator.ipynb <plot_advanced_mle_estimator.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_advanced_mle_estimator.py <plot_advanced_mle_estimator.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_advanced_mle_estimator.zip <plot_advanced_mle_estimator.zip>`