.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_data_analysis/distribution_fitting/plot_advanced_mle_estimator.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_data_analysis_distribution_fitting_plot_advanced_mle_estimator.py: Fitting a distribution with customized maximum likelihood ========================================================= .. GENERATED FROM PYTHON SOURCE LINES 6-25 Introduction ------------ When we perform distribution fitting using the :class:`~openturns.MaximumLikelihoodFactory` class, the default optimization solver sometimes fails to maximize the likelihood. This might be because the optimization solver is not appropriate in this particular case, or because the bounds of the problem are not properly set. In this example, we configure the optimization solver by two different methods. - The first method sets a specific solver among the NLopt solvers. Moreover, we customize the distribution parameters bounds. This helps the solver to avoid the part of the domain where the distribution cannot be built. - The second method sets the :class:`~openturns.ResourceMap` key that is used by the :class:`~openturns.MaximumLikelihoodFactory` in order to select the default solver. .. GENERATED FROM PYTHON SOURCE LINES 25-28 .. code-block:: Python import openturns as ot .. GENERATED FROM PYTHON SOURCE LINES 29-31 Simulating a sample ------------------- .. GENERATED FROM PYTHON SOURCE LINES 33-35 In this example, we consider the :class:`~openturns.TruncatedNormal` distribution with default parameters and generate a large sample from it. .. GENERATED FROM PYTHON SOURCE LINES 35-40 .. code-block:: Python distribution = ot.TruncatedNormal() print("Distribution=", distribution) size = 10000 sample = distribution.getSample(size) .. rst-class:: sphx-glr-script-out .. code-block:: none Distribution= TruncatedNormal(mu = 0, sigma = 1, a = -1, b = 1) .. GENERATED FROM PYTHON SOURCE LINES 41-42 We can use the :class:`TruncatedNormalFactory` class. .. GENERATED FROM PYTHON SOURCE LINES 42-46 .. code-block:: Python factory = ot.TruncatedNormalFactory() fittedDistribution = factory.build(sample) print("Fitted distribution=", fittedDistribution) .. rst-class:: sphx-glr-script-out .. code-block:: none Fitted distribution= TruncatedNormal(mu = -0.0087698, sigma = 0.984923, a = -0.999836, b = 1) .. GENERATED FROM PYTHON SOURCE LINES 47-52 But this method sometimes fails for specific distributions, for example when the sample size is very small or when the distribution has specific properties (e.g. some parameter is known beforehand). In these cases, general-purpose methods cannot be used, and the methods presented below may be relevant. .. GENERATED FROM PYTHON SOURCE LINES 54-56 Defining the estimation problem ------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 58-72 The truncated normal distribution has 4 parameters: - the mean and standard deviation parameters of the underlying Normal distribution, - the minimum and maximum of the underlying truncated distribution. Estimating the parameters will be easier if two conditions are met: - the starting point (i.e., the initial values of the parameters) of the optimization is compatible with the sample, - the bounds are compatible with the distribution. First, we define the starting point of the optimization, using statistics from the sample. .. GENERATED FROM PYTHON SOURCE LINES 74-81 .. code-block:: Python sample_min = sample.getMin()[0] sample_max = sample.getMax()[0] sample_mean = sample.computeMean()[0] sigma_hat = sample.computeStandardDeviation()[0] startingPoint = [sample_mean, sigma_hat, sample_min, sample_max] print("startingPoint=", startingPoint) .. rst-class:: sphx-glr-script-out .. code-block:: none startingPoint= [-0.002562574711278147, 0.5384068806033934, -0.9997360092906988, 0.9999009268469972] .. GENERATED FROM PYTHON SOURCE LINES 82-107 Secondly, we set the bounds of the parameters. We assume that the minimum and maximum cannot be lower than -1 or greater than 1. The parameters have the following bounds: - :math:`\mu \in [-\infty, +\infty]` ; - :math:`\sigma \in [0, +\infty]` ; - :math:`a \in [-1, 1]` ; - :math:`b \in [-1, 1]`. But this may not give satisfactory results, because infinite parameters are impossible and a zero standard deviation is infeasible. Furthermore, we must have :math:`a < b`. We assume that we know that the mean parameter cannot be greater than 10 and lower than -10. Similarly, the standard deviation cannot be greater than 10. Hence, we use the following bounds: - :math:`\mu \in [-10, +10]` ; - :math:`\sigma \in [\epsilon, +10]` ; - :math:`a \in [-1 - \epsilon, \bar{x} - \epsilon]` ; - :math:`b \in [\bar{x} + \epsilon, 1 + \epsilon]` where :math:`\bar{x}` is the sample mean and :math:`\epsilon` is a small parameter. .. GENERATED FROM PYTHON SOURCE LINES 109-115 .. code-block:: Python epsilon = 1.0e-3 bounds_lower = [-10.0, epsilon, -1.0 - epsilon, sample_mean + epsilon] bounds_upper = [10.0, 10.0, sample_mean - epsilon, 1.0 + epsilon] print("bounds_lower=", bounds_lower) print("bounds_upper=", bounds_upper) .. rst-class:: sphx-glr-script-out .. code-block:: none bounds_lower= [-10.0, 0.001, -1.001, -0.0015625747112781468] bounds_upper= [10.0, 10.0, -0.003562574711278147, 1.001] .. GENERATED FROM PYTHON SOURCE LINES 116-117 The boolean `False` means unbounded and `True` means bounded. .. GENERATED FROM PYTHON SOURCE LINES 119-123 .. code-block:: Python finiteLowerBound = [True] * 4 finiteUpperBound = [True] * 4 interval = ot.Interval(bounds_lower, bounds_upper, finiteLowerBound, finiteUpperBound) .. GENERATED FROM PYTHON SOURCE LINES 124-126 Setting the solver and bounds ----------------------------- .. GENERATED FROM PYTHON SOURCE LINES 128-135 The two methods that we suggest are based on the :class:`~openturns.MaximumLikelihoodFactory` class which uses the maximum likelihood estimator to estimate the parameters of the distribution depending on the sample. The first method is to use the :meth:`~openturns.MaximumLikelihoodFactory.setOptimizationAlgorithm` method. This sets the optimization algorithm as a replacement for the default optimization solver. .. GENERATED FROM PYTHON SOURCE LINES 135-147 .. code-block:: Python def printTruncatedNormalParameters(truncatedNormalDistribution): mu, sigma, a, b = truncatedNormalDistribution.getParameter() print(" ", truncatedNormalDistribution.getImplementation().getClassName()) print(" mu= %.3f" % (mu)) print(" sigma= %.4f" % (sigma)) print(" a= %.2f" % (a)) print(" b= %.2f" % (b)) return None .. GENERATED FROM PYTHON SOURCE LINES 148-168 .. code-block:: Python factory = ot.MaximumLikelihoodFactory(distribution) factory.setOptimizationBounds(interval) distribution_MLE = factory.build(sample) print("Fitted distribution with default solver:") printTruncatedNormalParameters(distribution_MLE) default_solver = factory.getOptimizationAlgorithm() print("Default solver=", default_solver.getImplementation().getClassName()) solver = ot.NLopt("LN_COBYLA") print("New solver=", solver.getClassName(), solver.getAlgorithmName()) solver.setStartingPoint(startingPoint) factory.setOptimizationAlgorithm(solver) print( "Updated solver=", factory.getOptimizationAlgorithm().getImplementation().getClassName(), ) distribution_MLE = factory.build(sample) print("Fitted distribution with new solver:") printTruncatedNormalParameters(distribution_MLE) .. rst-class:: sphx-glr-script-out .. code-block:: none Fitted distribution with default solver: TruncatedNormal mu= -0.000 sigma= 1.0000 a= -1.00 b= 1.00 Default solver= TNC New solver= NLopt LN_COBYLA Updated solver= NLopt Fitted distribution with new solver: TruncatedNormal mu= -0.016 sigma= 0.5526 a= -1.00 b= 1.00 .. GENERATED FROM PYTHON SOURCE LINES 169-171 Using ResourceMap to set the solver ----------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 173-176 Another method is to use a special key of the :class:`ResourceMap`, which defines the default solver used by :class:`~openturns.MaximumLikelihoodFactory`. In this case, we do not define the starting point. .. GENERATED FROM PYTHON SOURCE LINES 178-191 .. code-block:: Python ot.ResourceMap.SetAsString( "MaximumLikelihoodFactory-DefaultOptimizationAlgorithm", "LN_COBYLA" ) factory = ot.MaximumLikelihoodFactory(distribution) factory.setOptimizationBounds(interval) print( "Default solver=", factory.getOptimizationAlgorithm().getImplementation().getClassName(), ) distribution_MLE = factory.build(sample) print("Fitted distribution:") printTruncatedNormalParameters(distribution_MLE) .. rst-class:: sphx-glr-script-out .. code-block:: none Default solver= NLopt Fitted distribution: TruncatedNormal mu= -0.009 sigma= 1.0016 a= -1.00 b= 1.00 .. GENERATED FROM PYTHON SOURCE LINES 192-194 Fitting a LogNormal with zero location parameter ------------------------------------------------ .. GENERATED FROM PYTHON SOURCE LINES 196-203 We now consider the :class:`~openturns.LogNormal` distribution and present two different topics. - We show how to estimate the parameters of a LogNormal distribution which has a zero location parameter. - We compute the list of solvers which are compatible with a specific MLE problem. .. GENERATED FROM PYTHON SOURCE LINES 205-206 We generate a large sample from the standard LogNormal distribution. .. GENERATED FROM PYTHON SOURCE LINES 208-211 .. code-block:: Python standardLogNormal = ot.LogNormal() sample = standardLogNormal.getSample(1000) .. GENERATED FROM PYTHON SOURCE LINES 212-224 If we estimate the parameters using :class:`~openturns.LogNormalFactory`, the algorithm does not know that the true value of the location parameter :math:`\gamma` is zero. Hence, the estimator may not necessarily lead to an estimated parameter exactly equal to zero. In this example, we assume that we know that this parameter is zero. In this case, the simplest method is to use the :meth:`~openturns.MaximumLikelihoodFactory.setKnownParameter` method. Furthermore, we know that the absolute value of the mean of the underlying normal distribution cannot be greater than 5. Finally, we know that the standard deviation of the underlying normal distribution cannot be lower than, say, :math:`\epsilon = 10^{-4}`. .. GENERATED FROM PYTHON SOURCE LINES 226-237 .. code-block:: Python logNormalFactoryWithZeroLocation = ot.MaximumLikelihoodFactory(standardLogNormal) logNormalFactoryWithZeroLocation.setKnownParameter([0.0], [2]) lowerBound = [-5.0, 1.0e-4] upperBound = [5.0, -1.0] finiteLowerBound = [True, True] finiteUpperBound = [True, False] bounds = ot.Interval(lowerBound, upperBound, finiteLowerBound, finiteUpperBound) logNormalFactoryWithZeroLocation.setOptimizationBounds(bounds) fittedDistribution = logNormalFactoryWithZeroLocation.build(sample) print(" fittedDistribution = ", fittedDistribution) .. rst-class:: sphx-glr-script-out .. code-block:: none fittedDistribution = LogNormal(muLog = 0.00970194, sigmaLog = 1.00033, gamma = 0) .. GENERATED FROM PYTHON SOURCE LINES 238-240 See the list of solvers compatible with a given MLE problem ----------------------------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 242-249 In the next script, we perform a loop over the algorithms from the :class:`NLopt` class and select the algorithms which can solve the problem. This is done using `try/except` statements. Moreover, some solvers pretend to solve the problem, but do not make any change to the parameters. In this case, we do not select them. .. GENERATED FROM PYTHON SOURCE LINES 251-273 .. code-block:: Python defaultParameter = standardLogNormal.getParameter() defaultSolver = logNormalFactoryWithZeroLocation.getOptimizationAlgorithm() startingPoint = defaultSolver.getStartingPoint() print("Default solver = ", defaultSolver.getImplementation().getClassName()) availableSolverList = [] differenceThreshold = 1.0e-8 for algorithmName in ot.NLopt.GetAlgorithmNames(): print("- ", algorithmName) solver = ot.NLopt(algorithmName) solver.setStartingPoint(startingPoint) logNormalFactoryWithZeroLocation.setOptimizationAlgorithm(solver) try: fittedDistribution = logNormalFactoryWithZeroLocation.build(sample) print(" fittedDistribution = ", fittedDistribution) parameter = fittedDistribution.getParameter() differenceOfParameters = parameter - defaultParameter if differenceOfParameters.norm() > differenceThreshold: # Consider only solvers which make a difference after optimization availableSolverList.append(algorithmName) except TypeError: print(" Fail : Impossible to build with ", algorithmName) .. rst-class:: sphx-glr-script-out .. code-block:: none Default solver = NLopt - AUGLAG fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - AUGLAG_EQ fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - GD_MLSL Fail : Impossible to build with GD_MLSL - GD_MLSL_LDS Fail : Impossible to build with GD_MLSL_LDS - GN_CRS2_LM Fail : Impossible to build with GN_CRS2_LM - GN_DIRECT Fail : Impossible to build with GN_DIRECT - GN_DIRECT_L Fail : Impossible to build with GN_DIRECT_L - GN_DIRECT_L_NOSCAL Fail : Impossible to build with GN_DIRECT_L_NOSCAL - GN_DIRECT_L_RAND Fail : Impossible to build with GN_DIRECT_L_RAND - GN_DIRECT_L_RAND_NOSCAL Fail : Impossible to build with GN_DIRECT_L_RAND_NOSCAL - GN_DIRECT_NOSCAL Fail : Impossible to build with GN_DIRECT_NOSCAL - GN_ESCH Fail : Impossible to build with GN_ESCH - GN_ISRES Fail : Impossible to build with GN_ISRES - GN_MLSL Fail : Impossible to build with GN_MLSL - GN_MLSL_LDS Fail : Impossible to build with GN_MLSL_LDS - GN_ORIG_DIRECT Fail : Impossible to build with GN_ORIG_DIRECT - GN_ORIG_DIRECT_L Fail : Impossible to build with GN_ORIG_DIRECT_L - G_MLSL Fail : Impossible to build with G_MLSL - G_MLSL_LDS Fail : Impossible to build with G_MLSL_LDS - LD_AUGLAG fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - LD_AUGLAG_EQ fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - LD_CCSAQ fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - LD_LBFGS fittedDistribution = LogNormal(muLog = 0.0158819, sigmaLog = 1.00312, gamma = 0) - LD_MMA fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - LD_SLSQP fittedDistribution = LogNormal(muLog = 0.0174256, sigmaLog = 1.00132, gamma = 0) - LD_TNEWTON fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0) - LD_TNEWTON_PRECOND fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0) - LD_TNEWTON_PRECOND_RESTART fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0) - LD_TNEWTON_RESTART fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0) - LD_VAR1 fittedDistribution = LogNormal(muLog = 0.0160269, sigmaLog = 1.00314, gamma = 0) - LD_VAR2 fittedDistribution = LogNormal(muLog = 0.0160269, sigmaLog = 1.00314, gamma = 0) - LN_AUGLAG fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - LN_AUGLAG_EQ fittedDistribution = LogNormal(muLog = 0, sigmaLog = 1, gamma = 0) - LN_BOBYQA fittedDistribution = LogNormal(muLog = 0.0153223, sigmaLog = 1.00494, gamma = 0) - LN_COBYLA fittedDistribution = LogNormal(muLog = 0.00970194, sigmaLog = 1.00033, gamma = 0) - LN_NELDERMEAD fittedDistribution = LogNormal(muLog = 0.0128211, sigmaLog = 1.00282, gamma = 0) - LN_NEWUOA Fail : Impossible to build with LN_NEWUOA - LN_SBPLX fittedDistribution = LogNormal(muLog = 0.0143433, sigmaLog = 1.00252, gamma = 0) .. GENERATED FROM PYTHON SOURCE LINES 274-278 .. code-block:: Python print("Available solvers = ", len(availableSolverList)) for name in availableSolverList: print("- ", name) .. rst-class:: sphx-glr-script-out .. code-block:: none Available solvers = 12 - LD_LBFGS - LD_SLSQP - LD_TNEWTON - LD_TNEWTON_PRECOND - LD_TNEWTON_PRECOND_RESTART - LD_TNEWTON_RESTART - LD_VAR1 - LD_VAR2 - LN_BOBYQA - LN_COBYLA - LN_NELDERMEAD - LN_SBPLX .. GENERATED FROM PYTHON SOURCE LINES 279-280 Pick one and try it. .. GENERATED FROM PYTHON SOURCE LINES 282-289 .. code-block:: Python algorithmName = "LD_SLSQP" solver = ot.NLopt(algorithmName) print(" New solver = ", solver.getClassName(), solver.getAlgorithmName()) solver.setStartingPoint(startingPoint) logNormalFactoryWithZeroLocation.setOptimizationAlgorithm(solver) fittedDistribution = logNormalFactoryWithZeroLocation.build(sample) print(" fittedDistribution = ", fittedDistribution) .. rst-class:: sphx-glr-script-out .. code-block:: none New solver = NLopt LD_SLSQP fittedDistribution = LogNormal(muLog = 0.0174256, sigmaLog = 1.00132, gamma = 0) .. _sphx_glr_download_auto_data_analysis_distribution_fitting_plot_advanced_mle_estimator.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_advanced_mle_estimator.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_advanced_mle_estimator.py `