Note
Go to the end to download the full example code.
Fitting a distribution with customized maximum likelihood¶
Introduction¶
When we perform distribution fitting using the MaximumLikelihoodFactory
class, the default optimization solver sometimes fails to
maximize the likelihood.
This might be because the optimization solver is not appropriate in
this particular case, or because the bounds of the problem
are not properly set.
In this example, we configure the optimization solver by two different
methods.
The first method sets a specific solver among the NLopt solvers. Moreover, we customize the distribution parameters bounds. This helps the solver to avoid the part of the domain where the distribution cannot be built.
The second method sets the
ResourceMap
key that is used by theMaximumLikelihoodFactory
in order to select the default solver.
import openturns as ot
Simulating a sample¶
In this example, we consider the TruncatedNormal
distribution
with default parameters and generate a large sample from it.
distribution = ot.TruncatedNormal()
print("Distribution=", distribution)
size = 10000
sample = distribution.getSample(size)
Distribution= TruncatedNormal(mu = 0, sigma = 1, a = -1, b = 1)
We can use the TruncatedNormalFactory
class.
factory = ot.TruncatedNormalFactory()
fittedDistribution = factory.build(sample)
print("Fitted distribution=", fittedDistribution)
Fitted distribution= TruncatedNormal(mu = -0.0087698, sigma = 0.984923, a = -0.999836, b = 1)
But this method sometimes fails for specific distributions, for example when the sample size is very small or when the distribution has specific properties (e.g. some parameter is known beforehand). In these cases, general-purpose methods cannot be used, and the methods presented below may be relevant.
Defining the estimation problem¶
The truncated normal distribution has 4 parameters:
the mean and standard deviation parameters of the underlying Normal distribution,
the minimum and maximum of the underlying truncated distribution.
Estimating the parameters will be easier if two conditions are met:
the starting point (i.e., the initial values of the parameters) of the optimization is compatible with the sample,
the bounds are compatible with the distribution.
First, we define the starting point of the optimization, using statistics from the sample.
sample_min = sample.getMin()[0]
sample_max = sample.getMax()[0]
sample_mean = sample.computeMean()[0]
sigma_hat = sample.computeStandardDeviation()[0]
startingPoint = [sample_mean, sigma_hat, sample_min, sample_max]
print("startingPoint=", startingPoint)
startingPoint= [-0.002562574711278147, 0.5384068806033934, -0.9997360092906988, 0.9999009268469972]
Secondly, we set the bounds of the parameters. We assume that the minimum and maximum cannot be lower than -1 or greater than 1. The parameters have the following bounds:
;
;
;
.
But this may not give satisfactory results, because infinite parameters are impossible and a zero standard deviation is infeasible. Furthermore, we must have . We assume that we know that the mean parameter cannot be greater than 10 and lower than -10. Similarly, the standard deviation cannot be greater than 10. Hence, we use the following bounds:
;
;
;
where is the sample mean and is a small parameter.
epsilon = 1.0e-3
bounds_lower = [-10.0, epsilon, -1.0 - epsilon, sample_mean + epsilon]
bounds_upper = [10.0, 10.0, sample_mean - epsilon, 1.0 + epsilon]
print("bounds_lower=", bounds_lower)
print("bounds_upper=", bounds_upper)
bounds_lower= [-10.0, 0.001, -1.001, -0.0015625747112781468]
bounds_upper= [10.0, 10.0, -0.003562574711278147, 1.001]
The boolean False means unbounded and True means bounded.
finiteLowerBound = [True] * 4
finiteUpperBound = [True] * 4
interval = ot.Interval(bounds_lower, bounds_upper, finiteLowerBound, finiteUpperBound)
Setting the solver and bounds¶
The two methods that we suggest are based on the MaximumLikelihoodFactory
class
which uses the maximum likelihood estimator to estimate
the parameters of the distribution depending on the sample.
The first method is to use the setOptimizationAlgorithm()
method.
This sets the optimization algorithm as a replacement for the
default optimization solver.
def printTruncatedNormalParameters(truncatedNormalDistribution):
mu, sigma, a, b = truncatedNormalDistribution.getParameter()
print(" ", truncatedNormalDistribution.getImplementation().getClassName())
print(" mu= %.3f" % (mu))
print(" sigma= %.4f" % (sigma))
print(" a= %.2f" % (a))
print(" b= %.2f" % (b))
return None
factory = ot.MaximumLikelihoodFactory(distribution)
factory.setOptimizationBounds(interval)
distribution_MLE = factory.build(sample)
print("Fitted distribution with default solver:")
printTruncatedNormalParameters(distribution_MLE)
default_solver = factory.getOptimizationAlgorithm()
print("Default solver=", default_solver.getImplementation().getClassName())
solver = ot.NLopt("LN_COBYLA")
print("New solver=", solver.getClassName(), solver.getAlgorithmName())
solver.setStartingPoint(startingPoint)
factory.setOptimizationAlgorithm(solver)
print(
"Updated solver=",
factory.getOptimizationAlgorithm().getImplementation().getClassName(),
)
distribution_MLE = factory.build(sample)
print("Fitted distribution with new solver:")
printTruncatedNormalParameters(distribution_MLE)
Fitted distribution with default solver:
TruncatedNormal
mu= -0.000
sigma= 1.0000
a= -1.00
b= 1.00
Default solver= TNC
New solver= NLopt LN_COBYLA
Updated solver= NLopt
Fitted distribution with new solver:
TruncatedNormal
mu= -0.003
sigma= 0.5384
a= -1.00
b= 1.00
Using ResourceMap to set the solver¶
Another method is to use a special key of the ResourceMap
,
which defines the default solver used by MaximumLikelihoodFactory
.
In this case, we do not define the starting point.
ot.ResourceMap.SetAsString(
"MaximumLikelihoodFactory-DefaultOptimizationAlgorithm", "LN_COBYLA"
)
ot.ResourceMap.SetAsUnsignedInteger(
"MaximumLikelihoodFactory-MaximumCallsNumber", 100000
)
factory = ot.MaximumLikelihoodFactory(distribution)
factory.setOptimizationBounds(interval)
print(
"Default solver=",
factory.getOptimizationAlgorithm().getImplementation().getClassName(),
)
distribution_MLE = factory.build(sample)
print("Fitted distribution:")
printTruncatedNormalParameters(distribution_MLE)
Default solver= NLopt
Fitted distribution:
TruncatedNormal
mu= -0.009
sigma= 0.9852
a= -1.00
b= 1.00
Fitting a LogNormal with zero location parameter¶
We now consider the LogNormal
distribution
and present two different topics.
We show how to estimate the parameters of a LogNormal distribution which has a zero location parameter.
We compute the list of solvers which are compatible with a specific MLE problem.
We generate a large sample from the standard LogNormal distribution.
standardLogNormal = ot.LogNormal()
sample = standardLogNormal.getSample(1000)
If we estimate the parameters using LogNormalFactory
,
the algorithm does not know that the true value of the location parameter
is zero.
Hence, the estimator may not necessarily lead to an estimated parameter
exactly equal to zero.
In this example, we assume that we know that this parameter is zero.
In this case, the simplest method is to use the
setKnownParameter()
method.
Furthermore, we know that the absolute value of the mean of the underlying normal
distribution cannot be greater than 5.
Finally, we know that the standard deviation of the underlying
normal distribution cannot be lower than, say, .
logNormalFactoryWithZeroLocation = ot.MaximumLikelihoodFactory(standardLogNormal)
logNormalFactoryWithZeroLocation.setKnownParameter([0.0], [2])
lowerBound = [-5.0, 1.0e-4]
upperBound = [5.0, -1.0]
finiteLowerBound = [True, True]
finiteUpperBound = [True, False]
bounds = ot.Interval(lowerBound, upperBound, finiteLowerBound, finiteUpperBound)
logNormalFactoryWithZeroLocation.setOptimizationBounds(bounds)
fittedDistribution = logNormalFactoryWithZeroLocation.build(sample)
print(" fittedDistribution = ", fittedDistribution)
fittedDistribution = LogNormal(muLog = 0.0158879, sigmaLog = 1.00312, gamma = 0)
See the list of solvers compatible with a given MLE problem¶
In the next script, we perform a loop over the algorithms
from the NLopt
class and select the algorithms
which can solve the problem.
This is done using try/except statements.
Moreover, some solvers pretend to solve the problem, but do
not make any change to the parameters.
In this case, we do not select them.
defaultParameter = standardLogNormal.getParameter()
defaultSolver = logNormalFactoryWithZeroLocation.getOptimizationAlgorithm()
startingPoint = defaultSolver.getStartingPoint()
print("Default solver = ", defaultSolver.getImplementation().getClassName())
availableSolverList = []
differenceThreshold = 1.0e-8
for algorithmName in ot.NLopt.GetAlgorithmNames():
print("- ", algorithmName)
solver = ot.NLopt(algorithmName)
solver.setStartingPoint(startingPoint)
logNormalFactoryWithZeroLocation.setOptimizationAlgorithm(solver)
try:
fittedDistribution = logNormalFactoryWithZeroLocation.build(sample)
print(" fittedDistribution = ", fittedDistribution)
parameter = fittedDistribution.getParameter()
differenceOfParameters = parameter - defaultParameter
if differenceOfParameters.norm() > differenceThreshold:
# Consider only solvers which make a difference after optimization
availableSolverList.append(algorithmName)
except TypeError:
print(" Fail : Impossible to build with ", algorithmName)
Default solver = NLopt
- AUGLAG
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- AUGLAG_EQ
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- GD_MLSL
Fail : Impossible to build with GD_MLSL
- GD_MLSL_LDS
Fail : Impossible to build with GD_MLSL_LDS
- GN_CRS2_LM
Fail : Impossible to build with GN_CRS2_LM
- GN_DIRECT
Fail : Impossible to build with GN_DIRECT
- GN_DIRECT_L
Fail : Impossible to build with GN_DIRECT_L
- GN_DIRECT_L_NOSCAL
Fail : Impossible to build with GN_DIRECT_L_NOSCAL
- GN_DIRECT_L_RAND
Fail : Impossible to build with GN_DIRECT_L_RAND
- GN_DIRECT_L_RAND_NOSCAL
Fail : Impossible to build with GN_DIRECT_L_RAND_NOSCAL
- GN_DIRECT_NOSCAL
Fail : Impossible to build with GN_DIRECT_NOSCAL
- GN_ESCH
Fail : Impossible to build with GN_ESCH
- GN_ISRES
Fail : Impossible to build with GN_ISRES
- GN_MLSL
Fail : Impossible to build with GN_MLSL
- GN_MLSL_LDS
Fail : Impossible to build with GN_MLSL_LDS
- GN_ORIG_DIRECT
Fail : Impossible to build with GN_ORIG_DIRECT
- GN_ORIG_DIRECT_L
Fail : Impossible to build with GN_ORIG_DIRECT_L
- G_MLSL
Fail : Impossible to build with G_MLSL
- G_MLSL_LDS
Fail : Impossible to build with G_MLSL_LDS
- LD_AUGLAG
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- LD_AUGLAG_EQ
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- LD_CCSAQ
fittedDistribution = LogNormal(muLog = 0.0165295, sigmaLog = 1.00058, gamma = 0)
- LD_LBFGS
fittedDistribution = LogNormal(muLog = 0.0158819, sigmaLog = 1.00312, gamma = 0)
- LD_MMA
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- LD_SLSQP
fittedDistribution = LogNormal(muLog = 0.0174256, sigmaLog = 1.00132, gamma = 0)
- LD_TNEWTON
fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0)
- LD_TNEWTON_PRECOND
fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0)
- LD_TNEWTON_PRECOND_RESTART
fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0)
- LD_TNEWTON_RESTART
fittedDistribution = LogNormal(muLog = 0.0158698, sigmaLog = 1.00311, gamma = 0)
- LD_VAR1
fittedDistribution = LogNormal(muLog = 0.0160269, sigmaLog = 1.00314, gamma = 0)
- LD_VAR2
fittedDistribution = LogNormal(muLog = 0.0160269, sigmaLog = 1.00314, gamma = 0)
- LN_AUGLAG
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- LN_AUGLAG_EQ
fittedDistribution = LogNormal(muLog = 0.0165361, sigmaLog = 1.00058, gamma = 0)
- LN_BOBYQA
fittedDistribution = LogNormal(muLog = 0.0153223, sigmaLog = 1.00494, gamma = 0)
- LN_COBYLA
fittedDistribution = LogNormal(muLog = 0.0183524, sigmaLog = 1.00357, gamma = 0)
- LN_NELDERMEAD
fittedDistribution = LogNormal(muLog = 0.0128211, sigmaLog = 1.00282, gamma = 0)
- LN_NEWUOA
Fail : Impossible to build with LN_NEWUOA
- LN_SBPLX
fittedDistribution = LogNormal(muLog = 0.0143433, sigmaLog = 1.00252, gamma = 0)
print("Available solvers = ", len(availableSolverList))
for name in availableSolverList:
print("- ", name)
Available solvers = 20
- AUGLAG
- AUGLAG_EQ
- LD_AUGLAG
- LD_AUGLAG_EQ
- LD_CCSAQ
- LD_LBFGS
- LD_MMA
- LD_SLSQP
- LD_TNEWTON
- LD_TNEWTON_PRECOND
- LD_TNEWTON_PRECOND_RESTART
- LD_TNEWTON_RESTART
- LD_VAR1
- LD_VAR2
- LN_AUGLAG
- LN_AUGLAG_EQ
- LN_BOBYQA
- LN_COBYLA
- LN_NELDERMEAD
- LN_SBPLX
Pick one and try it.
algorithmName = "LD_SLSQP"
solver = ot.NLopt(algorithmName)
print(" New solver = ", solver.getClassName(), solver.getAlgorithmName())
solver.setStartingPoint(startingPoint)
logNormalFactoryWithZeroLocation.setOptimizationAlgorithm(solver)
fittedDistribution = logNormalFactoryWithZeroLocation.build(sample)
print(" fittedDistribution = ", fittedDistribution)
New solver = NLopt LD_SLSQP
fittedDistribution = LogNormal(muLog = 0.0174256, sigmaLog = 1.00132, gamma = 0)