ParetoFactory

class ParetoFactory(*args)

Pareto factory.

Methods

build(*args)

Build the distribution.

buildAsPareto(*args)

Estimate the distribution as native distribution.

buildEstimator(*args)

Build the distribution and the parameter distribution.

buildMethodOfLeastSquares(*args)

Method of least-squares.

buildMethodOfLikelihoodMaximization(sample)

Method of likelihood maximization.

buildMethodOfMoments(sample)

Method of moments estimator.

getBootstrapSize()

Accessor to the bootstrap size.

getClassName()

Accessor to the object's name.

getKnownParameterIndices()

Accessor to the known parameters indices.

getKnownParameterValues()

Accessor to the known parameters values.

getName()

Accessor to the object's name.

hasName()

Test if the object is named.

setBootstrapSize(bootstrapSize)

Accessor to the bootstrap size.

setKnownParameter(*args)

Accessor to the known parameters.

setName(name)

Accessor to the object's name.

Notes

Several estimators to build a Pareto distribution from a scalar sample are proposed. The default strategy is to use the least squares estimator. We make the assumption that (x_1, \dots, x_{\sampleSize}) is an i.i.d. sample from the Pareto random variable where \sampleSize is the sample size.

Moments based estimator:

Let us define the sample statistics required for the estimation. The empirical mean \overline{x} is calculated as:

\overline{x} = \frac{1}{\sampleSize} \sum_{i=1}^\sampleSize x_i

The associated empirical standard deviation s is:

s = \sqrt{\frac{1}{\sampleSize - 1} \sum_{i=1}^\sampleSize (x_i - \overline{x})^2}.

Finally, the distribution of the sample is characterized by its empirical skewness, denoted as \widehat{\text{skew}}.

The estimator \left(\widehat{\beta}, \widehat{\alpha}, \widehat{\gamma}\right) of (\beta, \alpha, \gamma) is defined as follows. The parameter \widehat{\alpha} is solution of the equation:

\widehat{\text{skew}} 
=  \dfrac{ 2(1 + \widehat{\alpha}) }{ \widehat{\alpha} - 3 } \sqrt{ \dfrac{ \widehat{\alpha} - 2 }{ \widehat{\alpha} } }.

The previous nonlinear equation is solved using a numerical method. If \widehat{\alpha} \leq 3, then an exception is raised. If \widehat{\alpha} > 3, then we compute (\widehat{\beta}, \widehat{\gamma}) as follows:

\widehat{\beta}
& = s (\widehat{\alpha} - 1) \sqrt{\dfrac{\widehat{\alpha} - 2}{\widehat{\alpha}}}, \\
\widehat{\gamma}
& = \overline{x} - \dfrac{\widehat{\alpha} \widehat{\beta}}{\widehat{\alpha} - 1}.
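To make the procedure concrete, here is a minimal pure-Python sketch of the moment inversion (an illustration, not the OpenTURNS implementation): it solves the skewness equation for \alpha by bisection, then backs out \beta from the standard deviation and \gamma from the Pareto mean relation \overline{x} = \gamma + \alpha\beta/(\alpha - 1). The function name and the bisection bracket are illustrative choices.

```python
import math

def pareto_method_of_moments(mean, std, skew):
    """Invert the Pareto moment equations (illustrative sketch, not the
    OpenTURNS implementation): solve the skewness equation for alpha by
    bisection, then back out beta and gamma."""

    def skewness(alpha):
        # Theoretical skewness of Pareto(beta, alpha, gamma), defined for alpha > 3;
        # it decreases from +infinity (alpha -> 3+) towards 2 (alpha -> infinity).
        return 2.0 * (1.0 + alpha) / (alpha - 3.0) * math.sqrt((alpha - 2.0) / alpha)

    lo, hi = 3.0 + 1e-9, 1.0e6  # arbitrary bisection bracket (an assumption)
    if not skewness(hi) <= skew <= skewness(lo):
        raise ValueError("sample skewness incompatible with alpha > 3")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if skewness(mid) > skew:
            lo = mid  # skewness too large: alpha must increase
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    # Back out beta from the standard deviation, then gamma from the mean
    # relation  mean = gamma + alpha * beta / (alpha - 1).
    beta = std * (alpha - 1.0) * math.sqrt((alpha - 2.0) / alpha)
    gamma = mean - alpha * beta / (alpha - 1.0)
    return beta, alpha, gamma
```

Feeding in the theoretical mean, standard deviation and skewness of a Pareto distribution recovers its parameters, which is a convenient consistency check of the equations.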

Least squares estimator:

Before introducing the equations, let us present the overall methodology. When \gamma is known, then we solve a linear least squares problem to estimate \beta and \alpha.

When \gamma is unknown, then two problems are involved:

  • in the outer loop, a non-linear least squares problem is solved to estimate \gamma;

  • in the inner loop, for a given value of \gamma, a linear least squares problem is solved to estimate \alpha and \beta.

Let us now introduce the methods in more detail. Let \widehat{S} be the empirical survival function. If \gamma is known, then we solve the following linear least-squares problem:

(1)
\left(\widehat{a}_0, \widehat{a}_1\right)
  = \argmin_{(a_0, a_1)^\top \in \Rset^2} \sum_{i = 1}^\sampleSize
      \left(\log\left(\widehat{S}(x_i)\right) - (a_1 \log(x_i - \gamma) + a_0)\right)^2.

To do this, let \vect{y} \in \Rset^{\sampleSize} be the vector of the logarithms of the empirical survival function at each observation:

y_i = \log\left(\widehat{S}(x_i)\right)

for 1 \leq i \leq \sampleSize. Moreover, let \vect{z} \in \Rset^{\sampleSize} be the vector of logarithm of the shifted observations:

z_i = \log(x_i - \gamma)

for 1 \leq i \leq \sampleSize. Then the linear least squares problem is:

\left(\widehat{a}_0, \widehat{a}_1\right)
= \argmin_{(a_0, a_1)^\top \in \Rset^2} \sum_{i = 1}^\sampleSize
    \left(y_i - (a_1 z_i + a_0)\right)^2.

See LinearLeastSquares for more details.

Once the vector \left(\widehat{a}_0, \widehat{a}_1\right) is computed, we compute \alpha and \beta from the equations:

\widehat{\beta} &= \exp \left( \frac{-\widehat{a}_0}{\widehat{a}_1} \right), \\
\widehat{\alpha} &= -\widehat{a}_1.
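As an illustration of this inner step, the sketch below fits the regression in plain Python and backs out (\beta, \alpha) from (a_0, a_1). It regresses the logarithm of the empirical survival function on \log(x_i - \gamma); the plotting-position approximation \widehat{S}(x_{(i)}) \approx (n - i + 0.5)/n, used here to avoid \log(0) at the largest observation, is an assumption of this sketch, not necessarily what OpenTURNS does.

```python
import math

def pareto_linear_least_squares(sample, gamma):
    """Fit (beta, alpha) for a known gamma by regressing the log empirical
    survival function on log(x - gamma) (illustrative sketch only)."""
    x = sorted(sample)
    n = len(x)
    # Plotting-position approximation of the survival function on the sorted
    # sample, S_hat(x_(i)) ~ (n - i + 0.5) / n, chosen to avoid log(0) at the
    # largest observation (an assumption of this sketch).
    y = [math.log((n - i - 0.5) / n) for i in range(n)]  # log S_hat, 0-based i
    z = [math.log(xi - gamma) for xi in x]
    # Normal equations of the simple linear regression y ~ a0 + a1 * z.
    zbar = sum(z) / n
    ybar = sum(y) / n
    a1 = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
          / sum((zi - zbar) ** 2 for zi in z))
    a0 = ybar - a1 * zbar
    # Back out the Pareto parameters from the regression coefficients.
    return math.exp(-a0 / a1), -a1  # (beta_hat, alpha_hat)
```

On data whose log survival is exactly linear in \log(x - \gamma), the regression recovers \beta and \alpha up to floating-point error.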

When \gamma is unknown, it is estimated using non-linear least squares. More precisely, the parameter \gamma is the solution of:

\widehat{\gamma} 
= \argmin_{\gamma} \sum_{i = 1}^\sampleSize
  \left(\log\left(\widehat{S}(x_i)\right) - (a_1(\gamma) \log(x_i - \gamma) + a_0(\gamma))\right)^2,

where a_0(\gamma) and a_1(\gamma) are computed by linear least squares at each evaluation of the objective.
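The outer loop can be pictured as a crude search over \gamma that re-solves the inner linear problem at each candidate. The sketch below uses an arbitrary grid and a plotting-position survival approximation (both assumptions of this illustration; OpenTURNS uses a genuine non-linear least squares solver).

```python
import math

def survival_regression_residual(sample, gamma):
    """Sum of squared residuals of the inner linear fit (log empirical
    survival regressed on log(x - gamma)) for a fixed gamma; uses a
    plotting-position survival approximation (an assumption)."""
    x = sorted(sample)
    n = len(x)
    y = [math.log((n - i - 0.5) / n) for i in range(n)]
    z = [math.log(xi - gamma) for xi in x]
    zbar, ybar = sum(z) / n, sum(y) / n
    a1 = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
          / sum((zi - zbar) ** 2 for zi in z))
    a0 = ybar - a1 * zbar
    return sum((yi - (a1 * zi + a0)) ** 2 for yi, zi in zip(y, z))

def estimate_gamma_least_squares(sample, n_grid=200):
    """Outer loop sketch: minimize the inner residual over a crude grid of
    gamma values strictly below the smallest observation (OpenTURNS uses a
    proper non-linear least squares solver instead)."""
    xmin, xmax = min(sample), max(sample)
    span = xmax - xmin
    grid = [xmin - span * (k + 1) / n_grid for k in range(n_grid)]
    return min(grid, key=lambda g: survival_regression_residual(sample, g))
```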

Maximum likelihood based estimator:

The log-likelihood of the sample is:

& \ell(\beta, \alpha, \gamma \mid  x_1, \dots, x_{\sampleSize}) \\
& = \sampleSize \log(\alpha) + \sampleSize \alpha \log(\beta)
  - (\alpha + 1) \sum_{i = 1}^\sampleSize \log(x_i - \gamma)

The maximum likelihood based estimator \left(\widehat{\beta}, \widehat{\alpha}, \widehat{\gamma}\right) of \left(\beta, \alpha, \gamma\right) maximizes the log-likelihood:

\left(\widehat{\beta}, \widehat{\alpha}, \widehat{\gamma}\right)
= \argmax_{\beta, \alpha, \gamma} \ell(\beta, \alpha, \gamma \mid  x_1, \dots, x_{\sampleSize})

In the current implementation, all parameters are estimated simultaneously.

However, another method could be used, which can be described as follows. For a given value of \gamma, the log-likelihood of the sample is defined by:

& \ell(\beta, \alpha \mid  x_1, \dots, x_{\sampleSize}, \gamma) \\
& = \sampleSize \log(\alpha) + \sampleSize \alpha \log(\beta)
  - (\alpha + 1) \sum_{i=1}^\sampleSize \log(x_i - \gamma)

We compute \left(\widehat{\beta}( \gamma), \widehat{\alpha}( \gamma)\right) which maximizes \ell(\beta, \alpha \mid  x_1, \dots, x_{\sampleSize}, \gamma):

\begin{aligned}
\left(\widehat{\beta}(\gamma), \widehat{\alpha}(\gamma)\right)
& = \argmax_{(\beta, \alpha)^\top \in \Rset^2} & & \ell(\beta, \alpha \mid x_1, \dots, x_{\sampleSize}, \gamma) \\
& \quad \text{s.t.} & & \gamma + \beta \leq x_{(1,\sampleSize)}
\end{aligned}

where x_{(1,\sampleSize)} is the smallest observation in the sample:

x_{(1,\sampleSize)} = \min_{1 \leq i \leq \sampleSize} x_i.

We get:

\widehat{\beta}( \gamma) & = x_{(1,\sampleSize)} - \gamma, \\
\widehat{\alpha}( \gamma) & = \dfrac{\sampleSize}{\sum_{i=1}^\sampleSize
    \log\left( \dfrac{x_i - \gamma}{\widehat{\beta}( \gamma)}\right)}.

Then the parameter \gamma is computed by maximizing the log-likelihood of the sample \ell\left(\widehat{\beta}( \gamma), \widehat{\alpha}( \gamma), \gamma\right):

\widehat{\gamma}
= \argmax_{\gamma \in \Rset} \ell\left(\widehat{\beta}( \gamma), \widehat{\alpha}( \gamma), \gamma\right)

The starting point of the optimization algorithm is:

\gamma_0 = x_{(1,\sampleSize)} - \frac{|x_{(1,\sampleSize)}|}{2 + \sampleSize}.
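The profiled equations translate directly into code: for each candidate \gamma, \widehat{\beta}(\gamma) and \widehat{\alpha}(\gamma) are closed-form, so only a one-dimensional search in \gamma remains. The sketch below evaluates the profile log-likelihood on a crude grid below the smallest observation; the bracket and grid are arbitrary illustrative choices, not OpenTURNS' optimizer.

```python
import math

def pareto_profile_params(sample, gamma):
    """Closed-form beta_hat(gamma) and alpha_hat(gamma) from the profile
    likelihood equations (illustrative sketch)."""
    n = len(sample)
    beta = min(sample) - gamma  # beta_hat(gamma) = x_(1) - gamma
    alpha = n / sum(math.log((xi - gamma) / beta) for xi in sample)
    return beta, alpha

def pareto_profile_loglik(sample, gamma):
    """Log-likelihood evaluated at the profiled (beta, alpha)."""
    n = len(sample)
    beta, alpha = pareto_profile_params(sample, gamma)
    return (n * math.log(alpha) + n * alpha * math.log(beta)
            - (alpha + 1.0) * sum(math.log(xi - gamma) for xi in sample))

def pareto_profile_mle_gamma(sample, n_grid=400):
    """One-dimensional search in gamma: evaluate the profile log-likelihood
    on a crude grid (bracket and grid are arbitrary illustrative choices)."""
    n = len(sample)
    xmin = min(sample)
    gamma0 = xmin - abs(xmin) / (2.0 + n)  # starting point from the notes
    lo = gamma0 - 20.0                     # arbitrary lower bracket (assumption)
    hi = xmin - 1e-6 * (1.0 + abs(xmin))   # keep beta_hat(gamma) > 0
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    return max(grid, key=lambda g: pareto_profile_loglik(sample, g))
```

Note that the likelihood degenerates as \gamma approaches x_{(1,\sampleSize)} (then \widehat{\beta}(\gamma) \to 0), which is why the grid stays strictly below the smallest observation.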

Examples

In the first example, we estimate all the parameters, that is, \beta, \alpha and \gamma.

>>> import openturns as ot
>>> real_distribution = ot.Pareto(2.5, 1.0, 0.0)
>>> sample = real_distribution.getSample(1000)
>>> factory = ot.ParetoFactory()
>>> estimated_distribution_full = factory.build(sample)

In the second example, we assume that the \gamma parameter is known and estimate \beta and \alpha. The user sets the value of \gamma (index 2 in the order \beta, \alpha and \gamma).

>>> known_gamma = 0.0
>>> factory.setKnownParameter([2], [known_gamma])
>>> estimated_distribution_fixed = factory.build(sample)
__init__(*args)
build(*args)

Build the distribution.

Available usages:

build()

build(sample)

build(param)

Parameters:
sample : 2-d sequence of float

Data.

param : sequence of float

The parameters of the distribution.

Returns:
dist : Distribution

The estimated distribution.

In the first usage, the default native distribution is built.

buildAsPareto(*args)

Estimate the distribution as native distribution.

Available usages:

buildAsPareto()

buildAsPareto(sample)

buildAsPareto(param)

Parameters:
sample : 2-d sequence of float

Data.

param : sequence of float

The parameters of the Pareto.

Returns:
dist : Pareto

The estimated distribution as a Pareto.

In the first usage, the default Pareto distribution is built.

buildEstimator(*args)

Build the distribution and the parameter distribution.

Parameters:
sample : 2-d sequence of float

Data.

parameters : DistributionParameters

Optional, the parametrization.

Returns:
resDist : DistributionFactoryResult

The results.

Notes

According to the way the native parameters of the distribution are estimated, the parameters distribution differs:

  • Moments method: the asymptotic parameters distribution is normal and estimated by Bootstrap on the initial data;

  • Maximum likelihood method with a regular model: the asymptotic parameters distribution is normal and its covariance matrix is the inverse Fisher information matrix;

  • Other methods: the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting (see KernelSmoothing).

If another set of parameters is specified, the native parameters distribution is first estimated and the new distribution is determined from it:

  • if the native parameters distribution is normal and the transformation regular at the estimated parameters values: the asymptotic parameters distribution is normal and its covariance matrix determined from the inverse Fisher information matrix of the native parameters and the transformation;

  • in the other cases, the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting.

buildMethodOfLeastSquares(*args)

Method of least-squares.

Parameters:
sample : 2-d sequence of float

Data.

gamma : float, optional

Location parameter. If provided, the estimation of \beta and \alpha is performed via linear least squares with \gamma fixed. If not specified, \gamma is first estimated using a non-linear least squares routine, followed by a linear least squares estimation for the remaining parameters.

Returns:
distribution : Pareto

The estimated distribution.

Examples

In the following example, the parameters of a Pareto are estimated from a sample. We create a simulated sample from a Pareto distribution with parameters beta=2.5, alpha=1.0 and gamma=0.0.

>>> import openturns as ot
>>> real_distribution = ot.Pareto(2.5, 1.0, 0.0)
>>> sample = real_distribution.getSample(1000)
>>> factory = ot.ParetoFactory()

Example 1: When gamma is known. In this case, we estimate the parameters beta and alpha using linear least squares.

>>> known_gamma = 0.0
>>> estimated_distribution_fixed = factory.buildMethodOfLeastSquares(sample, known_gamma)
>>> print(estimated_distribution_fixed.getParameter())
[2.53...,1.03...,0]

Example 2: When gamma is unknown. In this case, we perform a full estimation by non-linear least squares (for gamma) combined with linear least squares (for beta and alpha).

>>> estimated_distribution_full = factory.buildMethodOfLeastSquares(sample)
>>> print(estimated_distribution_full.getParameter())
[2.61...,1.05...,-0.10...]
buildMethodOfLikelihoodMaximization(sample)

Method of likelihood maximization.

Refer to MaximumLikelihoodFactory.

Parameters:
sample : 2-d sequence of float

Data.

Returns:
distribution : Pareto

The estimated distribution.

Notes

When this method is used, all parameters are estimated simultaneously.

buildMethodOfMoments(sample)

Method of moments estimator.

Parameters:
sample : 2-d sequence of float

Data.

Returns:
distribution : Pareto

The estimated distribution.

getBootstrapSize()

Accessor to the bootstrap size.

Returns:
size : int

Size of the bootstrap.

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getKnownParameterIndices()

Accessor to the known parameters indices.

Returns:
indices : Indices

Indices of the known parameters.

getKnownParameterValues()

Accessor to the known parameters values.

Returns:
values : Point

Values of known parameters.

getName()

Accessor to the object’s name.

Returns:
name : str

The name of the object.

hasName()

Test if the object is named.

Returns:
hasName : bool

True if the name is not empty.

setBootstrapSize(bootstrapSize)

Accessor to the bootstrap size.

Parameters:
size : int

The size of the bootstrap.

setKnownParameter(*args)

Accessor to the known parameters.

Parameters:
positions : sequence of int

Indices of known parameters.

values : sequence of float

Values of known parameters.

Examples

When a subset of the parameter vector is known, the other parameters only have to be estimated from data.

In the following example, we consider a sample and want to fit a Beta distribution. We assume that the a and b parameters are known beforehand. In this case, we set the third parameter (at index 2) to -1 and the fourth parameter (at index 3) to 1.

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> distribution = ot.Beta(2.3, 2.2, -1.0, 1.0)
>>> sample = distribution.getSample(10)
>>> factory = ot.BetaFactory()
>>> # set (a,b) out of (r, t, a, b)
>>> factory.setKnownParameter([2, 3], [-1.0, 1.0])
>>> inf_distribution = factory.build(sample)
setName(name)

Accessor to the object’s name.

Parameters:
name : str

The name of the object.

Examples using the class

Get the asymptotic distribution of the estimators

Fit a parametric distribution