GeneralizedExtremeValueFactory¶


class GeneralizedExtremeValueFactory(*args)¶

GeneralizedExtremeValue factory.

Methods

`build`(*args)	Estimate the distribution via maximum likelihood.
`buildAsGeneralizedExtremeValue`(*args)	Estimate the distribution as native distribution.
`buildCovariates`(*args)	Estimate a GEV from covariates.
`buildEstimator`(*args)	Build the distribution and the parameter distribution.
`buildMethodOfLikelihoodMaximization`(sample[, r])	Estimate the distribution from the $r$ largest order statistics.
`buildMethodOfLikelihoodMaximizationEstimator`(sample[, r])	Estimate the distribution and the parameter distribution with the R-maxima method.
`buildMethodOfXiProfileLikelihood`(sample[, r])	Estimate the distribution with the profile likelihood.
`buildMethodOfXiProfileLikelihoodEstimator`(sample[, r])	Estimate the distribution and the parameter distribution with the profile likelihood.
`buildReturnLevelEstimator`(result, m)	Estimate a return level and its distribution from the GEV parameters.
`buildReturnLevelProfileLikelihood`(sample, m)	Estimate a return level and its distribution with the profile likelihood.
`buildReturnLevelProfileLikelihoodEstimator`(...)	Estimate $(z_m, \sigma, \xi)$ and its distribution with the profile likelihood.
`buildTimeVarying`(*args)	Estimate a non stationary GEV from a time-dependent parametric model.
`getBootstrapSize`()	Accessor to the bootstrap size.
`getClassName`()	Accessor to the object's name.
`getKnownParameterIndices`()	Accessor to the known parameters indices.
`getKnownParameterValues`()	Accessor to the known parameters values.
`getName`()	Accessor to the object's name.
`getOptimizationAlgorithm`()	Accessor to the solver.
`hasName`()	Test if the object is named.
`setBootstrapSize`(bootstrapSize)	Accessor to the bootstrap size.
`setKnownParameter`(*args)	Accessor to the known parameters.
`setName`(name)	Accessor to the object's name.
`setOptimizationAlgorithm`(solver)	Accessor to the solver.

See also

DistributionFactory, GeneralizedExtremeValue, FrechetFactory, GumbelFactory, WeibullMaxFactory

Notes

Several estimators are proposed to build a GeneralizedExtremeValue distribution from a scalar sample. The details are given in the documentation of each method.

The following ResourceMap entries can be used to tweak the parameters of the optimization solver involved in the different estimators:

GeneralizedExtremeValueFactory-DefaultOptimizationAlgorithm
GeneralizedExtremeValueFactory-MaximumCallsNumber
GeneralizedExtremeValueFactory-MaximumAbsoluteError
GeneralizedExtremeValueFactory-MaximumRelativeError
GeneralizedExtremeValueFactory-MaximumObjectiveError
GeneralizedExtremeValueFactory-MaximumConstraintError
GeneralizedExtremeValueFactory-InitializationMethod
GeneralizedExtremeValueFactory-NormalizationMethod

__init__(*args)¶

build(*args)¶

Estimate the distribution via maximum likelihood.

Available usages:

build(sample)

build(param)

Parameters:

sample2-d sequence of float: The block maxima sample of dimension 1 from which $\vect{\theta} = (\mu, \sigma, \xi)$ are estimated.
paramsequence of float: The parameters of the GeneralizedExtremeValue.

Returns:

distributionGeneralizedExtremeValue: The estimated distribution.

Notes

The estimation strategy described in buildAsGeneralizedExtremeValue() is followed.

buildAsGeneralizedExtremeValue(*args)¶

Estimate the distribution as native distribution.

Available usages:

buildAsGeneralizedExtremeValue()

buildAsGeneralizedExtremeValue(sample)

buildAsGeneralizedExtremeValue(param)

Parameters:

sample2-d sequence of float: The block maxima sample of dimension 1 from which $\vect{\theta} = (\mu, \sigma, \xi)$ are estimated.
paramsequence of float: The parameters of the GeneralizedExtremeValue.

Returns:

distributionGeneralizedExtremeValue

The estimated distribution as a GeneralizedExtremeValue.

In the first usage, the default GeneralizedExtremeValue distribution is built.

Notes

The estimate maximizes the log-likelihood of the model.

buildCovariates(*args)¶

Estimate a GEV from covariates.

Parameters:

sample2-d sequence of float

The block maxima grouped in a sample of size $n$ and one dimension.

covariates2-d sequence of float

Covariates sample. A constant column is automatically added if it is not provided.

muIndicessequence of int, optional

Indices of covariates considered for parameter $\mu$ .

By default, an empty sequence.

The index of the constant covariate is added if empty or if the covariates do not initially contain a constant column.

sigmaIndicessequence of int, optional

Indices of covariates considered for parameter $\sigma$ .

By default, an empty sequence.

The index of the constant covariate is added if empty or if the covariates do not initially contain a constant column.

xiIndicessequence of int, optional

Indices of covariates considered for parameter $\xi$ .

By default, an empty sequence.

The index of the constant covariate is added if empty or if the covariates do not initially contain a constant column.

muLinkFunction, optional

The $h_{\mu}$ function.

By default, the identity function.

sigmaLinkFunction, optional

The $h_{\sigma}$ function.

By default, the identity function.

xiLinkFunction, optional

The $h_{\xi}$ function.

By default, the identity function.

initializationMethodstr, optional

The initialization method for the optimization problem: Gumbel or Static.

By default, the method Gumbel (see ResourceMap, key GeneralizedExtremeValueFactory-InitializationMethod).

normalizationMethodstr, optional

The data normalization method: CenterReduce, MinMax or None.

By default, the method MinMax (see ResourceMap, key GeneralizedExtremeValueFactory-NormalizationMethod).

Returns:

resultCovariatesResult: The result class.

Notes

Let $Z_{\vect{y}}$ be a GEV model whose parameters depend on $d$ covariates denoted by $\vect{y} = \Tr{(y_1, \dots, y_d)}$ :

$Z_{\vect{y}} \sim \mbox{GEV}(\mu(\vect{y}), \sigma(\vect{y}), \xi(\vect{y}))$

We denote by $(z_{\vect{y}_1}, \dots, z_{\vect{y}_n})$ the values of $Z_{\vect{y}}$ associated to the values of the covariates $(\vect{y}_1, \dots, \vect{y}_n)$ .

For numerical reasons, it is recommended to normalize the covariates. Each covariate $y_k$ has its own normalization:

$\tilde{y}_k = \tau_k(y_k) = \dfrac{y_k-c_k}{d_k}$

and with three ways of defining $(c_k,d_k)$ of the covariate $y_k$ :

the CenterReduce method where $c_k = \dfrac{1}{n} \sum_{i=1}^n y_{k,i}$ is the mean of the covariate $y_k$ and $d_k = \sqrt{\dfrac{1}{n} \sum_{i=1}^n (y_{k,i}-c_k)^2}$ is its standard deviation;
the MinMax method where $c_k = \min_i y_{k,i}$ is the min value of the covariate $y_k$ and $d_k = \max_i y_{k,i}- \min_i y_{k,i}$ its range. This is the default method;
the None method where $c_k = 0$ and $d_k = 1$ : in that case, data are not normalized.
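The three normalization rules above can be sketched in plain Python. This is only an illustration of the formulas, not the library's implementation; the helper names `normalization_constants` and `normalize` are hypothetical.

```python
import math


def normalization_constants(values, method="MinMax"):
    """Return (c_k, d_k) for one covariate column, following the rules above."""
    n = len(values)
    if method == "CenterReduce":
        c = sum(values) / n
        # Population standard deviation (division by n, as in the formula above).
        d = math.sqrt(sum((v - c) ** 2 for v in values) / n)
    elif method == "MinMax":
        c = min(values)
        d = max(values) - min(values)
    elif method == "None":
        c, d = 0.0, 1.0
    else:
        raise ValueError("unknown normalization method: " + method)
    return c, d


def normalize(values, method="MinMax"):
    """Apply tau_k(y) = (y - c_k) / d_k to every value of the column."""
    c, d = normalization_constants(values, method)
    return [(v - c) / d for v in values]


covariate = [2.0, 4.0, 6.0, 10.0]
print(normalize(covariate, "MinMax"))
print(normalize(covariate, "CenterReduce"))
```

With MinMax the normalized covariate lies in $[0, 1]$; with CenterReduce it has zero mean and unit standard deviation.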

Let $\vect{\theta} = (\mu, \sigma, \xi)$ be the vector of parameters. Then, $\vect{\theta}$ depends on all the $d$ covariates even if each component of $\vect{\theta}$ only depends on a subset of the covariates. We denote by $(y_1^q, \dots, y_{d_q}^q)$ the $d_q$ covariates involved in the modelling of the component $\theta_q$ .

Each component $\theta_q$ can be written as a function of the normalized covariates:

$\theta_q(y_1^q, \dots, y_{d_q}^q) = h_q\left(\sum_{i=1}^{d_q} \tilde{\beta}_i^q \tilde{y}_i^q \right)$

This relation can be written as a function of the real covariates:

$\theta_q(y_1^q, \dots, y_{d_q}^q) = h_q\left(\sum_{i=1}^{d_q} \beta_i^q y_i^q + \beta_{d_q+1}^q \right)$

where:

$h_q: \Rset \mapsto \Rset$ is usually referred to as the inverse-link function of the component $\theta_q$ ,
each $\beta_i^{q} \in \Rset$ .
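The link between the coefficients on normalized covariates and the coefficients on real covariates follows from expanding $\tilde{y}_k = (y_k - c_k)/d_k$. The sketch below assumes, for illustration only, that the normalized model carries an explicit intercept `intercept_tilde`; all function names are hypothetical and the library may organize the computation differently.

```python
def linear_predictor_normalized(y, beta_tilde, intercept_tilde, c, d):
    """Argument of h_q written with normalized covariates (y_k - c_k) / d_k."""
    return intercept_tilde + sum(
        bt * (yk - ck) / dk for bt, yk, ck, dk in zip(beta_tilde, y, c, d)
    )


def to_real_coefficients(beta_tilde, intercept_tilde, c, d):
    """Expand the normalization: beta_k = beta_tilde_k / d_k, and the constant
    term absorbs the shifts -beta_tilde_k * c_k / d_k."""
    beta = [bt / dk for bt, dk in zip(beta_tilde, d)]
    intercept = intercept_tilde - sum(
        bt * ck / dk for bt, ck, dk in zip(beta_tilde, c, d)
    )
    return beta, intercept


def linear_predictor_real(y, beta, intercept):
    """Argument of h_q written with the real covariates."""
    return sum(b * yk for b, yk in zip(beta, y)) + intercept


# Arbitrary illustrative values.
c, d = [1.0, 100.0], [4.0, 300.0]
beta_tilde, intercept_tilde = [0.8, -1.2], 2.0
beta, intercept = to_real_coefficients(beta_tilde, intercept_tilde, c, d)
y = [3.0, 250.0]
print(linear_predictor_normalized(y, beta_tilde, intercept_tilde, c, d))
print(linear_predictor_real(y, beta, intercept))
```

Both printed values agree up to rounding, which is why the same model can be reported in either parameterization.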

To allow some parameters to remain constant, i.e. independent of the covariates (this will generally be the case for the parameter $\xi$ ), the library systematically adds the constant covariate to the specified covariates.

The complete vector of parameters is defined by:

$\begin{aligned} \Tr{\vect{\beta}} & = \Tr{ ( \Tr{\vect{\beta}_1}, \dots, \Tr{\vect{\beta}_p} ) } \in \Rset^{d_t}\\ \Tr{\vect{\beta}_q} & = (\beta_1^q, \dots, \beta_{d_q}^q) \in \Rset^{d_q} \end{aligned}$

where $p = 3$ is the number of GEV parameters and $d_t = \sum_{q=1}^p d_q$ .

The estimator of $\vect{\beta}$ maximizes the likelihood of the model which is defined by:

$L(\vect{\beta}) = \prod_{i=1}^{n} g(z_{\vect{y}_i};\vect{\theta}(\vect{y}_i))$

where $g(z_{\vect{y}_i};\vect{\theta}(\vect{y}_i))$ denotes the GEV density function with parameters $\vect{\theta}(\vect{y}_i)$ and evaluated at $z_{\vect{y}_i}$ .

Then, if none of the $\xi(\vect{y}_i)$ is zero, the log-likelihood is defined by:

$\ell (\vect{\beta}) = -\sum_{i=1}^{n} \left\{ \log(\sigma(\vect{y}_i)) + (1 + 1 / \xi(\vect{y} _i) ) \log\left[ 1+\xi(\vect{y}_i) \left( \frac{z_{\vect{y}_i} - \mu(\vect{y}_i)} {\sigma(\vect{y}_i)}\right) \right] + \left[ 1 + \xi(\vect{y}_i) \left( \frac{ z_{\vect{y}_i}- \mu(\vect{y}_i)}{\sigma(\vect{y}_i)} \right) \right]^{-1 / \xi(\vect{y}_i)} \right\}$

defined on $(\mu, \sigma, \xi)$ such that $1+\xi(\vect{y}_i) \left( \frac{z_{\vect{y}_i} - \mu(\vect{y}_i)}{\sigma(\vect{y}_i)} \right) > 0$ for all $\vect{y}_i$ .

And if any of the $\xi(\vect{y}_i)$ is equal to 0, the log-likelihood is defined as:

$\ell (\vect{\beta}) = -\sum_{i=1}^{n} \left\{ \log(\sigma(\vect{y}_i)) + \frac{z_{\vect{y}_i} - \mu(\vect{y}_i)}{\sigma(\vect{y}_i)} + \exp \left\{ - \frac{z_{\vect{y}_i} - \mu(\vect{y}_i)}{\sigma(\vect{y}_i)} \right\} \right\}$

The initialization of the optimization problem is crucial. Two initial points $(\mu_0, \sigma_0, \xi_0)$ are proposed:

the Gumbel initial point: in that case, we assume that the GEV is a stationary Gumbel distribution and we deduce $(\mu_0, \sigma_0)$ from the empirical mean $\hat{M}$ and the empirical standard deviation $\hat{\sigma}$ of the data: $\sigma_0 = \dfrac{\sqrt{6}}{\pi} \hat{\sigma}$ and $\mu_0 = \hat{M} - \gamma \sigma_0$ where $\gamma$ is Euler’s constant; then we take the initial point $(\mu_0, \sigma_0, \xi_0 = 0.1)$ . This is the default initial point;
the Static initial point: in that case, we assume that the GEV is stationary and $(\mu_0, \sigma_0, \xi_0)$ is the maximum likelihood estimate resulting from that assumption.
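The Gumbel initial point is a moment match: a Gumbel distribution has mean $\mu + \gamma \sigma$ and standard deviation $\pi \sigma / \sqrt{6}$. A minimal sketch of that computation follows; the function name is hypothetical, and the use of the unbiased (n - 1) standard deviation is an assumption, since the text does not say which estimator the library uses.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gumbel_initial_point(data, xi0=0.1):
    """Moment-based starting point (mu_0, sigma_0, xi_0) described above."""
    n = len(data)
    mean = sum(data) / n
    # Assumption: unbiased sample standard deviation (division by n - 1).
    std = math.sqrt(sum((z - mean) ** 2 for z in data) / (n - 1))
    sigma0 = math.sqrt(6.0) / math.pi * std
    mu0 = mean - EULER_GAMMA * sigma0
    return mu0, sigma0, xi0


block_maxima = [10.2, 11.5, 9.8, 12.1, 10.9]  # illustrative values
print(gumbel_initial_point(block_maxima))
```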

The result class provides:

the estimator $\hat{\vect{\beta}}$ ,
the asymptotic distribution of $\hat{\vect{\beta}}$ ,
the parameter function $(\vect{\beta}, \vect{y}) \mapsto \vect{\theta}(\vect{\beta}, \vect{y})$ ,
the graphs of the parameter functions $y_k \mapsto \theta_q(\vect{y})$ , where all the components of $\vect{y}$ are fixed to a reference value except for $y_k$ , for each $k$ ,
the graphs of the parameter functions $(y_k, y_\ell) \mapsto\theta_q(\vect{y})$ , where all the components of $\vect{y}$ are fixed to a reference value except for $(y_k, y_\ell)$ , for each $(k,\ell)$ ,
the normalizing function $\vect{y} \mapsto (\tau_1(y_1), \dots, \tau_d(y_d))$ ,
the optimal log-likelihood value $\ell(\hat{\vect{\beta}})$ ,
the GEV distribution at covariate $\vect{y}$ ,
the graphs of the quantile functions of order $p$ : $y_k \mapsto q_p(Z_{\vect{y}})$ where all the components of $\vect{y}$ are fixed to a reference value except for $y_k$ , for each $k$ ,
the graphs of the quantile functions of order $p$ : $(y_k, y_\ell) \mapsto q_p(Z_{\vect{y}})$ where all the components of $\vect{y}$ are fixed to a reference value except for $(y_k, y_\ell)$ , for each $(k,\ell)$ .

buildEstimator(*args)¶

Build the distribution and the parameter distribution.

Parameters:

sample2-d sequence of float: Data.
parametersDistributionParameters: Optional, the parametrization.

Returns:

resDistDistributionFactoryResult: The results.

Notes

According to the way the native parameters of the distribution are estimated, the parameters distribution differs:

Moments method: the asymptotic parameters distribution is normal and estimated by Bootstrap on the initial data;

Maximum likelihood method with a regular model: the asymptotic parameters distribution is normal and its covariance matrix is the inverse Fisher information matrix;

Other methods: the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting (see KernelSmoothing).

If another set of parameters is specified, the native parameters distribution is first estimated and the new distribution is determined from it:

if the native parameters distribution is normal and the transformation regular at the estimated parameters values: the asymptotic parameters distribution is normal and its covariance matrix determined from the inverse Fisher information matrix of the native parameters and the transformation;

in the other cases, the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting.

buildMethodOfLikelihoodMaximization(sample, r=0)¶

Estimate the distribution from the $r$ largest order statistics.

Parameters:

sample2-d sequence of float

Block maxima grouped in a sample of size $n$ and dimension $R$ .

rint, $1 \leq r \leq R$ , optional

Number of largest order statistics taken into account among the $R$ stored ones.

By default, $r=0$ which means that all the maxima are used.

Returns:

distributionGeneralizedExtremeValue: The estimated distribution.

Notes

The method estimates a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ from a given sample.

Let us suppose we have a series of independent and identically distributed variables and that data are grouped into $n$ blocks. In each block, the largest $R$ observations are recorded.

We define the series $M_i^{(R)} = (z_i^{(1)}, \hdots, z_i^{(R)})$ for $1 \leq i \leq n$ where the values are sorted in decreasing order.

The estimator of $(\mu, \sigma, \xi)$ maximizes the log-likelihood built from the $r$ largest order statistics, with $1 \leq r \leq R$ defined as:

If $\xi \neq 0$ , then:

(1)¶ $\ell(\mu, \sigma, \xi) = -nr \log \sigma - \sum_{i=1}^n \biggl[ 1 + \xi \Bigl( \frac{z_i^{(r)} - \mu }{\sigma} \Bigr) \biggr]^{-1/\xi} -\left(\dfrac{1}{\xi} +1 \right) \sum_{i=1}^n \sum_{k=1}^r \log \biggl[ 1 + \xi \Bigl( \frac{z_i^{(k)} - \mu }{\sigma} \Bigr) \biggr]$

defined on $(\mu, \sigma, \xi)$ such that $1+\xi \left( \frac{z_i^{(k)} - \mu}{\sigma} \right) > 0$ for all $1 \leq i \leq n$ and $1 \leq k \leq r$ .

If $\xi = 0$ , then:

(2)¶ $\ell(\mu, \sigma, \xi) = -nr \log \sigma - \sum_{i=1}^n \exp \biggl[ - \Bigl( \frac{z_i^{(r)} - \mu }{\sigma} \Bigr) \biggr] - \sum_{i=1}^n \sum_{k=1}^r \Bigl( \frac{z_i^{(k)} - \mu }{\sigma} \Bigr)$
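Equations (1) and (2) can be transcribed directly. The sketch below is a plain-Python illustration rather than the library's code: the function name and the list-of-rows layout are assumptions, it requires an explicit $r \geq 1$ (it does not reproduce the $r = 0$ convention), and it returns minus infinity outside the domain of definition.

```python
import math


def r_largest_log_likelihood(blocks, mu, sigma, xi, r):
    """Log-likelihood of equations (1) and (2).

    blocks: n rows, each holding the R largest values of a block in
    decreasing order. Only the first r values of each row are used.
    """
    if not 1 <= r <= len(blocks[0]):
        raise ValueError("r must satisfy 1 <= r <= R")
    if sigma <= 0.0:
        return -math.inf
    total = -len(blocks) * r * math.log(sigma)
    for row in blocks:
        top = row[:r]  # z_i^(1) >= ... >= z_i^(r)
        if xi == 0.0:
            # Equation (2): Gumbel case.
            total -= math.exp(-(top[-1] - mu) / sigma)
            total -= sum((z - mu) / sigma for z in top)
        else:
            # Equation (1): every term 1 + xi (z - mu) / sigma must be positive.
            terms = [1.0 + xi * (z - mu) / sigma for z in top]
            if min(terms) <= 0.0:
                return -math.inf
            total -= terms[-1] ** (-1.0 / xi)
            total -= (1.0 / xi + 1.0) * sum(math.log(t) for t in terms)
    return total


blocks = [[3.0, 2.5], [2.8, 2.1], [3.6, 3.0]]  # n = 3 blocks, R = 2
print(r_largest_log_likelihood(blocks, 2.5, 0.8, 0.1, 2))
print(r_largest_log_likelihood(blocks, 2.5, 0.8, 0.0, 2))
```

As $\xi \to 0$ the value of equation (1) tends to that of equation (2), which is a convenient sanity check for any implementation.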

buildMethodOfLikelihoodMaximizationEstimator(sample, r=0)¶

Estimate the distribution and the parameter distribution with the R-maxima method.

Parameters:

sample2-d sequence of float

Block maxima grouped in a sample of size $n$ and dimension $R$ .

rint, $1 \leq r \leq R$ , optional

Number of order statistics taken into account among the $R$ stored ones.

By default, $r=0$ which means that all the maxima are used.

Returns:

resultDistributionFactoryLikelihoodResult: The result class.

Notes

The method estimates a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ from a given sample.

The estimator $\hat{\vect{\theta}}$ maximizes the log-likelihood detailed in buildMethodOfLikelihoodMaximization().

The result class produced by the method provides:

the GEV distribution associated to $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ ,
the asymptotic distribution of $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ .

buildMethodOfXiProfileLikelihood(sample, r=0)¶

Estimate the distribution with the profile likelihood.

Parameters:

sample2-d sequence of float

Block maxima grouped in a sample of size $n$ and dimension $R$ .

rint, $1 \leq r \leq R$ , optional

Number of largest order statistics taken into account among the $R$ stored ones.

By default, $r=0$ which means that all the maxima are used.

Returns:

distributionGeneralizedExtremeValue: The estimated distribution.

Notes

The method estimates a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ from a given sample.

The estimator $\hat{\vect{\theta}}$ is defined using a nested numerical optimization of the log-likelihood:

$\ell_p (\xi) = \max_{(\mu, \sigma)} \ell (\mu, \sigma, \xi)$

where $\ell (\mu, \sigma, \xi)$ is detailed in equations (1) and (2) with $r=1$ .

The estimator $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ is then defined by:

$\begin{aligned} \hat{\xi} & = \argmax_{\xi} \ell_p(\xi)\\ (\hat{\mu}, \hat{\sigma}) & = \argmax_{(\mu, \sigma)} \ell(\mu, \sigma, \hat{\xi}) \end{aligned}$

The starting point of the optimization is initialized from the probability weighted moments method, see [diebolt2008].

buildMethodOfXiProfileLikelihoodEstimator(sample, r=0)¶

Estimate the distribution and the parameter distribution with the profile likelihood.

Parameters:

sample2-d sequence of float

Block maxima grouped in a sample of size $n$ and dimension $R$ .

rint, $1 \leq r \leq R$ , optional

Number of largest order statistics taken into account among the $R$ stored ones.

By default, $r=0$ which means that all the maxima are used.

Returns:

resultProfileLikelihoodResult: The result class.

Notes

The method estimates a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ from a given sample.

The estimator $\hat{\vect{\theta}}$ is defined in buildMethodOfXiProfileLikelihood().

The result class produced by the method provides:

the GEV distribution associated to $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ ,
the asymptotic distribution of $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ ,
the profile log-likelihood function $\xi \mapsto \ell_p(\xi)$ ,
the optimal profile log-likelihood value $\ell_p(\hat{\xi})$ ,
confidence intervals of level $(1-\alpha)$ of $\xi$ .

buildReturnLevelEstimator(result, m)¶

Estimate a return level and its distribution from the GEV parameters.

Parameters:

resultDistributionFactoryResult: Likelihood estimation result of a GeneralizedExtremeValue
mfloat: The return period expressed in terms of number of blocks.

Returns:

distributionDistribution: The asymptotic distribution of $\hat{z}_m$ .

Notes

Let $Z$ be a random variable which follows a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ .

The $m$ -block return level $z_m$ is the level exceeded on average once every $m$ blocks. The $m$ -block return level can be translated into the annual scale: if there are $n_y$ blocks per year, then the $N$ -year return level corresponds to the $m$ -block return level where $m = n_yN$ .

The $m$ -block return level is defined as the quantile of order $1-p=1-1/m$ of the GEV distribution:

If $\xi \neq 0$ :

(3)¶ $z_m = \mu - \frac{\sigma}{\xi} \left[ 1- (-\log(1-p))^{-\xi}\right]$

If $\xi = 0$ :

(4)¶ $z_m = \mu - \sigma \log(-\log(1-p))$
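Equations (3) and (4) amount to inverting the GEV distribution function at the order $1 - 1/m$. The following plain-Python sketch, with hypothetical function names and illustrative parameter values, evaluates the return level and checks it against the distribution function.

```python
import math


def return_level(mu, sigma, xi, m):
    """m-block return level from equations (3) and (4), with p = 1 / m, m > 1."""
    y = -math.log(1.0 - 1.0 / m)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, assuming z lies inside the support."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    return math.exp(-((1.0 + xi * (z - mu) / sigma) ** (-1.0 / xi)))


# N-year return level with n_y blocks per year: m = n_y * N.
blocks_per_year, years = 1, 100
m = blocks_per_year * years
z_m = return_level(30.0, 5.0, 0.1, m)
print(z_m, gev_cdf(z_m, 30.0, 5.0, 0.1))
```

The second printed value should be close to $1 - 1/m = 0.99$, confirming that $z_m$ is the quantile of order $1 - p$.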

The estimator $\hat{z}_m$ of $z_m$ is deduced from the estimator $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ of $(\mu, \sigma, \xi)$ .

The asymptotic distribution of $\hat{z}_m$ is obtained by the Delta method from the asymptotic distribution of $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ . It is a normal distribution with mean $\hat{z}_m$ and variance:

$\Var{\hat{z}_m} = (\nabla z_m)^T \mat{V}_n \nabla z_m$

where $\nabla z_m = (\frac{\partial z_m}{\partial \mu}, \frac{\partial z_m}{\partial \sigma}, \frac{\partial z_m}{\partial \xi})$ and $\mat{V}_n$ is the asymptotic covariance of $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ .
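For $\xi \neq 0$ the gradient of equation (3) has a closed form, so the Delta method variance can be sketched in a few lines. The function names and the covariance matrix below are illustrative assumptions; in practice $\mat{V}_n$ would come from the likelihood estimation result and the gradient would be evaluated at the estimated parameters.

```python
import math


def return_level_gradient(mu, sigma, xi, m):
    """Gradient of z_m with respect to (mu, sigma, xi), for xi != 0."""
    y = -math.log(1.0 - 1.0 / m)
    a = y ** (-xi)
    d_mu = 1.0
    d_sigma = -(1.0 - a) / xi
    d_xi = sigma * (1.0 - a) / xi ** 2 - sigma * a * math.log(y) / xi
    return [d_mu, d_sigma, d_xi]


def delta_method_variance(gradient, covariance):
    """Quadratic form (grad z_m)^T V_n (grad z_m)."""
    size = len(gradient)
    return sum(
        gradient[i] * covariance[i][j] * gradient[j]
        for i in range(size)
        for j in range(size)
    )


# Hypothetical asymptotic covariance of (mu_hat, sigma_hat, xi_hat).
V = [
    [0.35, 0.10, -0.01],
    [0.10, 0.20, -0.01],
    [-0.01, -0.01, 0.01],
]
grad = return_level_gradient(30.0, 5.0, 0.1, 100)
variance = delta_method_variance(grad, V)
print(grad, variance, math.sqrt(variance))
```

The square root of the variance is the standard error used to build a symmetric normal confidence interval around $\hat{z}_m$.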

buildReturnLevelProfileLikelihood(sample, m)¶

Estimate a return level and its distribution with the profile likelihood.

Parameters:

sample2-d sequence of float: The block maxima sample of dimension 1.
mfloat: The return period expressed in terms of number of blocks.

Returns:

distributionNormal: The asymptotic distribution of $\hat{z}_m$ .

Notes

Let $Z$ be a random variable which follows a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ .

The $m$ -block return level $z_m$ is defined in buildReturnLevelEstimator().

The estimator is defined using a nested numerical optimization of the log-likelihood:

$\ell_p (z_m) = \max_{(\sigma, \xi)} \ell (z_m, \sigma, \xi)$

where $\ell (z_m, \sigma, \xi)$ is the log-likelihood detailed in (1) and (2) with $r=1$ , in which $\mu$ has been replaced by its expression in terms of $z_m$ obtained from equations (3) or (4).

The estimator $\hat{z}_m$ of $z_m$ is defined by:

$\hat{z}_m = \argmax_{z_m} \ell_p(z_m)$

The asymptotic distribution of $\hat{z}_m$ is normal.

The starting point of the optimization is initialized from the regular maximum likelihood method.

buildReturnLevelProfileLikelihoodEstimator(sample, m)¶

Estimate $(z_m, \sigma, \xi)$ and its distribution with the profile likelihood.

Parameters:

sample2-d sequence of float: The block maxima sample of dimension 1.
mfloat: The return period expressed in terms of number of blocks.

Returns:

resultProfileLikelihoodResult: The result class.

Notes

Let $Z$ be a random variable which follows a GEV distribution parameterized by $\vect{\theta} = (\mu, \sigma, \xi)$ .

The $m$ -block return level $z_m$ is defined in buildReturnLevelEstimator(). The profile log-likelihood $\ell_p(z_m)$ is defined in buildReturnLevelProfileLikelihood().

The estimator $(\hat{z}_m, \hat{\sigma}, \hat{\xi})$ of $(z_m, \sigma, \xi)$ is defined by:

$\begin{aligned} \hat{z}_m & = \argmax_{z_m} \ell_p(z_m)\\ (\hat{\sigma}, \hat{\xi}) & = \argmax_{(\sigma, \xi)} \ell(\hat{z}_m, \sigma, \xi) \end{aligned}$

The result class produced by the method provides:

the GEV distribution associated to $(\hat{z}_m, \hat{\sigma}, \hat{\xi})$ ,
the asymptotic distribution of $(\hat{z}_m, \hat{\sigma}, \hat{\xi})$ ,
the profile log-likelihood function $z_m \mapsto \ell_p(z_m)$ ,
the optimal profile log-likelihood value $\ell_p(\hat{z}_m)$ ,
confidence intervals of level $(1-\alpha)$ of $z_m$ .

buildTimeVarying(*args)¶

Estimate a non stationary GEV from a time-dependent parametric model.

Parameters:

sample2-d sequence of float

The block maxima grouped in a sample of size $n$ and one dimension.

timeStamps2-d sequence of float

Values of $t$ .

basisBasis

Functional basis respectively for $\mu(t)$ , $\sigma(t)$ and $\xi(t)$ .

muIndicessequence of int, optional

Indices of basis terms considered for parameter $\mu$

sigmaIndicessequence of int, optional

Indices of basis terms considered for parameter $\sigma$

xiIndicessequence of int, optional

Indices of basis terms considered for parameter $\xi$

muLinkFunction, optional

The $h_{\mu}$ function.

By default, the identity function.

sigmaLinkFunction, optional

The $h_{\sigma}$ function.

By default, the identity function.

xiLinkFunction, optional

The $h_{\xi}$ function.

By default, the identity function.

initializationMethodstr, optional

The initialization method for the optimization problem: Gumbel or Static.

By default, the method Gumbel (see ResourceMap, key GeneralizedExtremeValueFactory-InitializationMethod).

normalizationMethodstr, optional

The data normalization method: CenterReduce, MinMax or None.

By default, the method MinMax (see ResourceMap, key GeneralizedExtremeValueFactory-NormalizationMethod).

Returns:

resultTimeVaryingResult: The result class.

Notes

Let $Z_t$ be a non stationary GEV distribution:

$Z_t \sim \mbox{GEV}(\mu(t), \sigma(t), \xi(t))$

We denote by $(z_{t_1}, \dots, z_{t_n})$ the values of $Z_t$ on the time stamps $(t_1, \dots, t_n)$ .

For numerical reasons, it is recommended to normalize the time stamps. The following mapping is applied:

$\tau(t) = \dfrac{t-c}{d}$

and with three ways of defining $(c,d)$ :

the CenterReduce method where $c = \dfrac{1}{n} \sum_{i=1}^n t_i$ is the mean of the time stamps and $d = \sqrt{\dfrac{1}{n} \sum_{i=1}^n (t_i-c)^2}$ is the standard deviation of the time stamps;
the MinMax method where $c = t_1$ is the first time and $d = t_n-t_1$ the range of the time stamps. This is the default method;
the None method where $c = 0$ and $d = 1$ : in that case, data are not normalized.

If $\theta_q$ denotes a component of $\vect{\theta} = (\mu, \sigma, \xi)$ , then $\theta_q$ can be written as a function of $t$ :

$\theta_q(t) = h_q\left(\sum_{i=1}^{d_{\theta_q}} \beta_i^{\theta_q} \varphi_i^{\theta_q} (\tau(t))\right)$

where:

$d_{\theta_q}$ is the size of the functional basis involved in the modelling of $\theta_q$ ,
$h_q: \Rset \mapsto \Rset$ is usually referred to as the inverse-link function of the parameter $\theta_q$ ,
each $\varphi_i^{\theta_q}$ is a scalar function $\Rset \mapsto \Rset$ ,
each $\beta_i^{\theta_q} \in \Rset$ .
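The parametric form of $\theta_q(t)$ can be illustrated with a small closure that combines the time normalization, the functional basis and the inverse-link function. Everything below is a hypothetical sketch: the helper name, the basis functions, the coefficients and the choice of an exponential inverse-link for $\sigma$ are illustrative assumptions (the documented default is the identity).

```python
import math


def make_parameter_function(betas, basis, link, c, d):
    """Build t -> h_q(sum_i beta_i * phi_i(tau(t))) with tau(t) = (t - c) / d."""

    def theta(t):
        tau = (t - c) / d
        return link(sum(b * phi(tau) for b, phi in zip(betas, basis)))

    return theta


time_stamps = [1990.0, 2000.0, 2010.0, 2020.0]
# MinMax normalization: c is the first time stamp, d the range.
c, d = time_stamps[0], time_stamps[-1] - time_stamps[0]

# mu(t): linear trend on the basis (1, tau) with the identity inverse-link.
mu = make_parameter_function(
    [10.0, 2.0], [lambda s: 1.0, lambda s: s], lambda x: x, c, d
)
# sigma(t): constant, with an exponential inverse-link to keep it positive.
sigma = make_parameter_function([0.5], [lambda s: 1.0], math.exp, c, d)

for t in time_stamps:
    print(t, mu(t), sigma(t))
```

A non-identity inverse-link such as the exponential is a common way to enforce $\sigma(t) > 0$ whatever the value of the coefficients.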

We denote by $d_{\mu}$ , $d_{\sigma}$ and $d_{\xi}$ the size of the functional basis of $\mu$ , $\sigma$ and $\xi$ respectively. We denote by $\vect{\beta} = (\beta_1^{\mu}, \dots, \beta_{d_{\mu}}^{\mu}, \beta_1^{\sigma}, \dots, \beta_{d_{\sigma}}^{\sigma}, \beta_1^{\xi}, \dots, \beta_{d_{\xi}}^{\xi})$ the complete vector of parameters.

The estimator of $\vect{\beta}$ maximizes the likelihood of the non stationary model which is defined by:

$L(\vect{\beta}) = \prod_{i=1}^{n} g(z_{t_i};\mu(t_i), \sigma(t_i), \xi(t_i))$

where $g(z_{t};\mu(t), \sigma(t), \xi(t))$ denotes the GEV density function with parameters $(\mu(t), \sigma(t), \xi(t))$ evaluated at $z_t$ .

Then, if none of the $\xi(t_i)$ is zero, the log-likelihood is defined by:

$\ell (\vect{\beta}) = -\sum_{i=1}^{n} \left\{ \log(\sigma(t_i)) + (1 + 1 / \xi(t_i) ) \log\left[ 1+\xi(t_i) \left( \frac{z_{t_i} - \mu(t_i)}{\sigma(t_i)}\right) \right] + \left[ 1 + \xi(t_i) \left( \frac{z_{t_i}- \mu(t_i)}{\sigma(t_i)} \right) \right]^{-1 / \xi(t_i)} \right\}$

defined on $(\mu, \sigma, \xi)$ such that $1+\xi(t) \left( \frac{z_t - \mu(t)}{\sigma(t)} \right) > 0$ for all $t$ .

And if any of the $\xi(t_i)$ is equal to 0, the log-likelihood is defined as:

$\ell (\vect{\beta}) = -\sum_{i=1}^{n} \left\{ \log(\sigma(t_i)) + \frac{z_{t_i} - \mu(t_i)}{\sigma(t_i)} + \exp \left\{ - \frac{z_{t_i} - \mu(t_i)}{\sigma(t_i)} \right\} \right\}$

The initialization of the optimization problem is crucial. Two initial points $(\mu_0, \sigma_0, \xi_0)$ are proposed:

the Gumbel initial point: in that case, we assume that the GEV is a stationary Gumbel distribution and we deduce $(\mu_0, \sigma_0)$ from the empirical mean $\hat{M}$ and the empirical standard deviation $\hat{\sigma}$ of the data: $\sigma_0 = \dfrac{\sqrt{6}}{\pi} \hat{\sigma}$ and $\mu_0 = \hat{M} - \gamma \sigma_0$ where $\gamma$ is Euler’s constant; then we take the initial point $(\mu_0, \sigma_0, \xi_0 = 0.1)$ . This is the default initial point;
the Static initial point: in that case, we assume that the GEV is stationary and $(\mu_0, \sigma_0, \xi_0)$ is the maximum likelihood estimate resulting from that assumption.

The result class produced by the method provides:

the estimator $\hat{\vect{\beta}}$ ,
the asymptotic distribution of $\hat{\vect{\beta}}$ ,
the parameter functions $t \mapsto \vect{\theta}(t)$ ,
the normalizing function $t \mapsto \tau(t)$ ,
the optimal log-likelihood value $\ell(\hat{\vect{\beta}})$ ,
the GEV distribution at time $t$ ,
the quantile functions of order $p$ : $t \mapsto q_p(Z_t)$ .

getBootstrapSize()¶

Accessor to the bootstrap size.

Returns:

sizeint: Size of the bootstrap.

getClassName()¶

Accessor to the object’s name.

Returns:

class_namestr: The object class name (object.__class__.__name__).

getKnownParameterIndices()¶

Accessor to the known parameters indices.

Returns:

indicesIndices: Indices of the known parameters.

getKnownParameterValues()¶

Accessor to the known parameters values.

Returns:

valuesPoint: Values of known parameters.

getName()¶

Accessor to the object’s name.

Returns:

namestr: The name of the object.

getOptimizationAlgorithm()¶

Accessor to the solver.

Returns:

solverOptimizationAlgorithm: The solver used for the numerical optimization of the likelihood.

hasName()¶

Test if the object is named.

Returns:

hasNamebool: True if the name is not empty.

setBootstrapSize(bootstrapSize)¶

Accessor to the bootstrap size.

Parameters:

bootstrapSizeint: The size of the bootstrap.

setKnownParameter(*args)¶

Accessor to the known parameters.

Parameters:

positionssequence of int: Indices of known parameters.
valuessequence of float: Values of known parameters.

Examples

When a subset of the parameter vector is known, the other parameters only have to be estimated from data.

In the following example, we consider a sample and want to fit a Beta distribution. We assume that the $a$ and $b$ parameters are known beforehand. In this case, we set the third parameter (at index 2) to -1 and the fourth parameter (at index 3) to 1.

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> distribution = ot.Beta(2.3, 2.2, -1.0, 1.0)
>>> sample = distribution.getSample(10)
>>> factory = ot.BetaFactory()
>>> # set (a,b) out of (r, t, a, b)
>>> factory.setKnownParameter([2, 3], [-1.0, 1.0])
>>> inf_distribution = factory.build(sample)