Note

Click here to download the full example code

Fit an extreme value distribution¶

from __future__ import print_function
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt

ot.Log.Show(ot.Log.NONE)

Set the random generator seed

ot.RandomGenerator.SetSeed(0)

The generalized extreme value distribution (GEV)¶

The GeneralizedExtremeValue distribution is a family of continuous probability distributions that combine the Gumbel, Frechet and WeibullMax distribution, all extreme value distribution.

In this example we use the associated GeneralizedExtremeValueFactory to fit sample with extreme values. This factory returns the best model among the Frechet, Gumbel and Weibull estimates according to the BIC criterion.

For experiment purpose we draw a sample from a Gumbel of parameters $\beta = 1.0$ and $\gamma = 3.0$ and another one from a Frechet with parameters $\beta=1.0, \alpha = 1.0$ and $\gamma = 0.0$ . We consider both samples as a unique sample from an unknown extreme distribution to be fitted.

The distributions used :

myGumbel = ot.Gumbel(1.0, 3.0)
myFrechet = ot.Frechet(1.0, 1.0, 0.0)

We build our experiment sample of size 2000.

sample = ot.Sample()
sampleFrechet = myFrechet.getSample(1000)
sampleGumbel = myGumbel.getSample(1000)
sample.add(sampleFrechet)
sample.add(sampleGumbel)

We fit the sample thanks to the GeneralizedExtremeValueFactory :

myDistribution = ot.GeneralizedExtremeValueFactory().buildAsGeneralizedExtremeValue(
    sample
)

We can display the parameters of the fitted distribution myDistribution :

print(myDistribution)

Out:

GeneralizedExtremeValue(mu=1.79565, sigma=1.54463, xi=0.546359)

We can also get the actual distribution (Weibull, Frechet or Gumbel) with the getActualDistribution method :

print(myDistribution.getActualDistribution())

Out:

Frechet(beta = 2.82713, alpha = 1.8303, gamma = -1.03148)

The given sample is then best described by a WeibullMax distribution.

We draw the fitted distribution and an histogram of the data.

graph = myDistribution.drawPDF()
graph.add(ot.HistogramFactory().build(sample).drawPDF())
graph.setColors(["black", "red"])
graph.setLegends(["GEV fitting", "histogram"])
graph.setLegendPosition("topright")

view = viewer.View(graph)
axes = view.getAxes()
_ = axes[0].set_xlim(-20.0, 20.0)

For educational purpose we compare different fitting strategies for this sample :

we use the histogram from the data (in red)
the GEV fitted distribution (in black)
the pure Frechet fitted distribution (in green)
the pure Gumbel fitted distribution (in blue)
the pure WeibullMax fitted distribution (in dashed orange)

graph = myDistribution.drawPDF()
graph.add(ot.HistogramFactory().build(sample).drawPDF())

distFrechet = ot.FrechetFactory().buildAsFrechet(sample)
graph.add(distFrechet.drawPDF())

distGumbel = ot.GumbelFactory().buildAsGumbel(sample)
graph.add(distGumbel.drawPDF())

# We change the line style of the WeibullMax because it is expected to be the same
# as the GEV one.
distWeibullMax = ot.WeibullMaxFactory().buildAsWeibullMax(sample)
curveWeibullMax = distWeibullMax.drawPDF().getDrawable(0)
curveWeibullMax.setLineStyle("dashed")
graph.add(curveWeibullMax)

graph.setColors(["black", "red", "green", "blue", "orange"])
graph.setLegends(
    [
        "GEV fitting",
        "histogram",
        "Frechet fitting",
        "Gumbel fitting",
        "WeibullMax fitting",
    ]
)
graph.setLegendPosition("topright")
view = viewer.View(graph)
axes = view.getAxes()  # axes is a matplotlib object
_ = axes[0].set_xlim(-20.0, 20.0)

As returned by the getActualDistribution method the GEV distribution is a WeibullMax. The GeneralizedExtremeValueFactory class is a convenient class to fit extreme valued samples without an a priori knowledge of the underlying (at least the closest) extreme distribution.

The Generalized Pareto Distribution (GPD)¶

In this paragraph we turn to the fitting of a GeneralizedPareto distribution. Various estimators are defined in OpenTurns for the GPD factory. Please refer to the GeneralizedParetoFactory class documentation for more information. The selection is based on the sample size and compared to the GeneralizedParetoFactory-SmallSize key of the ResourceMap :

print(ot.ResourceMap.GetAsUnsignedInteger("GeneralizedParetoFactory-SmallSize"))

Out:

The small sample case¶

In this case the default estimator is based on a probability weighted method of moments, with a fallback on the exponential regression method.

myDist = ot.GeneralizedPareto(1.0, 0.0, 0.0)
N = 17
sample = myDist.getSample(N)

We build our experiment sample of size N

myFittedDist = ot.GeneralizedParetoFactory().buildAsGeneralizedPareto(sample)
print(myFittedDist)

Out:

GeneralizedPareto(sigma = 0.678732, xi=0.0289962, u=0.0498077)

We draw the fitted distribution as well as an histogram to visualize the fitting :

graph = myFittedDist.drawPDF()
graph.add(ot.HistogramFactory().build(sample).drawPDF())
graph.setTitle("Generalized Pareto distribution fitting on a sample")
graph.setColors(["black", "red"])
graph.setLegends(["GPD fitting", "histogram"])
graph.setLegendPosition("topright")

view = viewer.View(graph)
axes = view.getAxes()
_ = axes[0].set_xlim(-1.0, 10.0)

Generalized Pareto distribution fitting on a sample

Large sample¶

For a larger sample the estimator is based on an exponential regression method with a fallback on the probability weighted moments estimator.

N = 1000
sample = myDist.getSample(N)

We build our experiment sample of size N

myFittedDist = ot.GeneralizedParetoFactory().buildAsGeneralizedPareto(sample)
print(myFittedDist)

Out:

GeneralizedPareto(sigma = 0.971553, xi=-0.000639593, u=0.000103683)

We draw the fitted distribution as well as an histogram to visualize the fitting :

graph = myFittedDist.drawPDF()
graph.add(ot.HistogramFactory().build(sample).drawPDF())
graph.setTitle("Generalized Pareto distribution fitting on a sample")
graph.setColors(["black", "red"])
graph.setLegends(["GPD fitting", "histogram"])
graph.setLegendPosition("topright")

view = viewer.View(graph)
axes = view.getAxes()
_ = axes[0].set_xlim(-1.0, 10.0)


plt.show()

Total running time of the script: ( 0 minutes 0.388 seconds)

Gallery generated by Sphinx-Gallery

OpenTURNS

An Open source initiative for the Treatment of Uncertainties, Risks'N Statistics

Table of Contents

Previous topic

Next topic

This Page

Fit an extreme value distribution¶

The generalized extreme value distribution (GEV)¶

The Generalized Pareto Distribution (GPD)¶

The small sample case¶

Large sample¶