Note
Go to the end to download the full example code.
Create a mixture of distributions¶
Introduction¶
In this example, we show how to build a distribution which is a Mixture from a collection of distributions of the same dimension . We denote by the random vector with such a distribution.
Let a collection of distributions and a collection of positive weights such that . Let be a random vector whose distribution is .
In the case where all the have a probability density function , then the mixture has a probability density function defined by:
In the case where all the are discrete, then the mixture is discrete and its probability distribution function is defined by:
In the case where some of the have a probability density function and the other ones are discrete, then the mixture is not discrete and does not have a probability density function. Its cumulated distribution function is defined by:
We illustrate the following particular cases:
Case 1a: Mixture of continuous distributions,
Case 1b: Mixture of copulas,
Case 1c: Mixture of a Histogram and a Generalized Pareto Distribution,
Case 2: Mixture of discrete distributions,
Case 3: Mixture of discrete and continuous distributions.
import openturns as ot
import openturns.viewer as otv
Case 1a: Mixture of continuous distributions¶
In this case, we build the mixture of the following continuous distributions:
a
Triangular
,a
Normal
,a
Uniform
.
The weigths are automatically normalized.
We define the collection of distributions and the associated weights.
distributions = [
ot.Triangular(1.0, 2.0, 4.0),
ot.Normal(-1.0, 1.0),
ot.Uniform(5.0, 6.0),
]
weights = [0.4, 1.0, 0.2]
We create the mixture.
distribution = ot.Mixture(distributions, weights)
print(distribution)
Mixture((w = 0.25, d = Triangular(a = 1, m = 2, b = 4)), (w = 0.625, d = Normal(mu = -1, sigma = 1)), (w = 0.125, d = Uniform(a = 5, b = 6)))
We can draw the probability density function.
graph = distribution.drawPDF()
graph.setTitle("Mixture of Triangular, Normal, Uniform")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)
We can draw the cumulated distribution function.
graph = distribution.drawCDF()
graph.setTitle("Mixture of Triangular, Normal, Uniform")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)
Case 1b: Mixture of copulas¶
In this case, we build the mixture of the following copulas:
a
GumbelCopula
,
We define the collection of copulas and the associated weights.
copulas = [ot.GumbelCopula(4.5), ot.ClaytonCopula(2.3)]
weights = [0.2, 0.8]
We create a mixture of copulas.
distribution = ot.Mixture(copulas, weights)
print(distribution)
Mixture((w = 0.2, d = GumbelCopula(theta = 4.5)), (w = 0.8, d = ClaytonCopula(theta = 2.3)))
We can draw the probability density function.
graph = distribution.drawPDF()
graph.setTitle("Mixture of Gumbel copula, Clayton copula")
graph.setXTitle(r"$x_0$")
graph.setYTitle(r"$x_1$")
graph.setLegendPosition("")
view = otv.View(graph)
We can draw the cumulated distribution function.
graph = distribution.drawCDF()
graph.setTitle("Mixture of Gumbel copula, Clayton copula")
graph.setXTitle(r"$x_0$")
graph.setYTitle(r"$x_1$")
view = otv.View(graph)
Case 1c: Mixture of a Histogram and a Generalized Pareto Distribution¶
We want to create the scalar distribution of such that:
where:
is a Histogram,
is a Generalized Pareto distribution (GPD),
is a quantile of high level of .
Let us define:
We assume that we only have a sample from .
In this example, we consider a Normal distribution with zero mean and unit variance. We generate a sample of size .
n = 5000
X_dist = ot.Normal()
sample_X = X_dist.getSample(n)
We build the whole histogram from the sample.
hist_dist = ot.HistogramFactory().build(sample_X)
g_hist = hist_dist.drawPDF()
g_hist.setTitle(r"Empirical distribution of $X$")
g_hist.setXTitle("x")
g_hist.setYTitle("pdf")
g_hist.setLegends(["histogram"])
view = otv.View(g_hist)
We estimate the extreme empirical quantile of level 0.95.
w = 0.95
x0 = hist_dist.computeQuantile(w)[0]
We start by truncating the initial histogram on the interval . We visualize it!
hist_trunc = ot.TruncatedDistribution(hist_dist, x0, ot.TruncatedDistribution.UPPER)
g_hist_trunc = hist_trunc.drawPDF()
g_hist_trunc.setTitle(r"Empirical distribution of $X|X \leq $" + "%.3g" % (x0))
g_hist_trunc.setXTitle("x")
g_hist_trunc.setYTitle("pdf")
g_hist_trunc.setLegends(["truncated histogram"])
view = otv.View(g_hist_trunc)
Then we model by a Generalized Pareto distribution (GPD). We start by extracting from the sample all the values greater than to build the upper sample. We get about points.
sample_X_upper = ot.Sample(0, 1)
for i in range(len(sample_X)):
if sample_X[i, 0] > x0:
sample_X_upper.add(sample_X[i])
print("Excess number = ", sample_X_upper.getSize())
print("n(1-w) = ", int(n * (1 - w)))
Excess number = 247
n(1-w) = 250
Then we fit a GPD parameterized by : the threshold is fixed to . We use the estimator that maximizes the likelihood. To solve the optimisation problem faster, we start by estimating the 3 parameters from the upper sample. Then we fix the threshold to and we estimate the remaining parameters using the previous values of as a starting point to the optimization problem. We visualize the pdf of the GPD.
gpd_first = ot.GeneralizedParetoFactory().build(sample_X_upper)
mlFact = ot.MaximumLikelihoodFactory(gpd_first)
# we fix the threshold to :math:`x_0`.
mlFact.setKnownParameter([x0], [2])
gpd_estimated = mlFact.build(sample_X_upper)
print("estimated gpd = ", gpd_estimated)
g_gpd = gpd_estimated.drawPDF()
g_gpd.setTitle(r"Distribution of $X|X \geq $" + "%.3g" % (x0))
g_gpd.setXTitle("x")
g_gpd.setYTitle("pdf")
g_gpd.setLegends(["GPd"])
view = otv.View(g_gpd)
estimated gpd = GeneralizedPareto(sigma = 0.440432, xi=-0.0537874, u=1.65325)
Then we can create the mixture using the truncated Histogram distribution below and the GPD over weighted by and .
mixt_dist = ot.Mixture([hist_trunc, gpd_estimated], [w, 1 - w])
g_hist.add(mixt_dist.drawPDF())
ord_Max = max(hist_dist.getImplementation().getHeight())
line = ot.Curve([x0, x0], [0.0, ord_Max])
line.setColor("red")
line.setLineStyle("dashed")
g_hist.add(line)
draw_ref = X_dist.drawPDF().getDrawable(0)
draw_ref.setLineStyle("dashed")
g_hist.add(draw_ref)
g_hist.setLegends(["histogram", "mixture", "", "exact dist"])
g_hist.setTitle(r"Distribution of $X$: Mixture")
view = otv.View(g_hist)
# We draw here only the mixture distribution to make the comparison easier.
g_mixt = mixt_dist.drawPDF()
g_mixt.setTitle(r"Mixture distribution of $X$")
g_mixt.setXTitle("x")
g_mixt.setYTitle("pdf")
g_mixt.setLegendPosition("")
view = otv.View(g_mixt)
Case 2: Mixture of discrete distributions¶
In this case, we build the mixture of the following distributions:
The weigths are automatically normalized.
We define the collection of distributions and the associated weights.
distributions = [ot.Poisson(1.2), ot.Geometric(0.7)]
weights = [0.4, 1.0]
We create the mixture.
distribution = ot.Mixture(distributions, weights)
print(distribution)
Mixture((w = 0.285714, d = Poisson(lambda = 1.2)), (w = 0.714286, d = Geometric(p = 0.7)))
We can draw the probability distribution function.
graph = distribution.drawPDF()
graph.setTitle("Mixture of Poisson, Geometric")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)
We can draw the cumulated distribution function.
graph = distribution.drawCDF()
graph.setTitle("Mixture of Poisson, Geometric")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)
Case Case 3: Mixture of discrete and continuous distributions¶
In this case, we build the mixture of the following distributions:
The resulting distribution is not continuous nor discrete. It not possible to draw the pdf…
We define the collection of distributions and the associated weights.
distributions = [ot.Normal(), ot.Poisson(0.7)]
weights = [0.4, 1.0]
We create the mixture.
distribution = ot.Mixture(distributions, weights)
print(distribution)
Mixture((w = 0.285714, d = Normal(mu = 0, sigma = 1)), (w = 0.714286, d = Poisson(lambda = 0.7)))
We cannot draw the probability distribution function as it is not defined. But, we can draw the cumulated distribution function.
graph = distribution.drawCDF()
graph.setTitle("Mixture of Normal, Poisson")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)
Reset ResourceMap
ot.ResourceMap.Reload()
Show all the graphs.
view.ShowAll()