Note

Go to the end to download the full example code.

Create a mixture of distributions¶

Introduction¶

In this example, we show how to build a distribution which is a Mixture from a collection of distributions of the same dimension $d \geq 1$ . We denote by $\inputRV$ the random vector with such a distribution.

Let $(\cL_1, \dots, \cL_N)$ a collection of distributions and $(\alpha_1, \dots, \alpha_N)$ a collection of positive weights such that $\sum_{i=1}^N \alpha_i = 1$ . Let $\inputRV_i$ be a random vector whose distribution is $\cL_i$ .

In the case where all the $\cL_i$ have a probability density function $\mu_i$ , then the mixture has a probability density function $\inputMeasure$ defined by:

$\inputMeasure(\vect{x}) = \sum_{i=1}^N \alpha_i \mu_i(\vect{x})$

In the case where all the $\cL_i$ are discrete, then the mixture is discrete and its probability distribution function is defined by:

$\Prob{\inputRV = \vect{x}} = \sum_{i=1}^N \alpha_i \Prob{\vect{X}_i = \vect{x}}$

In the case where some of the $\cL_i$ have a probability density function and the other ones are discrete, then the mixture is not discrete and does not have a probability density function. Its cumulated distribution function is defined by:

$F_\inputRV(\vect{x}) = \sum_{i=1}^N \alpha_i F_{\vect{X}_i}(\vect{x})$

We illustrate the following particular cases:

Case 1a: Mixture of continuous distributions,
Case 1b: Mixture of copulas,
Case 1c: Mixture of a Histogram and a Generalized Pareto Distribution,
Case 2: Mixture of discrete distributions,
Case 3: Mixture of discrete and continuous distributions.

import openturns as ot
import openturns.viewer as otv

Case 1a: Mixture of continuous distributions¶

In this case, we build the mixture of the following continuous distributions:

The weigths are automatically normalized.

We define the collection of distributions and the associated weights.

distributions = [
    ot.Triangular(1.0, 2.0, 4.0),
    ot.Normal(-1.0, 1.0),
    ot.Uniform(5.0, 6.0),
]
weights = [0.4, 1.0, 0.2]

We create the mixture.

distribution = ot.Mixture(distributions, weights)
print(distribution)

Mixture((w = 0.25, d = Triangular(a = 1, m = 2, b = 4)), (w = 0.625, d = Normal(mu = -1, sigma = 1)), (w = 0.125, d = Uniform(a = 5, b = 6)))

We can draw the probability density function.

graph = distribution.drawPDF()
graph.setTitle("Mixture of Triangular, Normal, Uniform")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)

We can draw the cumulated distribution function.

graph = distribution.drawCDF()
graph.setTitle("Mixture of Triangular, Normal, Uniform")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)

Case 1b: Mixture of copulas¶

In this case, we build the mixture of the following copulas:

a GumbelCopula,
a ClaytonCopula.

We define the collection of copulas and the associated weights.

copulas = [ot.GumbelCopula(4.5), ot.ClaytonCopula(2.3)]
weights = [0.2, 0.8]

We create a mixture of copulas.

distribution = ot.Mixture(copulas, weights)
print(distribution)

Mixture((w = 0.2, d = GumbelCopula(theta = 4.5)), (w = 0.8, d = ClaytonCopula(theta = 2.3)))

We can draw the probability density function.

graph = distribution.drawPDF()
graph.setTitle("Mixture of Gumbel copula, Clayton copula")
graph.setXTitle(r"$x_0$")
graph.setYTitle(r"$x_1$")
graph.setLegendPosition("")
view = otv.View(graph)

Mixture of Gumbel copula, Clayton copula

We can draw the cumulated distribution function.

graph = distribution.drawCDF()
graph.setTitle("Mixture of Gumbel copula, Clayton copula")
graph.setXTitle(r"$x_0$")
graph.setYTitle(r"$x_1$")
view = otv.View(graph)

Case 1c: Mixture of a Histogram and a Generalized Pareto Distribution¶

We want to create the scalar distribution of $X$ such that:

$X|X \leq x_0 & \sim \mathcal{L}_1 \\ X|X \geq x_0 & \sim \mathcal{L}_2$

where:

$\mathcal{L}_1$ is a Histogram,
$\mathcal{L}_2$ is a Generalized Pareto distribution (GPD),
$x_0$ is a quantile of high level of $X$ .

Let us define:

$w = \Prob{X \leq x_0}$

We assume that we only have a sample from $X$ .

In this example, we consider a Normal distribution with zero mean and unit variance. We generate a sample of size $n$ .

n = 5000
X_dist = ot.Normal()
sample_X = X_dist.getSample(n)

We build the whole histogram from the sample.

hist_dist = ot.HistogramFactory().build(sample_X)
g_hist = hist_dist.drawPDF()
g_hist.setTitle(r"Empirical distribution of $X$")
g_hist.setXTitle("x")
g_hist.setYTitle("pdf")
g_hist.setLegends(["histogram"])
view = otv.View(g_hist)

We estimate the extreme empirical quantile of level 0.95.

w = 0.95
x0 = hist_dist.computeQuantile(w)[0]

We start by truncating the initial histogram on the interval $]-\infty, x_0]$ . We visualize it!

hist_trunc = ot.TruncatedDistribution(hist_dist, x0, ot.TruncatedDistribution.UPPER)
g_hist_trunc = hist_trunc.drawPDF()
g_hist_trunc.setTitle(r"Empirical distribution of $X|X \leq $" + "%.3g" % (x0))
g_hist_trunc.setXTitle("x")
g_hist_trunc.setYTitle("pdf")
g_hist_trunc.setLegends(["truncated histogram"])
view = otv.View(g_hist_trunc)

$Empirical distribution of $X|X \leq $1.64$

Then we model $X|X \geq x_0$ by a Generalized Pareto distribution (GPD). We start by extracting from the sample all the values greater than $x_0$ to build the upper sample. We get about $n(1-w)$ points.

sample_X_upper = ot.Sample(0, 1)
for i in range(len(sample_X)):
    if sample_X[i, 0] > x0:
        sample_X_upper.add(sample_X[i])

print("Excess number = ", sample_X_upper.getSize())
print("n(1-w) = ", int(n * (1 - w)))

Excess number =  248
n(1-w) =  250

Then we fit a GPD parameterized by $(\sigma, \xi, x_0)$ : the threshold is fixed to $x_0$ . We use the estimator that maximizes the likelihood. To solve the optimisation problem faster, we start by estimating the 3 parameters $(\sigma, \xi, u)$ from the upper sample. Then we fix the threshold to $u = x_0$ and we estimate the remaining parameters $(\sigma, \xi)$ using the previous values of $(\sigma, \xi)$ as a starting point to the optimization problem. We visualize the pdf of the GPD.

gpd_first = ot.GeneralizedParetoFactory().build(sample_X_upper)
mlFact = ot.MaximumLikelihoodFactory(gpd_first)

# we fix the threshold to :math:`x_0`.
mlFact.setKnownParameter([2], [x0])
gpd_estimated = mlFact.build(sample_X_upper)
print("estimated gpd = ", gpd_estimated)

g_gpd = gpd_estimated.drawPDF()
g_gpd.setTitle(r"Distribution of $X|X \geq $" + "%.3g" % (x0))
g_gpd.setXTitle("x")
g_gpd.setYTitle("pdf")
g_gpd.setLegends(["GPd"])
view = otv.View(g_gpd)

$Distribution of $X|X \geq $1.64$

estimated gpd =  GeneralizedPareto(sigma = 0.487432, xi=-0.152824, u=1.64151)

Then we can create the mixture using the truncated Histogram distribution below $x_0$ and the GPD over $x_0$ weighted by $w$ and $(1-w)$ .

mixt_dist = ot.Mixture([hist_trunc, gpd_estimated], [w, 1 - w])
g_hist.add(mixt_dist.drawPDF())

ord_Max = max(hist_dist.getImplementation().getHeight())
line = ot.Curve([x0, x0], [0.0, ord_Max])
line.setColor("red")
line.setLineStyle("dashed")
g_hist.add(line)

draw_ref = X_dist.drawPDF().getDrawable(0)
draw_ref.setLineStyle("dashed")
g_hist.add(draw_ref)

g_hist.setLegends(["histogram", "mixture", "", "exact dist"])
g_hist.setTitle(r"Distribution of $X$: Mixture")

view = otv.View(g_hist)

# We draw here only the mixture distribution to make the comparison easier.
g_mixt = mixt_dist.drawPDF()
g_mixt.setTitle(r"Mixture distribution of $X$")
g_mixt.setXTitle("x")
g_mixt.setYTitle("pdf")
g_mixt.setLegendPosition("")
view = otv.View(g_mixt)

Case 2: Mixture of discrete distributions¶

In this case, we build the mixture of the following distributions:

a Poisson,
a Geometric.

The weigths are automatically normalized.

We define the collection of distributions and the associated weights.

distributions = [ot.Poisson(1.2), ot.Geometric(0.7)]
weights = [0.4, 1.0]

We create the mixture.

distribution = ot.Mixture(distributions, weights)
print(distribution)

Mixture((w = 0.285714, d = Poisson(lambda = 1.2)), (w = 0.714286, d = Geometric(p = 0.7)))

We can draw the probability distribution function.

graph = distribution.drawPDF()
graph.setTitle("Mixture of Poisson, Geometric")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)

We can draw the cumulated distribution function.

graph = distribution.drawCDF()
graph.setTitle("Mixture of Poisson, Geometric")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)

Case Case 3: Mixture of discrete and continuous distributions¶

In this case, we build the mixture of the following distributions:

a Normal,
a Poisson.

The resulting distribution is not continuous nor discrete. It not possible to draw the pdf…

We define the collection of distributions and the associated weights.

distributions = [ot.Normal(), ot.Poisson(0.7)]
weights = [0.4, 1.0]

We create the mixture.

distribution = ot.Mixture(distributions, weights)
print(distribution)

Mixture((w = 0.285714, d = Normal(mu = 0, sigma = 1)), (w = 0.714286, d = Poisson(lambda = 0.7)))

We cannot draw the probability distribution function as it is not defined. But, we can draw the cumulated distribution function.

graph = distribution.drawCDF()
graph.setTitle("Mixture of Normal, Poisson")
graph.setXTitle("x")
graph.setLegendPosition("")
view = otv.View(graph)

Show all the graphs.

otv.View.ShowAll()

OpenTURNS

An Open source initiative for the Treatment of Uncertainties, Risks'N Statistics

Table of Contents

Previous topic

Next topic

This Page

Create a mixture of distributions¶

Introduction¶

Case 1a: Mixture of continuous distributions¶

Case 1b: Mixture of copulas¶

Case 1c: Mixture of a Histogram and a Generalized Pareto Distribution¶

Case 2: Mixture of discrete distributions¶

Case Case 3: Mixture of discrete and continuous distributions¶