Create univariate and multivariate distributions: a quick start guide to distributions

Abstract

In this example, we present classes for univariate and multivariate distributions. We demonstrate the probabilistic programming capabilities of the library. For univariate distributions, we show how to compute the probability density, the cumulated probability density and the quantiles. We also show how to create graphics. The ComposedDistribution class, which creates a distribution based on its marginals and its copula, is presented. We show how to truncate any distribution with the TruncatedDistribution class.

Univariate distribution

The library is a probabilistic programming library: it is possible to create a random variable and perform operations on this variable without generating a sample.

In the OpenTURNS platform, several univariate distributions are implemented. The most commonly used are:

  • Uniform,

  • Normal,

  • Beta,

  • LogNormal,

  • Exponential,

  • Weibull.

[1]:
import openturns as ot

The uniform distribution

Let us create a uniform random variable \mathcal{U}(2,5).

[2]:
uniform = ot.Uniform(2,5)

The drawPDF method plots the probability density function.

[3]:
uniform.drawPDF()
[3]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_7_0.png

The computePDF method computes the probability distribution at a specific point.

[4]:
uniform.computePDF(3.5)
[4]:
0.3333333333333333

The drawCDF method plots the cumulated distribution function.

[5]:
uniform.drawCDF()
[5]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_11_0.png

The computeCDF method computes the value of the cumulated distribution function a given point.

[6]:
uniform.computeCDF(3.5)
[6]:
0.5

The getSample method generates a sample.

[7]:
sample = uniform.getSample(10)
sample
[7]:
X0
03.8896296698233312
14.648415671157781
22.4058290524565233
32.097508253613154
43.0411711236405656
54.908269063404967
64.762038780071152
73.509120454348147
82.1896182295391515
92.878270681243195

The most common way to “see” a sample is to plot the empirical histogram.

[8]:
sample = uniform.getSample(1000)
ot.HistogramFactory().build(sample).drawPDF()
[8]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_17_0.png

Multivariate distributions with or without independent copula

We can create multivariate distributions by two different methods:

  • we can also create a multivariate distribution by combining a list of univariate marginal distribution and a copula,

  • some distributions are defined as multivariate distributions: Normal, Dirichlet, Student.

Since the method based on a marginal and a copula is more flexible, we illustrate below this principle.

In the following script, we define a bivariate distribution made of two univariate distributions (Gaussian and uniform) and an independent copula.

The second input argument of the ComposedDistribution class is optional: if it is not specified, the copula is independent by default.

[9]:
normal = ot.Normal()
uniform = ot.Uniform()
distribution = ot.ComposedDistribution([normal, uniform])
distribution
[9]:

ComposedDistribution(Normal(mu = 0, sigma = 1), Uniform(a = -1, b = 1), IndependentCopula(dimension = 2))

We can also use the IndependentCopula class.

[10]:
normal = ot.Normal()
uniform = ot.Uniform()
copula = ot.IndependentCopula(2)
distribution = ot.ComposedDistribution([normal, uniform], copula)
distribution
[10]:

ComposedDistribution(Normal(mu = 0, sigma = 1), Uniform(a = -1, b = 1), IndependentCopula(dimension = 2))

We see that this produces the same result: in the end of this section, we will change the copula and see what happens.

The getSample method produces a sample from this distribution.

[11]:
distribution.getSample(10)
[11]:
X0X1
01.2598879680449580.08302850200114786
1-2.15135258355867040.21233747708031991
20.6383845131142492-0.31946948335596526
31.1985967013995071-0.3335571137988924
40.5760514867363357-0.4738648412897719
5-2.7210575431765310.6944882272474788
6-0.456634339460408660.7525542385695916
70.8344516247016848-0.19909337979103858
80.18972282501335233-0.49021388839792124
9-1.5598970035006174-0.6660326523710496

In order to visualize a bivariate sample, we can use the Cloud class.

[12]:
sample = distribution.getSample(1000)
showAxes = True
graph = ot.Graph("X0~N, X1~U", "X0", "X1", showAxes)
cloud = ot.Cloud(sample, "blue", "fsquare", "") # Create the cloud
graph.add(cloud) # Then, add it to the graph
graph
[12]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_29_0.png

We see that the marginals are Gaussian and uniform and that the copula is independent.

Define a plot a copula

The NormalCopula class allows to create a Gaussian copula. Such a copula is defined by its correlation matrix.

[13]:
R = ot.CorrelationMatrix(2)
R[0,1] = 0.6
copula = ot.NormalCopula(R)
copula
[13]:

NormalCopula(R = [[ 1 0.6 ]
[ 0.6 1 ]])

We can draw the contours of a copula with the drawPDF method.

[14]:
copula.drawPDF()
[14]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_35_0.png

Multivariate distribution with arbitrary copula

Now that we know that we can define a copula, we create a bivariate distribution with normal and uniform marginals and an arbitrary copula. We select the the Ali-Mikhail-Haq copula as an example of a non trivial dependence.

[15]:
normal = ot.Normal()
uniform = ot.Uniform()
theta = 0.9
copula = ot.AliMikhailHaqCopula(theta)
distribution = ot.ComposedDistribution([normal, uniform], copula)
distribution
[15]:

ComposedDistribution(Normal(mu = 0, sigma = 1), Uniform(a = -1, b = 1), AliMikhailHaqCopula(theta = 0.9))

[16]:
sample = distribution.getSample(1000)
showAxes = True
graph = ot.Graph("X0~N, X1~U, Ali-Mikhail-Haq copula", "X0", "X1", showAxes)
cloud = ot.Cloud(sample, "blue", "fsquare", "") # Create the cloud
graph.add(cloud) # Then, add it to the graph
graph
[16]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_39_0.png

We see that the sample is quite different from the previous sample with independent copula.

Draw several distributions in the same plot

It is sometimes convenient to create a plot presenting the PDF and CDF on the same graphics. This is possible thanks to Matplotlib.

[17]:
beta = ot.Beta(5, 7, 9, 10)
pdfbeta = beta.drawPDF()
cdfbeta = beta.drawCDF()
exponential = ot.Exponential(3)
pdfexp = exponential.drawPDF()
cdfexp = exponential.drawCDF()
[18]:
import openturns.viewer as otv
[19]:
import pylab as plt
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(2, 2, 1)
_ = otv.View(pdfbeta, figure=fig, axes=[ax])
ax = fig.add_subplot(2, 2, 2)
_ = otv.View(cdfbeta, figure=fig, axes=[ax])
ax = fig.add_subplot(2, 2, 3)
_ = otv.View(pdfexp, figure=fig, axes=[ax])
ax = fig.add_subplot(2, 2, 4)
_ = otv.View(cdfexp, figure=fig, axes=[ax])
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_45_0.png

Truncate a distribution

Any distribution can be truncated with the TruncatedDistribution class.

Let f_X (resp. F_X) the PDF (resp. the CDF) of the real random variable X. Let a and b two reals with a<b. Let Y be the random variable max(a, min(b, X)). Its distribution is the distribution of X truncated to the [a,b] interval.

Therefore, the PDF of Y is:

f_Y(y) = \frac{f_X(y)}{F_X(b) - F_X(a)}

if y\in[a,b] and f_Y(y)=0 otherwise.

Consider for example the log-normal variable X with mean \mu=0 and standard deviation \sigma=1.

[20]:
X = ot.LogNormal()
X.drawPDF()
[20]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_48_0.png

We can truncate this distribution to the [1,2] interval. We see that the PDF of the distribution becomes discontinuous at the truncation points 1 and 2.

[21]:
Y = ot.TruncatedDistribution(X,1.,2.)
Y.drawPDF()
[21]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_50_0.png

We can also also truncate it with only a lower bound.

[22]:
Y = ot.TruncatedDistribution(X,1.,ot.TruncatedDistribution.LOWER)
Y.drawPDF()
[22]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_52_0.png

We can finally truncate a distribution with an upper bound.

[23]:
Y = ot.TruncatedDistribution(X,2.,ot.TruncatedDistribution.UPPER)
Y.drawPDF()
[23]:
../../_images/examples_probabilistic_modeling_quick_start_guide_distributions_54_0.png

In the specific case of the Gaussian distribution, the specialized TruncatedNormal distribution can be used instead of the generic TruncatedDistribution class.