Conditional distributions

The library offers several modeling capabilities for conditional distributions:

  • Case 1: Create a joint distribution using conditioning,

  • Case 2: Condition a joint distribution by some values of its marginals,

  • Case 3: Create a distribution whose parameters are random,

  • Case 4: Create a Bayesian posterior distribution.

Case 1: Create a joint distribution using conditioning

The objective is to create the joint distribution of the random vector (\vect{Y},\inputRV) where \vect{Y} follows the distribution \mathcal{L}_{\vect{Y}} and \inputRV|\vect{\Theta} follows the distribution \mathcal{L}_{\inputRV|\vect{\Theta}}, where \vect{\Theta}=g(\vect{Y}) with g a link function whose input dimension is the dimension of \mathcal{L}_{\vect{Y}} and whose output dimension is the dimension of \vect{\Theta}.

This distribution is limited to the continuous case, i.e. when both the conditioning and the conditioned distributions are continuous. Its probability density function is defined as:

f_{(\vect{Y},\inputRV)}(\vect{y}, \vect{x}) = f_{\inputRV|\vect{\theta}=g(\vect{y})}(\vect{x}|g(\vect{y})) f_{\vect{Y}}( \vect{y})

with f_{\inputRV|\vect{\theta} = g(\vect{y})} the PDF of the distribution of \inputRV|\vect{\Theta} where \vect{\Theta} has been replaced by g(\vect{y}), and f_{\vect{Y}} the PDF of \vect{Y}.

See the class JointByConditioningDistribution.
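
A minimal sketch, assuming the constructor takes the conditioned distribution, the conditioning distribution and the link function g (the link used here is purely illustrative):

    import openturns as ot

    # Y ~ Uniform(0, 1); X | Theta ~ Normal(mu, sigma) with
    # Theta = g(Y) = (Y, 0.1 + Y^2), an illustrative link function.
    conditioning = ot.Uniform(0.0, 1.0)
    link = ot.SymbolicFunction(["y"], ["y", "0.1 + y^2"])
    conditioned = ot.Normal()  # its parameters (mu, sigma) are replaced by g(Y)
    joint = ot.JointByConditioningDistribution(conditioned, conditioning, link)
    print(joint.getDimension())  # dimension of (Y, X), here 2
    print(joint.getSample(5))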

Case 2: Condition a joint distribution by some values of its marginals

Let \inputRV be a random vector of dimension \inputDim. Let \cI \subset \{1, \dots, \inputDim \} be a set of indices of components of \inputRV, \overline{\cI} its complement in \{1, \dots, \inputDim \} and \vect{x}_\cI a real vector whose dimension is equal to the cardinality of \cI. The objective is to create the distribution of:

\inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI

See the class PointConditionalDistribution.

This class requires the following features:

  • each component X_i is continuous or discrete: e.g., it cannot be a Mixture of discrete and continuous distributions,

  • the copula of \inputRV is continuous: e.g., it cannot be the MinCopula,

  • the random vector \inputRV_{\overline{\cI}} is continuous or discrete: all its components must be discrete or all its components must be continuous,

  • the random vector \inputRV_{\cI} may have some discrete components and some continuous components.

Then, the PDF (probability density function if \inputRV_{\overline{\cI}} is continuous or probability distribution function if \inputRV_{\overline{\cI}} is discrete) of \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI is defined by (in the following expression, we assume a particular order of the conditioned components among the whole set of components for ease of reading):

(1)p_{\inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI}(\vect{x}_{\overline{\cI}}) = \dfrac{p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_{\cI}}(\vect{x}_{\cI})}

where:

p_{\inputRV}(\vect{x})  = \left( \prod_{i=1}^\inputDim p_i(x_i)\right) c(F_1(x_1), \dots,
F_\inputDim(x_\inputDim))

with:

  • c is the density of the copula of \inputRV,

  • if X_i is a continuous component, p_i is its probability density function,

  • if X_i is a discrete component, p_i = \sum_{x^i_k \in \cS^i} \Prob{X_i = x^i_k} \delta_{x^i_k} where \cS^i = \{ x^i_k \} is its support and \delta_{x^i_k} the Dirac distribution centered on x^i_k.

Then, if \inputRV_{\overline{\cI}} is continuous, we have:

p_{\inputRV_{\cI}}(\vect{x}_{\cI})  = \int p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) \,
\di{\vect{x}_{\overline{\cI}}}

and if \inputRV_{\overline{\cI}} is discrete with its support denoted by \cS(\vect{X}_{\overline{\cI}}) = \prod_{i \in \overline{\cI}} \cS^i, we have:

p_{\inputRV_{\cI}}(\vect{x}_{\cI}) = \sum_{\vect{x}_{\overline{\cI}} \in \cS(\inputRV_{\overline{\cI}})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

Simplification mechanisms to compute (1) are implemented for some distributions. We detail some cases where a simplification has been implemented.

Elliptical distributions: This is the case for normal and Student distributions. If \inputRV follows a normal or a Student distribution, then \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI respectively follows a normal or a Student distribution with modified parameters. See Conditional Normal and Conditional Student for the formulas of the conditional distributions.
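
As an illustration, conditioning a correlated bivariate normal distribution should trigger this simplification and return a one-dimensional normal distribution. A sketch, assuming the PointConditionalDistribution constructor takes the distribution, the conditioning indices and the conditioning values:

    import openturns as ot

    # Bivariate Normal with rho = 0.8, conditioned on X_0 = 0.5.
    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.8
    dist = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    cond = ot.PointConditionalDistribution(dist, [0], [0.5])
    # The classical formulas give X_1 | X_0 = 0.5 ~ Normal(0.8 * 0.5, sqrt(1 - 0.8^2)),
    # i.e. Normal(0.4, 0.6), which the simplification is expected to return.
    print(cond)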

Mixture distributions: Let \inputRV be a random vector of dimension \inputDim whose distribution is defined by a Mixture of N discrete or continuous atoms. Let us denote by (p_1, \dots, p_N) the PDF (probability density function for continuous atoms and probability distribution function for discrete ones) of each atom, with respective weights (w_1, \dots, w_N). Then we get:

p_\inputRV(\vect{x}) = \sum_{k=1}^N w_k p_k(\vect{x})

We denote by p_{k,\cI} the PDF of the margin \cI of the k-th atom. Then, if p_{\inputRV_\cI}(\vect{x}_{\cI}) \neq 0, we get:

p_{\inputRV|\vect{X}_\cI  = \vect{x}_\cI}(\vect{x}_{\overline{\cI}}) & = \dfrac{p_{\vect{X}}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\vect{X}_{\cI}}(\vect{x}_{\cI})} \\
& = \sum_{k=1}^N \dfrac{w_k p_{k,\cI}(\vect{x}_\cI)}{p_{\vect{X}_\cI}(\vect{x}_\cI)} \dfrac{ p_k(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{k,\cI}(\vect{x}_\cI)}

which finally leads to:

(2)p_{\inputRV|\vect{X}_\cI  = \vect{x}_\cI}(\vect{x}_{\overline{\cI}}) =
    \sum_{k=1}^N \alpha_k \dfrac{ p_k(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{k,\cI}(\vect{x}_\cI)}

where \alpha_k = w_k p_{k,\cI}(\vect{x}_\cI) / c with c = p_{\vect{X}_\cI}(\vect{x}_\cI) = \sum_{k=1}^N w_k p_{k,\cI}(\vect{x}_\cI). The constant c normalizes the weights so that \sum_k \alpha_k = 1.

Noting that \dfrac{ p_k(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{k,\cI}(\vect{x}_\cI)} is the PDF of the k-th atom conditioned by \vect{x}_{\cI}, we show that the random vector \inputRV|\inputRV_\cI = \vect{x}_{\cI} is the Mixture built from the \vect{x}_\cI-conditioned atoms with weights \alpha_k.

Conclusion: The conditional distribution of a Mixture is a Mixture of conditional distributions.
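
A sketch illustrating this conclusion on a two-atom Mixture (the atoms and weights are arbitrary):

    import openturns as ot

    # 2-d Mixture of two correlated Normal atoms, conditioned on X_0 = 0.0.
    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.5
    atom1 = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    atom2 = ot.Normal([2.0, 2.0], [1.0, 1.0], R)
    mixture = ot.Mixture([atom1, atom2], [0.3, 0.7])
    cond = ot.PointConditionalDistribution(mixture, [0], [0.0])
    # Following (2), cond is expected to be a 1-d Mixture of the two conditioned
    # atoms, with weights alpha_k proportional to w_k * p_{k,I}(0.0).
    print(cond)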

Kernel Mixture distributions: The Kernel Mixture distribution is a particular Mixture: all the weights are identical and all the kernels of the combination are of the same discrete or continuous family. The kernels are centered on the sample points. The multivariate kernel is a tensorized product of the same univariate kernel.

Let \inputRV be a random vector of dimension \inputDim defined by a Kernel Mixture distribution based on the sample (\vect{s}_1, \dots, \vect{s}_\sampleSize) and the kernel k. In the continuous case, k denotes the kernel PDF and we have:

p_{\inputRV}(\vect{x}) = \sum_{q=1}^\sampleSize \dfrac{1}{\sampleSize} p_q(\vect{x})

where p_q is the kernel normalized by the bandwidth h:

p_q(\vect{x}) = \prod_{j=1}^\inputDim \dfrac{1}{h^j}k\left( \dfrac{x^j- s_q^j}{h^j} \right)

Following the Mixture case, we still have the relation (2). As the multivariate kernel is the tensorized product of the univariate kernel, we get:

\dfrac{p_q(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{q,\cI}(\vect{x}_\cI)} = \prod_{j \in \overline{\cI}}
\dfrac{1}{h^j}k\left( \dfrac{x^j- s_q^j}{h^j} \right)

Conclusion: The conditional distribution of a Kernel Mixture is a Mixture whose atoms are the tensorized product of the kernel on the remaining components \vect{x}_{\overline{\cI}} and whose weights \alpha_q are proportional to:

\alpha_q \propto p_{q,\cI}(\vect{x}_\cI) = \prod_{j \in\cI} \dfrac{1}{h^j}k\left( \dfrac{x^j- s_q^j}{h^j} \right)

as we have w_q = 1/\sampleSize in (2).
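
A sketch with a kernel smoothing estimate, conditioned on its first component (assuming that disabling binning yields a plain Kernel Mixture; the sample is arbitrary):

    import openturns as ot

    # Build a 2-d kernel mixture from a sample, then condition it on X_0 = 0.0.
    sample = ot.Normal(2).getSample(200)
    factory = ot.KernelSmoothing(ot.Normal(), False)  # False: no binning
    kernel_mixture = factory.build(sample)
    cond = ot.PointConditionalDistribution(kernel_mixture, [0], [0.0])
    print(cond.getDimension())  # 1 remaining component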

Truncated distributions: Let \inputRV be a random vector of dimension \inputDim whose PDF is p_\inputRV. Let \cD be a domain of \Rset^\inputDim and let \inputRV_T = \inputRV|\inputRV\in \cD be the random vector \inputRV truncated to the domain \cD. It has the following PDF:

p_{\inputRV_T}(\vect{x}) = \dfrac{1}{\alpha} p_{\inputRV}(\vect{x})  1_{\cD}(\vect{x})

where \alpha = \Prob{\inputRV\in \cD}. Let \vect{x}_\cI be in the support of the margin \cI of \inputRV_T, denoted by \inputRV_{T, \cI}. We denote by \vect{Z} the conditional random vector:

\vect{Z} = \inputRV_{T,\overline{\cI}} | \inputRV_{T, \cI} = \vect{x}_\cI

The random vector \vect{Z} is defined on the domain:

\cD_{\overline{\cI}} = \{ \vect{x}_{\overline{\cI}} \, |\, (\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) \in \cD \}

The domain \cD_{\overline{\cI}} is not empty since \vect{x}_\cI \in \supp{\inputRV_{T,\cI}}. Then, for all \vect{x}_{\overline{\cI}} \in \cD_{\overline{\cI}}, we have:

p_{\vect{Z}}( \vect{x}_{\overline{\cI}}) & = \dfrac{p_{\inputRV_T}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_{T,\cI}}(\vect{x}_{\cI})} 1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}}) \\
& = \dfrac{1}{\alpha\, p_{\inputRV_{T,\cI}}(\vect{x}_{\cI})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}}) \\
& = \dfrac{1}{\alpha\, p_{\inputRV_{T,\cI}}(\vect{x}_{\cI})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

which is:

(3)p_{\vect{Z}}( \vect{x}_{\overline{\cI}}) \propto p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

Now, we denote by \vect{Y} the conditional random vector:

\vect{Y} = \inputRV_{\overline{\cI}} | \inputRV_{\cI} = \vect{x}_\cI

Then, we have:

p_{\vect{Y}}(\vect{x}_{\overline{\cI}})  = \dfrac{p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_\cI}(\vect{x}_{\cI})}

Let \vect{T} be the truncated random vector defined by:

\vect{T} = \vect{Y} | \vect{Y} \in \cD_{\overline{\cI}}

Then, we have:

p_{\vect{T}}(\vect{x}_{\overline{\cI}})  = \dfrac{1}{\beta} p_{\vect{Y}}(\vect{x}_{\overline{\cI}})1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}})

where \beta = \Prob{\vect{Y} \in \cD_{\overline{\cI}}}. Noting that:

p_{\vect{Y}}(\vect{x}_{\overline{\cI}})1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}}) = \dfrac{p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_\cI}(\vect{x}_{\cI})}1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}),

we get:

p_{\vect{T}}(\vect{x}_{\overline{\cI}}) = \dfrac{1}{\beta\, p_{\inputRV_\cI}(\vect{x}_{\cI})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

which is:

(4)p_{\vect{T}}(\vect{x}_{\overline{\cI}}) \propto p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

The densities (3) and (4) are proportional to the same function; since both are normalized, \vect{Z} and \vect{T} have the same distribution, which proves the conclusion below.

Conclusion: The conditional distribution of a truncated distribution is the truncated distribution of the conditional distribution. Note, however, that the truncation domains are not exactly the same.

The following figure illustrates the case where (X_0, X_1) \sim \cN \left(\vect{0}, \vect{1}, \mat{R}  \right) with R(0,1) = 0.8. We plot:

  • the PDF of \inputRV|\inputRV\in [-0.5, 1.0] conditioned by X_0 = 0.5 (Cond dist of truncated),

  • the PDF of the truncation to [-0.5, 1.0] of \inputRV|X_0 = 0.5 (Truncation of cond dist).

../../_images/illustration_conditional_truncated.png
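
A sketch reproducing the two curves of the figure, assuming the truncation domain reads componentwise as the square [-0.5, 1.0]^2:

    import openturns as ot

    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.8
    normal = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    box = ot.Interval([-0.5, -0.5], [1.0, 1.0])
    # conditional distribution of the truncated vector ("Cond dist of truncated")
    cond_of_trunc = ot.PointConditionalDistribution(
        ot.TruncatedDistribution(normal, box), [0], [0.5])
    # truncation of the conditional distribution ("Truncation of cond dist")
    trunc_of_cond = ot.TruncatedDistribution(
        ot.PointConditionalDistribution(normal, [0], [0.5]), -0.5, 1.0)
    graph = cond_of_trunc.drawPDF()
    graph.add(trunc_of_cond.drawPDF())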

Note that the numerical range of the conditional distribution might differ from the numerical range of the unconditioned distribution. For example, consider a bivariate vector (X_0, X_1) following a normal distribution with zero mean, unit variance and a correlation R(0,1) = 0.4, and consider X_1|X_0 = 10.0. The numerical range of X_1|X_0 = 10.0 is [-3.01, 11.0] whereas the numerical range of X_1 is [-7.65, 7.65]. See Create a Point Conditional Distribution for more examples.
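
A sketch reproducing the ranges quoted above (the exact bounds depend on the library's internal quantile levels):

    import openturns as ot

    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.4
    normal = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    cond = ot.PointConditionalDistribution(normal, [0], [10.0])
    print(cond.getRange())                   # about [-3.01, 11.0]
    print(normal.getMarginal(1).getRange())  # about [-7.65, 7.65]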

The computation of the numerical range is important to make the integration of the PDF over some domains possible. The library implements three strategies to compute it, detailed below.

Strategy None: The numerical range of \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI is taken equal to the numerical range of \inputRV_{\overline{\cI}}. This range is exact for all distributions with bounded support. For distributions with unbounded support, it is potentially wrong when the conditioning values are very close to the bounds of the initial numerical range.

Strategy Normal: Let \vect{Y} be the Gaussian vector of dimension \inputDim whose mean vector is \vect{\mu} = \Expect{\inputRV} and whose covariance matrix is \mat{C} = \Cov{\inputRV}. Then, we build the conditioned Gaussian vector:

\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI

The numerical range \cD \left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI \right) of \vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI is known exactly thanks to the simplification mechanism implemented for Gaussian vectors. We assign to \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI the range \cD \left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI \right):

\cD\left(\inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI \right) = \cD \left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI
= \vect{x}_\cI \right)
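
A rough sketch of this strategy, written as a local helper rather than the library's internal code: build a Gaussian vector with the same mean and covariance as \inputRV, condition it, and read its exact range:

    import openturns as ot

    def normal_range(dist, cond_indices, cond_values):
        # Gaussian vector with the same mean and covariance as dist
        # (assumption: Normal accepts a mean Point and a CovarianceMatrix).
        gauss = ot.Normal(dist.getMean(), dist.getCovariance())
        # exact range of the conditioned Gaussian vector (elliptical simplification)
        cond = ot.PointConditionalDistribution(gauss, cond_indices, cond_values)
        return cond.getRange()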

Strategy NormalCopula: Let \vect{Y} be the Gaussian vector of dimension \inputDim, with zero mean, unit variance and whose correlation matrix \mat{R} is defined from the Spearman correlation matrix of \inputRV: \left( \rho_S(X_i, X_j) \right)_{1 \leq i, j \leq \inputDim}. Thus, \vect{Y} is the standard representative of the normal copula having the same correlation as \inputRV.

For each conditioning value x_i, we define the quantile q_i of the normal distribution with zero mean and unit variance associated with the same order (probability level) as x_i, for i \in \cI:

q_i = \Phi^{-1} \circ F_i \left(x_i \right)

where \Phi is the CDF of the normal distribution with zero mean and unit variance. Then, we build the conditioned Gaussian vector:

\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{q}_\cI

whose numerical range \cD\left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{q}_\cI \right) can be computed exactly. Let it be:

\cD\left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{q}_\cI \right) = \prod_{i\in \overline{\cI}}
\left[ y_i^{min}, y_i^{max}\right]

Then, inversely, we compute the quantiles of each F_i for i \in \overline{\cI} which have the same order as the bounds y_i^{min} and y_i^{max} with respect to \Phi:

x_i^{min} & = F_i^{-1}\circ \Phi \left (y_i^{min} \right) \\
x_i^{max} & = F_i^{-1}\circ \Phi \left (y_i^{max} \right)

We assign to \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI the numerical range defined by:

\cD \left(\vect{X}_{\overline{\cI}}|\vect{X}_\cI = \vect{x}_\cI \right) = \prod_{i\in \overline{\cI}} \left[ x_i^{min},
x_i^{max}\right]
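
A rough sketch of this strategy for a two-dimensional distribution with one conditioning component and one free component. The helper below is purely illustrative (not the library's internal code) and assumes the Pearson correlation of the normal copula is obtained from the Spearman correlation through NormalCopula.GetCorrelationFromSpearmanCorrelation:

    import openturns as ot

    def normal_copula_range(dist, i_cond, x_cond, i_free):
        # dist: 2-d distribution, i_cond/i_free: conditioning/free indices.
        d = dist.getDimension()
        std_normal = ot.Normal()
        # correlation matrix of the normal copula matching the Spearman correlation
        R = ot.NormalCopula.GetCorrelationFromSpearmanCorrelation(
            dist.getSpearmanCorrelation())
        Y = ot.Normal([0.0] * d, [1.0] * d, R)
        # q_i = Phi^{-1}(F_i(x_i)): conditioning value mapped to the Gaussian space
        q = std_normal.computeQuantile(
            dist.getMarginal(i_cond).computeCDF(x_cond))[0]
        cond = ot.PointConditionalDistribution(Y, [i_cond], [q])
        y_min = cond.getRange().getLowerBound()[0]
        y_max = cond.getRange().getUpperBound()[0]
        # x_i = F_i^{-1}(Phi(y_i)): bounds mapped back through the free marginal
        F = dist.getMarginal(i_free)
        x_min = F.computeQuantile(std_normal.computeCDF(y_min))[0]
        x_max = F.computeQuantile(std_normal.computeCDF(y_max))[0]
        return x_min, x_max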

Case 3: Create a distribution whose parameters are random

The objective is to create the marginal distribution of \inputRV defined in Case 1.

See the class DeconditionedDistribution.
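
A sketch reusing the illustrative link function of Case 1, assuming the constructor takes the conditioned distribution, the conditioning distribution and the link function:

    import openturns as ot

    # Marginal distribution of X where X | Theta ~ Normal(mu, sigma),
    # Theta = g(Y) = (Y, 0.1 + Y^2) and Y ~ Uniform(0, 1).
    conditioning = ot.Uniform(0.0, 1.0)
    link = ot.SymbolicFunction(["y"], ["y", "0.1 + y^2"])
    deconditioned = ot.DeconditionedDistribution(ot.Normal(), conditioning, link)
    print(deconditioned.getMean())
    print(deconditioned.computePDF([0.5]))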

This class requires the following features:

  • the random vector \inputRV may be continuous, discrete or neither: e.g., its distribution can be a Mixture of discrete and continuous distributions. In that case, its parameter set is the union of the parameter sets of its atoms (the weights of the mixture are not considered as parameters),

  • each component Y_i is continuous or discrete: e.g., it cannot be a Mixture of discrete and continuous distributions (so that the random vector \vect{Y} may have some discrete components and some continuous components),

  • the copula of \vect{Y} is continuous: e.g., it cannot be the MinCopula,

  • if \vect{Y} has both discrete components and continuous components, its copula must be the independent copula. The general case has not been implemented yet.

We define:

p_{\vect{Y}}(\vect{y}) = \left( \prod_{i=1}^\inputDim p_i(y_i) \right) c(F_1(y_1), \dots, F_\inputDim(y_\inputDim))

where:

  • c is the density of the copula of \vect{Y},

  • if Y_i is a continuous component, p_i is its probability density function,

  • if Y_i is a discrete component, p_i = \sum_{y^i_k \in \cS^i} \Prob{Y_i = y^i_k} \delta_{y^i_k} where \cS^i = \{ y^i_k \} is its support and \delta_{y^i_k} the Dirac distribution centered on y^i_k.

Then, the PDF of \inputRV is defined by:

p_{\vect{X}}(\vect{x}) = \int p_{\vect{X}|\vect{\Theta}=g(\vect{y})}(\vect{x}|g(\vect{y})) p_{\vect{Y}}(\vect{y})\di{\vect{y}}

with the same convention as for \vect{Y}: for the discrete components of \vect{Y}, the integral reduces to a sum over their support.

Note that it is always possible to create the random vector \inputRV whatever the distribution of \vect{\Theta}: see the class DeconditionedRandomVector. But remember that a DeconditionedRandomVector (and more generally a RandomVector) can only be sampled.
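
A sketch, assuming the DeconditionedRandomVector constructor takes the conditioned distribution and a RandomVector of parameters (here both \mu and \sigma are random):

    import openturns as ot

    # Random parameters (mu, sigma) of a Normal distribution.
    theta = ot.RandomVector(ot.JointDistribution([ot.Uniform(0.0, 1.0),
                                                  ot.Uniform(0.5, 1.5)]))
    x = ot.DeconditionedRandomVector(ot.Normal(), theta)
    print(x.getSample(5))  # sampling is available, but not the PDF or CDF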

Case 4: Create a Bayesian posterior distribution

Consider the random vector \vect{X} where \vect{X}|\vect{\Theta} follows the distribution \mathcal{L}_{\vect{X}|\vect{\Theta}}, with \vect{\Theta} = g(\vect{Y}) and \vect{Y} following the prior distribution \mathcal{L}_{\vect{Y}}. The function g is a link function whose input dimension is the dimension of \mathcal{L}_{\vect{Y}} and whose output dimension is the dimension of \vect{\Theta}.

The objective is to create the posterior distribution of \vect{Y} given that we have a sample (\vect{x}_1, \dots, \vect{x}_\sampleSize) of \vect{X}.

See the class PosteriorDistribution.
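
A sketch, assuming the PosteriorDistribution constructor takes the deconditioned model of Case 3 and the sample of observations (the prior, the link and the data are arbitrary):

    import openturns as ot

    # Prior Y ~ Uniform(0, 1) on the mean of a Normal(mu, 1.0) observation model.
    prior = ot.Uniform(0.0, 1.0)
    link = ot.SymbolicFunction(["y"], ["y", "1.0"])
    model = ot.DeconditionedDistribution(ot.Normal(), prior, link)
    observations = ot.Normal(0.5, 1.0).getSample(20)  # synthetic data
    posterior = ot.PosteriorDistribution(model, observations)
    print(posterior.getMean())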

This class requires the following features:

  • the random vector \inputRV may be continuous, discrete or neither: e.g., its distribution can be a Mixture of discrete and continuous distributions. In that case, its parameter set is the union of the parameter sets of its atoms (the weights of the mixture are not considered as parameters),

  • each component Y_i is continuous or discrete: e.g., it cannot be a Mixture of discrete and continuous distributions (the random vector \vect{Y} may have some discrete components and some continuous components),

  • the copula of \vect{Y} is continuous: e.g., it cannot be the MinCopula.

If \vect{Y} and \vect{X} are continuous random vectors, then the posterior PDF of \vect{Y} is defined by:

(5)f_{\vect{Y}|\inputRV_1 = \vect{x}_1, \dots, \inputRV_\sampleSize = \vect{x}_\sampleSize}(\vect{y}) = \frac{f_{\vect{Y}}(\vect{y}) \prod_{i=1}^\sampleSize f_{\inputRV|\vect{\theta} = g(\vect{y})}(\vect{x}_i)}{\int f_{\vect{Y}}(\vect{y})\prod_{i=1}^\sampleSize f_{\inputRV|\vect{\theta} = g(\vect{y})}(\vect{x}_i) \di{\vect{y}}}

with f_{\inputRV|\vect{\theta} = g(\vect{y})} the PDF of the distribution of \inputRV|\vect{\Theta} where \vect{\Theta} has been replaced by g(\vect{y}) and f_{\vect{Y}} the PDF of the prior distribution of \vect{Y}.

Note that the denominator of (5) is the PDF of the deconditioned distribution of \inputRV|\vect{\Theta}=g(\vect{Y}) with respect to the prior distribution of \vect{Y}.

In the other cases, the PDF is replaced by the probability distribution function for the discrete components and the integrals \int are replaced by sums \sum.