Conditional distributions

The library offers several modeling capabilities for conditional distributions:

  • Case 1: Create a joint distribution using conditioning,

  • Case 2: Condition a joint distribution by some values of its marginals,

  • Case 3: Create a distribution whose parameters are random,

  • Case 4: Create a Bayesian posterior distribution.

Case 1: Create a joint distribution using conditioning

The objective is to create the joint distribution of the random vector (\vect{Y},\inputRV) where \vect{Y} follows the distribution \mathcal{L}_{\vect{Y}} and \inputRV|\vect{\Theta} follows the distribution \mathcal{L}_{\inputRV|\vect{\Theta}}, where \vect{\Theta}=g(\vect{Y}) with g a link function whose input dimension is the dimension of \mathcal{L}_{\vect{Y}} and whose output dimension is the dimension of \vect{\Theta}.

This distribution is limited to the continuous case, i.e. when both the conditioning and the conditioned distributions are continuous. Its probability density function is defined as:

f_{(\vect{Y},\inputRV)}(\vect{y}, \vect{x}) = f_{\inputRV|\vect{\theta}=g(\vect{y})}(\vect{x}|g(\vect{y})) f_{\vect{Y}}( \vect{y})

with f_{\inputRV|\vect{\theta} = g(\vect{y})} the PDF of the distribution of \inputRV|\vect{\Theta} where \vect{\Theta} has been replaced by g(\vect{y}), and f_{\vect{Y}} the PDF of \vect{Y}.

See the class JointByConditioningDistribution.
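
A minimal sketch, assuming the constructor takes the conditioned distribution, the conditioning distribution and the link function g (the link used here is purely illustrative):

    import openturns as ot

    # Y ~ Uniform(0, 1); X | Theta ~ Normal(mu, sigma) with
    # Theta = g(Y) = (Y, 0.1 + Y^2), an illustrative link function.
    conditioning = ot.Uniform(0.0, 1.0)
    link = ot.SymbolicFunction(["y"], ["y", "0.1 + y^2"])
    conditioned = ot.Normal()  # its parameters (mu, sigma) are replaced by g(Y)
    joint = ot.JointByConditioningDistribution(conditioned, conditioning, link)
    print(joint.getDimension())  # dimension of (Y, X), here 2
    print(joint.getSample(5))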

Case 2: Condition a joint distribution by some values of its marginals

Let \inputRV be a random vector of dimension \inputDim. Let \cI \subset \{1, \dots, \inputDim \} be a set of indices of components of \inputRV, \overline{\cI} its complement in \{1, \dots, \inputDim \} and \vect{x}_\cI a real vector whose dimension is equal to the cardinality of \cI. The objective is to create the distribution of:

\inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI

See the class PointConditionalDistribution.

This class requires the following features:

  • each component X_i is continuous or discrete: e.g., it cannot be a Mixture of discrete and continuous distributions,

  • the copula of \inputRV is continuous: e.g., it cannot be the MinCopula,

  • the random vector \inputRV_{\overline{\cI}} is continuous or discrete: all its components must be discrete or all its components must be continuous,

  • the random vector \inputRV_{\cI} may have some discrete components and some continuous components.

Then, the PDF (probability density function if \inputRV_{\overline{\cI}} is continuous or probability distribution function if \inputRV_{\overline{\cI}} is discrete) of \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI is defined by (in the following expression, we assume a particular order of the conditioned components among the whole set of components for ease of reading):

(1)p_{\inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI}(\vect{x}_{\overline{\cI}}) = \dfrac{p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_{\cI}}(\vect{x}_{\cI})}

where:

p_{\inputRV}(\vect{x})  = \left( \prod_{i=1}^\inputDim p_i(x_i)\right) c(F_1(x_1), \dots,
F_\inputDim(x_\inputDim))

with:

  • c is the density of the copula of \inputRV,

  • if X_i is a continuous component, p_i is its probability density function,

  • if X_i is a discrete component, p_i = \sum_{x^i_k \in \cS^i} \Prob{X_i = x^i_k} \delta_{x^i_k} where \cS^i = \{ x^i_k \} is its support and \delta_{x^i_k} the Dirac distribution centered on x^i_k.

Then, if \inputRV_{\overline{\cI}} is continuous, we have:

p_{\inputRV_{\cI}}(\vect{x}_{\cI})  = \int p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) \,
\di{\vect{x}_{\overline{\cI}}}

and if \inputRV_{\overline{\cI}} is discrete with its support denoted by \cS(\vect{X}_{\overline{\cI}}) = \prod_{i \in \overline{\cI}} \cS^i, we have:

p_{\inputRV_{\cI}}(\vect{x}_{\cI}) = \sum_{\vect{x}_{\overline{\cI}} \in \cS(\inputRV_{\overline{\cI}})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

Simplification mechanisms to compute (1) are implemented for some distributions. We detail some cases where a simplification has been implemented.

Elliptical distributions: This is the case for normal and Student distributions. If \inputRV follows a normal or a Student distribution, then \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI respectively follows a normal or a Student distribution with modified parameters. See Conditional Normal and Conditional Student for the formulas of the conditional distributions.
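
As an illustration, conditioning a correlated bivariate normal distribution should trigger this simplification and return a one-dimensional normal distribution. A sketch, assuming the PointConditionalDistribution constructor takes the distribution, the conditioning indices and the conditioning values:

    import openturns as ot

    # Bivariate Normal with rho = 0.8, conditioned on X_0 = 0.5.
    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.8
    dist = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    cond = ot.PointConditionalDistribution(dist, [0], [0.5])
    # The classical formulas give X_1 | X_0 = 0.5 ~ Normal(0.8 * 0.5, sqrt(1 - 0.8^2)),
    # i.e. Normal(0.4, 0.6), which the simplification is expected to return.
    print(cond)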

Mixture distributions: Let \inputRV be a random vector of dimension \inputDim whose distribution is defined by a Mixture of N discrete or continuous atoms. Let us denote by (p_1, \dots, p_N) the PDF (probability density function for continuous atoms and probability distribution function for discrete ones) of each atom, with respective weights (w_1, \dots, w_N). Then we get:

p_\inputRV(\vect{x}) = \sum_{k=1}^N w_k p_k(\vect{x})

We denote by p_{k,\cI} the PDF of the margin \cI of the k-th atom. Then, if p_{\inputRV_\cI}(\vect{x}_{\cI}) \neq 0, we get:

p_{\inputRV|\vect{X}_\cI  = \vect{x}_\cI}(\vect{x}_{\overline{\cI}}) & = \dfrac{p_{\vect{X}}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\vect{X}_{\cI}}(\vect{x}_{\cI})} \\
& = \sum_{k=1}^N \dfrac{w_k p_{k,\cI}(\vect{x}_\cI)}{p_{\vect{X}_\cI}(\vect{x}_\cI)} \dfrac{ p_k(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{k,\cI}(\vect{x}_\cI)}

which finally leads to:

(2)p_{\inputRV|\vect{X}_\cI  = \vect{x}_\cI}(\vect{x}_{\overline{\cI}}) =
    \sum_{k=1}^N \alpha_k \dfrac{ p_k(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{k,\cI}(\vect{x}_\cI)}

where \alpha_k = w_k p_{k,\cI}(\vect{x}_\cI) / c with c = p_{\vect{X}_\cI}(\vect{x}_\cI) = \sum_{k=1}^N w_k p_{k,\cI}(\vect{x}_\cI). The constant c normalizes the weights so that \sum_k \alpha_k = 1.

Noting that \dfrac{ p_k(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{k,\cI}(\vect{x}_\cI)} is the PDF of the k-th atom conditioned by \vect{x}_{\cI}, we show that the random vector \inputRV|\inputRV_\cI = \vect{x}_{\cI} is the Mixture built from the \vect{x}_\cI-conditioned atoms with weights \alpha_k.

Conclusion: The conditional distribution of a Mixture is a Mixture of conditional distributions.
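
A sketch illustrating this conclusion on a two-atom Mixture (the atoms and weights are arbitrary):

    import openturns as ot

    # 2-d Mixture of two correlated Normal atoms, conditioned on X_0 = 0.0.
    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.5
    atom1 = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    atom2 = ot.Normal([2.0, 2.0], [1.0, 1.0], R)
    mixture = ot.Mixture([atom1, atom2], [0.3, 0.7])
    cond = ot.PointConditionalDistribution(mixture, [0], [0.0])
    # Following (2), cond is expected to be a 1-d Mixture of the two conditioned
    # atoms, with weights alpha_k proportional to w_k * p_{k,I}(0.0).
    print(cond)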

Kernel Mixture distributions: The Kernel Mixture distribution is a particular Mixture: all the weights are identical and all the kernels of the combination are of the same discrete or continuous family. The kernels are centered on the sample points. The multivariate kernel is a tensorized product of the same univariate kernel.

Let \inputRV be a random vector of dimension \inputDim defined by a Kernel Mixture distribution based on the sample (\vect{s}_1, \dots, \vect{s}_\sampleSize) and the kernel k. In the continuous case, k denotes the kernel PDF and we have:

p_{\inputRV}(\vect{x}) = \sum_{q=1}^\sampleSize \dfrac{1}{\sampleSize} p_q(\vect{x})

where p_q is the kernel normalized by the bandwidth h:

p_q(\vect{x}) = \prod_{j=1}^\inputDim \dfrac{1}{h^j}k\left( \dfrac{x^j- s_q^j}{h^j} \right)

Following the Mixture case, we still have the relation (2). As the multivariate kernel is the tensorized product of the univariate kernel, we get:

\dfrac{p_q(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{q,\cI}(\vect{x}_\cI)} = \prod_{j \in \overline{\cI}}
\dfrac{1}{h^j}k\left( \dfrac{x^j- s_q^j}{h^j} \right)

Conclusion: The conditional distribution of a Kernel Mixture is a Mixture whose atoms are the tensorized product of the kernel on the remaining components \vect{x}_{\overline{\cI}} and whose weights \alpha_q are proportional to:

\alpha_q \propto p_{q,\cI}(\vect{x}_\cI) = \prod_{j \in\cI} \dfrac{1}{h^j}k\left( \dfrac{x^j- s_q^j}{h^j} \right)

as we have w_q = 1/\sampleSize in (2).
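
A sketch with a kernel smoothing estimate, conditioned on its first component (assuming that disabling binning yields a plain Kernel Mixture; the sample is arbitrary):

    import openturns as ot

    # Build a 2-d kernel mixture from a sample, then condition it on X_0 = 0.0.
    sample = ot.Normal(2).getSample(200)
    factory = ot.KernelSmoothing(ot.Normal(), False)  # False: no binning
    kernel_mixture = factory.build(sample)
    cond = ot.PointConditionalDistribution(kernel_mixture, [0], [0.0])
    print(cond.getDimension())  # 1 remaining component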

Truncated distributions: Let \inputRV be a random vector of dimension \inputDim whose PDF is p_\inputRV. Let \cD be a domain of \Rset^\inputDim and let \inputRV_T = \inputRV|\inputRV\in \cD be the random vector \inputRV truncated to the domain \cD. It has the following PDF:

p_{\inputRV_T}(\vect{x}) = \dfrac{1}{\alpha} p_{\inputRV}(\vect{x})  1_{\cD}(\vect{x})

where \alpha = \Prob{\inputRV\in \cD}. Let \vect{x}_\cI be in the support of the margin \cI of \inputRV_T, denoted by \inputRV_{T, \cI}. We denote by \vect{Z} the conditional random vector:

\vect{Z} = \inputRV_{T,\overline{\cI}} | \inputRV_{T, \cI} = \vect{x}_\cI

The random vector \vect{Z} is defined on the domain:

\cD_{\overline{\cI}} = \{ \vect{x}_{\overline{\cI}} \, |\, (\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) \in \cD \}

The domain \cD_{\overline{\cI}} is not empty since \vect{x}_\cI \in \supp{\inputRV_{T,\cI}}. Then, for all \vect{x}_{\overline{\cI}} \in \cD_{\overline{\cI}}, we have:

p_{\vect{Z}}( \vect{x}_{\overline{\cI}}) & = \dfrac{p_{\inputRV_T}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_{T,\cI}}(\vect{x}_{\cI})} 1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}}) \\
& = \dfrac{1}{\alpha\, p_{\inputRV_{T,\cI}}(\vect{x}_{\cI})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}}) \\
& = \dfrac{1}{\alpha\, p_{\inputRV_{T,\cI}}(\vect{x}_{\cI})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

which is:

(3)p_{\vect{Z}}( \vect{x}_{\overline{\cI}}) \propto p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}) 1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

Now, we denote by \vect{Y} the conditional random vector:

\vect{Y} = \inputRV_{\overline{\cI}} | \inputRV_{\cI} = \vect{x}_\cI

Then, we have:

p_{\vect{Y}}(\vect{x}_{\overline{\cI}})  = \dfrac{p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_\cI}(\vect{x}_{\cI})}

Let \vect{T} be the truncated random vector defined by:

\vect{T} = \vect{Y} | \vect{Y} \in \cD_{\overline{\cI}}

Then, we have:

p_{\vect{T}}(\vect{x}_{\overline{\cI}})  = \dfrac{1}{\beta} p_{\vect{Y}}(\vect{x}_{\overline{\cI}})1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}})

where \beta = \Prob{\vect{Y} \in \cD_{\overline{\cI}}}. Noting that:

p_{\vect{Y}}(\vect{x}_{\overline{\cI}})1_{\cD_{\overline{\cI}}}(\vect{x}_{\overline{\cI}}) = \dfrac{p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})}{p_{\inputRV_\cI}(\vect{x}_{\cI})}1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI}),

we get:

p_{\vect{T}}(\vect{x}_{\overline{\cI}}) = \dfrac{1}{\beta\, p_{\inputRV_\cI}(\vect{x}_{\cI})} p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

which is:

(4)p_{\vect{T}}(\vect{x}_{\overline{\cI}}) \propto p_{\inputRV}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})1_{\cD}(\vect{x}_{\overline{\cI}}, \vect{x}_{\cI})

The densities (3) and (4) are proportional to the same function; since both are normalized, \vect{Z} and \vect{T} have the same distribution, which proves the conclusion below.

Conclusion: The conditional distribution of a truncated distribution is the truncated distribution of the conditional distribution. Note, however, that the truncation domains are not exactly the same.

The following figure illustrates the case where (X_0, X_1) \sim \cN \left(\vect{0}, \vect{1}, \mat{R}  \right) with R(0,1) = 0.8. We plot:

  • the PDF of \inputRV|\inputRV\in [-0.5, 1.0] conditioned by X_0 = 0.5 (Cond dist of truncated),

  • the PDF of the truncation to [-0.5, 1.0] of \inputRV|X_0 = 0.5 (Truncation of cond dist).

../../_images/illustration_conditional_truncated.png
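
A sketch reproducing the two curves of the figure, assuming the truncation domain reads componentwise as the square [-0.5, 1.0]^2:

    import openturns as ot

    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.8
    normal = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    box = ot.Interval([-0.5, -0.5], [1.0, 1.0])
    # conditional distribution of the truncated vector ("Cond dist of truncated")
    cond_of_trunc = ot.PointConditionalDistribution(
        ot.TruncatedDistribution(normal, box), [0], [0.5])
    # truncation of the conditional distribution ("Truncation of cond dist")
    trunc_of_cond = ot.TruncatedDistribution(
        ot.PointConditionalDistribution(normal, [0], [0.5]), -0.5, 1.0)
    graph = cond_of_trunc.drawPDF()
    graph.add(trunc_of_cond.drawPDF())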

Note that the numerical range of the conditional distribution might differ from the numerical range of the unconditioned distribution. For example, consider a bivariate vector (X_0, X_1) following a normal distribution with zero mean, unit variance and a correlation R(0,1) = 0.4, and consider X_1|X_0 = 10.0. The numerical range of X_1|X_0 = 10.0 is [-3.01, 11.0] whereas the numerical range of X_1 is [-7.65, 7.65]. See Create a Point Conditional Distribution for more examples.
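
A sketch reproducing the ranges quoted above (the exact bounds depend on the library's internal quantile levels):

    import openturns as ot

    R = ot.CorrelationMatrix(2)
    R[0, 1] = 0.4
    normal = ot.Normal([0.0, 0.0], [1.0, 1.0], R)
    cond = ot.PointConditionalDistribution(normal, [0], [10.0])
    print(cond.getRange())                   # about [-3.01, 11.0]
    print(normal.getMarginal(1).getRange())  # about [-7.65, 7.65]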

The computation of the numerical range is important to make the integration of the PDF over some domains possible. The library implements three strategies to compute it, detailed below.

Strategy None: The numerical range of \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI is taken equal to the numerical range of \inputRV_{\overline{\cI}}. This range is exact for all distributions with bounded support. For distributions with unbounded support, it is potentially wrong when the conditioning values are very close to the bounds of the initial numerical range.

Strategy Normal: Let \vect{Y} be the Gaussian vector of dimension \inputDim whose mean vector is \vect{\mu} = \Expect{\inputRV} and whose covariance matrix is \mat{C} = \Cov{\inputRV}. Then, we build the conditioned Gaussian vector:

\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI

The numerical range \cD \left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI \right) of \vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI is known exactly thanks to the simplification mechanism implemented for Gaussian vectors. We assign to \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI the range \cD \left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{x}_\cI \right):

\cD\left(\inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI \right) = \cD \left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI
= \vect{x}_\cI \right)
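
A rough sketch of this strategy, written as a local helper rather than the library's internal code: build a Gaussian vector with the same mean and covariance as \inputRV, condition it, and read its exact range:

    import openturns as ot

    def normal_range(dist, cond_indices, cond_values):
        # Gaussian vector with the same mean and covariance as dist
        # (assumption: Normal accepts a mean Point and a CovarianceMatrix).
        gauss = ot.Normal(dist.getMean(), dist.getCovariance())
        # exact range of the conditioned Gaussian vector (elliptical simplification)
        cond = ot.PointConditionalDistribution(gauss, cond_indices, cond_values)
        return cond.getRange()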

Strategy NormalCopula: Let \vect{Y} be the Gaussian vector of dimension \inputDim, with zero mean, unit variance and whose correlation matrix \mat{R} is defined from the Spearman correlation matrix of \inputRV: \left( \rho_S(X_i, X_j) \right)_{1 \leq i, j \leq \inputDim}. Thus, \vect{Y} is the standard representative of the normal copula having the same correlation as \inputRV.

For each conditioning value x_i, we define the quantile q_i of the normal distribution with zero mean and unit variance associated with the same order (probability level) as x_i, for i \in \cI:

q_i = \Phi^{-1} \circ F_i \left(x_i \right)

where \Phi is the CDF of the normal distribution with zero mean and unit variance. Then, we build the conditioned Gaussian vector:

\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{q}_\cI

whose numerical range \cD\left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{q}_\cI \right) can be computed exactly. Let it be:

\cD\left(\vect{Y}_{\overline{\cI}}|\vect{Y}_\cI = \vect{q}_\cI \right) = \prod_{i\in \overline{\cI}}
\left[ y_i^{min}, y_i^{max}\right]

Then, inversely, we compute the quantiles of each F_i for i \in \overline{\cI} which have the same order as the bounds y_i^{min} and y_i^{max} with respect to \Phi:

x_i^{min} & = F_i^{-1}\circ \Phi \left (y_i^{min} \right) \\
x_i^{max} & = F_i^{-1}\circ \Phi \left (y_i^{max} \right)

We assign to \inputRV_{\overline{\cI}}|\inputRV_\cI = \vect{x}_\cI the numerical range defined by:

\cD \left(\vect{X}_{\overline{\cI}}|\vect{X}_\cI = \vect{x}_\cI \right) = \prod_{i\in \overline{\cI}} \left[ x_i^{min},
x_i^{max}\right]
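
A rough sketch of this strategy for a two-dimensional distribution with one conditioning component and one free component. The helper below is purely illustrative (not the library's internal code) and assumes the Pearson correlation of the normal copula is obtained from the Spearman correlation through NormalCopula.GetCorrelationFromSpearmanCorrelation:

    import openturns as ot

    def normal_copula_range(dist, i_cond, x_cond, i_free):
        # dist: 2-d distribution, i_cond/i_free: conditioning/free indices.
        d = dist.getDimension()
        std_normal = ot.Normal()
        # correlation matrix of the normal copula matching the Spearman correlation
        R = ot.NormalCopula.GetCorrelationFromSpearmanCorrelation(
            dist.getSpearmanCorrelation())
        Y = ot.Normal([0.0] * d, [1.0] * d, R)
        # q_i = Phi^{-1}(F_i(x_i)): conditioning value mapped to the Gaussian space
        q = std_normal.computeQuantile(
            dist.getMarginal(i_cond).computeCDF(x_cond))[0]
        cond = ot.PointConditionalDistribution(Y, [i_cond], [q])
        y_min = cond.getRange().getLowerBound()[0]
        y_max = cond.getRange().getUpperBound()[0]
        # x_i = F_i^{-1}(Phi(y_i)): bounds mapped back through the free marginal
        F = dist.getMarginal(i_free)
        x_min = F.computeQuantile(std_normal.computeCDF(y_min))[0]
        x_max = F.computeQuantile(std_normal.computeCDF(y_max))[0]
        return x_min, x_max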

Case 3: Create a distribution whose parameters are random

The objective is to create the marginal distribution of \inputRV defined in Case 1.

See the class DeconditionedDistribution.
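
A sketch reusing the illustrative link function of Case 1, assuming the constructor takes the conditioned distribution, the conditioning distribution and the link function:

    import openturns as ot

    # Marginal distribution of X where X | Theta ~ Normal(mu, sigma),
    # Theta = g(Y) = (Y, 0.1 + Y^2) and Y ~ Uniform(0, 1).
    conditioning = ot.Uniform(0.0, 1.0)
    link = ot.SymbolicFunction(["y"], ["y", "0.1 + y^2"])
    deconditioned = ot.DeconditionedDistribution(ot.Normal(), conditioning, link)
    print(deconditioned.getMean())
    print(deconditioned.computePDF([0.5]))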

This class requires the following features:

  • the random vector \inputRV may be continuous, discrete or neither: e.g., its distribution can be a Mixture of discrete and continuous distributions. In that case, its parameter set is the union of the parameter sets of its atoms (the weights of the mixture are not considered as parameters),

  • each component Y_i is continuous or discrete: e.g., it cannot be a Mixture of discrete and continuous distributions (so that the random vector \vect{Y} may have some discrete components and some continuous components),

  • the copula of \vect{Y} is continuous: e.g., it cannot be the MinCopula,

  • if \vect{Y} has both discrete components and continuous components, its copula must be the independent copula. The general case has not been implemented yet.

We define:

p_{\vect{Y}}(\vect{y}) = \left( \prod_{i=1}^\inputDim p_i(y_i) \right) c(F_1(y_1), \dots, F_\inputDim(y_\inputDim))

where:

  • c is the density of the copula of \vect{Y},

  • if Y_i is a continuous component, p_i is its probability density function,

  • if Y_i is a discrete component, p_i = \sum_{y^i_k \in \cS^i} \Prob{Y_i = y^i_k} \delta_{y^i_k} where \cS^i = \{ y^i_k \} is its support and \delta_{y^i_k} the Dirac distribution centered on y^i_k.

Then, the PDF of \inputRV is defined by:

p_{\vect{X}}(\vect{x}) = \int p_{\vect{X}|\vect{\Theta}=g(\vect{y})}(\vect{x}|g(\vect{y})) p_{\vect{Y}}(\vect{y})\di{\vect{y}}

with the same convention as for \vect{Y}: for the discrete components of \vect{Y}, the integral reduces to a sum over their support.

Note that it is always possible to create the random vector \inputRV whatever the distribution of \vect{\Theta}: see the class DeconditionedRandomVector. But remember that a DeconditionedRandomVector (and more generally a RandomVector) can only be sampled.
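
A sketch, assuming the DeconditionedRandomVector constructor takes the conditioned distribution and a RandomVector of parameters (here both \mu and \sigma are random):

    import openturns as ot

    # Random parameters (mu, sigma) of a Normal distribution.
    theta = ot.RandomVector(ot.JointDistribution([ot.Uniform(0.0, 1.0),
                                                  ot.Uniform(0.5, 1.5)]))
    x = ot.DeconditionedRandomVector(ot.Normal(), theta)
    print(x.getSample(5))  # sampling is available, but not the PDF or CDF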

Case 4: Create a Bayesian posterior distribution

Consider the random vector \vect{X} where \vect{X}|\vect{\Theta} follows the distribution \mathcal{L}_{\vect{X}|\vect{\Theta}}, with \vect{\Theta} = g(\vect{Y}) and \vect{Y} following the prior distribution \mathcal{L}_{\vect{Y}}. The function g is a link function whose input dimension is the dimension of \mathcal{L}_{\vect{Y}} and whose output dimension is the dimension of \vect{\Theta}.

The objective is to create the posterior distribution of \vect{Y} given that we have a sample (\vect{x}_1, \dots, \vect{x}_\sampleSize) of \vect{X}.

See the class PosteriorDistribution.
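
A sketch, assuming the PosteriorDistribution constructor takes the deconditioned model of Case 3 and the sample of observations (the prior, the link and the data are arbitrary):

    import openturns as ot

    # Prior Y ~ Uniform(0, 1) on the mean of a Normal(mu, 1.0) observation model.
    prior = ot.Uniform(0.0, 1.0)
    link = ot.SymbolicFunction(["y"], ["y", "1.0"])
    model = ot.DeconditionedDistribution(ot.Normal(), prior, link)
    observations = ot.Normal(0.5, 1.0).getSample(20)  # synthetic data
    posterior = ot.PosteriorDistribution(model, observations)
    print(posterior.getMean())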

This class requires the following features:

  • the random vector \inputRV may be continuous, discrete or neither: e.g., its distribution can be a Mixture of discrete and continuous distributions. In that case, its parameter set is the union of the parameter sets of its atoms (the weights of the mixture are not considered as parameters),

  • each component Y_i is continuous or discrete: e.g., it cannot be a Mixture of discrete and continuous distributions (the random vector \vect{Y} may have some discrete components and some continuous components),

  • the copula of \vect{Y} is continuous: e.g., it cannot be the MinCopula.

If \vect{Y} and \vect{X} are continuous random vectors, then the posterior PDF of \vect{Y} is defined by:

(5)f_{\vect{Y}|\inputRV_1 = \vect{x}_1, \dots, \inputRV_\sampleSize = \vect{x}_\sampleSize}(\vect{y}) = \frac{f_{\vect{Y}}(\vect{y}) \prod_{i=1}^\sampleSize f_{\inputRV|\vect{\theta} = g(\vect{y})}(\vect{x}_i)}{\int f_{\vect{Y}}(\vect{y})\prod_{i=1}^\sampleSize f_{\inputRV|\vect{\theta} = g(\vect{y})}(\vect{x}_i) \di{\vect{y}}}

with f_{\inputRV|\vect{\theta} = g(\vect{y})} the PDF of the distribution of \inputRV|\vect{\Theta} where \vect{\Theta} has been replaced by g(\vect{y}) and f_{\vect{Y}} the PDF of the prior distribution of \vect{Y}.

Note that the denominator of (5) is the PDF of the deconditioned distribution of \inputRV|\vect{\Theta}=g(\vect{Y}) with respect to the prior distribution of \vect{Y}.

In the other cases, the PDF is replaced by the probability distribution function for the discrete components and the integrals \int are replaced by sums \sum.