Random Mixture: affine combination of independent univariate distributions

A multivariate random variable \vect{Y} may be defined as an affine transform of n independent univariate random variable, as follows:

(1)\displaystyle \vect{Y}=\vect{y}_0+\mat{M}\,\vect{X}

where \vect{y}_0\in\mathbb{R}^d is a deterministic vector with d\in\{1,2,3\}, \mat{M}\in\mathcal{M}_{d,n}(\mathbb{R}) a deterministic matrix and (X_k)_{ 1 \leq k \leq n} are some independent univariate distributions.

In such a case, it is possible to evaluate directly the distribution of \vect{Y} and then to ask \vect{Y} any request compatible with a distribution: moments, probability and cumulative density functions, quantiles (in dimension 1 only) …

Evaluation of the probability density function of the Random Mixture

As the univariate random variables X_i are independent, the characteristic function of \vect{Y}, denoted \phi_Y, is easily defined from the characteristic function of X_k denoted \phi_{X_k} as follows :

(2)\displaystyle \phi_Y(u_1,\hdots,u_d)=\prod_{j=1}^de^{iu_j{y_0}_j}\prod_{k=1}^n\phi_{X_k}((M^tu)_k), \mbox{  for } \vect{u}\in\mathbb{R}^d

Once \phi_Y evaluated, it is possible to evaluate the probability density function of Y, denoted p_Y : several techniques are possible, as the inversion of the Fourier transformation. This technique is not easy to implement.
Another technique is used, based on the Poisson sum formulation, defined as follows:

(3)\displaystyle \sum_{j_1\in\mathbb{Z}}\hdots\sum_{j_d\in\mathbb{Z}} p_Y\left(y_1+\frac{2\pi j_1}{h_1},\hdots,y_d+\frac{2\pi j_d}{h_d}\right)=
     \prod_{j=1}^d \frac{h_j}{2*\pi}\sum_{k_1\in\mathbb{Z}}\hdots\sum_{k_d\in\mathbb{Z}}\phi\left(k_1h_1,\hdots,k_dh_d\right)e^{-\imath(\sum_{m=1}^{d}k_m h_m y_m)}

By fixing h_1,\hdots,h_d small enough, \frac{2k\pi}{h_j} \approx +\infty and p_Y(\hdots,\frac{2k\pi}{h_j},\hdots) \approx 0 because of the decreasing properties of p_Y. Thus the nested sums of the left term of (3) are reduced to the central term j_1=\hdots=j_d = 0: the left term is approximatively equal to p_Y(y).
Furthermore, the right term of (3) is a series which converges very fast: only few terms of the series are enough to get machine-precision accuracy. Let us note that the factors \phi_Y(k_1 h_1,\hdots,k_d,h_d), which are expensive to evaluate, do not depend on y and are evaluated once only.
It is also possible to greatly improve the performance of the algorithm by noticing that equation  is linear between p_Y and \phi_Y. We denote q_Y and \psi_Y respectively the density and the characteristic function of the multivariate normal distribution with the same mean \vect{\mu} and same covariance matrix \vect{C} as the random mixture. By applying this multivariate normal distribution to the equation , we obtain by subtraction:

(4)\displaystyle  p_Y\left(y\right) = \sum_{j\in\mathbb{Z}^d} q_Y\left(y_1+\frac{2\pi j_1}{h_1},\cdots,y_d+\frac{2\pi j_d}{h_d}\right)+
   \frac{H}{2^d\pi^d}\sum_{|k_1|\leq N}\cdots\sum_{|k_d|\leq N} \delta_Y\left(k_1h_1,\cdots,k_dh_d\right)e^{-\imath(\sum_{m=1}^{d}k_m h_m y_m)}

where H = h_1\times\cdots\times h_d, j=(j_1,\cdots,j_d), \delta_Y:=\phi_Y - \psi_Y

In the case where n \gg 1, using the limit central theorem, the law of \vect{Y} tends to the normal distribution density q, which will drastically reduce N. The sum on q will become the most CPU-intensive part, because in the general case we will have to keep more terms than the central one in this sum, since the parameters h_1, \dots  h_d were calibrated with respect to p and not q.

The parameters h_1, \dots  h_d are calibrated using the following formula:

h_\ell = \frac{2\pi}{(\beta+4\alpha)\sigma_\ell}

where \sigma_\ell=\sqrt{\Cov{\vect{Y}}_{\ell,\ell}} and \alpha, \beta are respectively the number of standard deviations covered by the marginal distribution (\alpha=5 by default) and \beta the number of marginal deviations beyond which the density is negligible (\beta=8.5 by default).

The N parameter is dynamically calibrated: we start with N=8 then we double N value until the total contribution of the additional terms is negligible.

Evaluation of the moments of the Random Mixture

The relation (1) enables to evaluate all the moments of the random mixture, if mathematically defined. For example, we have:

  \Expect{\vect{Y}} & = & \vect{y_0} + \mat{M}\Expect{\vect{X}} \\
  \Cov{\vect{Y}} & = & \mat{M}\,\Cov{\vect{X}}\mat{M}^t

Computation on a regular grid

The interest is to compute the density function on a regular grid. Purposes are to get an approximation quickly. The regular grid is of form:

    \forall r\in\{1,\hdots,d\},\forall m\in\{0,\hdots,M-1\},\:y_{r,m}=\mu_r+b\left(\frac{2m+1}{M} - 1\right)\sigma_r

By denoting p_{m_1,\hdots,m_d}=p_{\vect{Y}}(y_{1,m_1},\hdots,y_{d,m_d}):

    p_{m_1,\hdots,m_d}= Q_{m_1,\hdots,m_d}+S_{m_1,\hdots,m_d}

for which the term S_{m_1,\hdots,m_d} is the most CPU consuming. This term rewrites:

  E_{m_1,\hdots,m_d}(k_1,\hdots,k_d) \label{Eq:S}


    E_{m_1,\hdots,m_d}(k_1,\hdots,k_d)&=e^{-i\sum_{j=1}^d k_jh_j\left(\mu_j+b\left(\frac{2m_j+1}{M}-1\right)\sigma_j\right)}

The aim is to rewrite the previous expression as a d- discrete Fourier transform, in order to apply Fast Fourier Transform (FFT) for its evaluation.

We set M=N and \forall j \in\{1,\hdots,d\},\: h_j=\frac{\pi}{b\sigma_j} and \tau_j=\frac{\mu_j}{b\sigma_j}. For convenience, we introduce the functions:

f_j(k) = e^{-i\pi (k+1)\left(\tau_j-1+\frac{1}{N}\right)}

We use k+1 instead of k in this function to simplify expressions below.

We obtain:

  E_{m_1,\hdots,m_d}(k_1,\hdots,k_d)&=e^{-i\sum_{j=1}^{d} k_jh_jb\sigma_j\left(\frac{\mu_j}{b\sigma_j}+\frac{2m_j}{N}+\frac{1}{N}-1\right)}\notag\\
    &=e^{-2i\pi\left(\frac{\sum_{j=1}^{d}k_j m_j}{N}\right)}e^{-i\pi\sum_{j=1}^{d} k_j\left(\tau_j-1+\frac{1}{N}\right)} \notag\\
    &=e^{-2i\pi\left(\frac{\sum_{j=1}^{d}k_j m_j}{N}\right)} f_1(k_1-1) \times\hdots\times f_d(k_d-1) \label{Eq:E}

For performance reasons, we want to use the discrete Fourier transform with the following convention in dimension 1:

A_m = \sum_{k=0}^{N-1} a_k e^{-2i\pi\frac{km}{N}}

which extension to dimensions 2 and 3 are respectively:

A_{m,n} = \sum_{k=0}^{N-1}\sum_{l=0}^{N-1} a_{k,l} e^{-2i\pi\frac{km}{N}} e^{-2i\pi\frac{ln}{N}}\\

A_{m,n,p} = \sum_{k=0}^{N-1}\sum_{l=0}^{N-1}\sum_{s=0}^{N-1} a_{k,l,s} e^{-2i\pi\frac{km}{N}} e^{-2i\pi\frac{ln}{N}} e^{-2i\pi\frac{sp}{N}}

We decompose sums of  on the interval [-N,N] into three parts:

    \sum_{k_j=-N}^{N}\delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}(k_1,\hdots,k_d)
      = & \sum_{k_j=-N}^{-1} \delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}(k_1,\hdots,k_d) \notag\\
      & + \delta\left(k_1h_1,\hdots,0,\hdots,k_dh_d\right) E_{m_1,\hdots,0,\hdots,m_d}(k_1,\hdots,0,\hdots,k_d) \notag\\
      & + \sum_{k_j=1}^{N}\delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}(k_1,\hdots,k_d)

If we already computed E for dimension d-1, then the middle term in this sum is trivial.

To compute the last sum of equation, we apply a change of variable k_j'=k_j-1:

  \sum_{k_j=1}^{N}\delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}(k_1,\hdots,k_d)
  = & \sum_{k_j=0}^{N-1}\delta\left(k_1h_1,\hdots,(k_j+1)h_j,\hdots,k_dh_d\right) \times\notag\\
    & \hspace*{3cm} E_{m_1,\hdots,m_d}(k_1,\hdots,k_j+1,\hdots,k_d)

Equation gives:

      e^{-2i\pi\left(\frac{\sum_{l=1}^{d}k_l m_l}{N} +\frac{m_j}{N}\right)}
      f_1(k_1-1)\times\hdots\times f_j(k_j)\times\hdots\times f_d(k_d-1)\notag\\
      e^{-2i\pi\left(\frac{\sum_{l=1}^{d}k_l m_l}{N}\right)}
      f_1(k_1-1)\times\hdots\times f_j(k_j)\times\hdots\times f_d(k_d-1)


  \sum_{k_j=1}^{N}\delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}&(k_1,\hdots,k_d)
    = e^{-2i\pi\left(\frac{m_j}{N}\right)} \sum_{k_j=0}^{N-1}\delta\left(k_1h_1,\hdots,(k_j+1)h_j,\hdots,k_dh_d\right) \times\notag\\
    & e^{-2i\pi\left(\frac{\sum_{l=1}^{d}k_l m_l}{N}\right)}
      f_1(k_1-1)\times\hdots\times f_j(k_j)\times\hdots\times f_d(k_d-1) \label{Eq:j-sigma+}

To compute the first sum of equation, we apply a change of variable k_j'=N+k_j:

  \sum_{k_j=-N}^{-1}\delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}(k_1,\hdots,k_d)
  = & \sum_{k_j=0}^{N-1}\delta\left(k_1h_1,\hdots,(k_j-N)h_j,\hdots,k_dh_d\right) \times\notag\\
    & \hspace*{3cm} E_{m_1,\hdots,m_d}(k_1,\hdots,k_j-N,\hdots,k_d)

Equation  gives:

      e^{-2i\pi\left(\frac{\sum_{l=1}^{d}k_l m_l}{N} -m_j\right)}
      f_1(k_1-1)\times\hdots\times f_j(k_j-1-N)\times\hdots\times f_d(k_d-1) \notag\\
      e^{-2i\pi\left(\frac{\sum_{l=1}^{d}k_l m_l}{N}\right)}
      f_1(k_1-1)\times\hdots\times \overline{f}_j(N-1-k_j)\times\hdots\times f_d(k_d-1)


  \sum_{k_j=-N}^{-1}\delta\left(k_1h_1,\hdots,k_dh_d\right) E_{m_1,\hdots,m_d}&(k_1,\hdots,k_d)
    = \sum_{k_j=0}^{N-1}\delta\left(k_1h_1,\hdots,(k_j-N)h_j,\hdots,k_dh_d\right) \times\notag\\
    & e^{-2i\pi\left(\frac{\sum_{l=1}^{d}k_l m_l}{N}\right)}
      f_1(k_1-1)\times\hdots\times \overline{f}_j(N-1-k_j)\times\hdots\times f_d(k_d-1) \label{Eq:j-sigma-}

To summarize:

  1. In order to compute sum from k_1=1 to N, we multiply by e^{-2i\pi\left(\frac{m_1}{N}\right)} and consider \delta((k_1+1)h,\hdots)f_1(k_1)

  2. In order to compute sum from k_1=-N to -1, we consider \delta((k_1-N)h,\hdots)\overline{f}_1(N-1-k_1)