Optimal LHS design¶

Let $\inputRV \in \Rset^{\inputDim}$ be a random vector of input parameters. Latin Hypercube Sample (LHS) is a way to distribute N sample points: each parameter range is divided into N equal intervals, and sample points are chosen such that any hyperplane in that dimension contains one and only one sample point.

The goal of this module is to improve standard LHS techniques by minimizing a space filling criterion.

Principles¶

We may notice two types of LHS designs:

Centered design is obtained by choosing for each point the center of the corresponding cell
Randomized LHS is obtained by adding random perturbations inside each cell

Let the input vector $\inputRV$ whose marginals are independent and associated probabilistic measure $\mathcal{L}_{\inputRV}$

In practice, we look for a design in the space $[0,1]^{\inputDim}$ and we use an inverse iso-probabilistic transformation to get the result in the original domain.

Let $\phi : [0,1]^{\inputDim} \rightarrow \mathbb{R}^{+}$ be a a space filling criterion, which is a measure of accuracy of an optimal LHS design. Most of these criteria focus on discrepancy, which measures how far a given distribution of points deviates from a perfectly uniform one. We denote by $(\vect{x}^1, \dots, \vect{x}^\sampleSize)$ the experiment design with $\sampleSize$ points where the $i$ -th point is $\vect{x}^i = (x_1^i, \dots, x_{\inputDim}^i)$ .`

Two space filling criteria are implemented:

The centered $L^2$ -discrepancy, called $C_2$ and given by:

$C_2((\vect{x}^1, \dots, \vect{x}^\sampleSize))^2 = \left(\frac{13}{12}\right)^{\inputDim} - \frac{2}{\sampleSize} \sum_{i=1}^{\sampleSize} \prod_{k=1}^{\inputDim} \left( 1 + \frac{1}{2} |x_k^{(i)} - 0.5| - \frac{1}{2} |x_k^{(i)} - 0.5|^2 \right)\\ + \frac{1}{\sampleSize^2} \sum_{i,j=1}^{N} \prod_{k=1}^{\inputDim} \left( 1 + \frac{1}{2} |x_k^{(i)} - 0.5| + \frac{1}{2} |x_k^{(j)} - 0.5| - \frac{1}{2} |x_k^{(i)} - x_k^{(j)}| \right)$

This discrepancy is to be minimized to get an optimal design.

The mindist criterion (minimal distance between two points in the design):

$\phi((\vect{x}^1, \dots, \vect{x}^\sampleSize)) = min_{} ||x^{(i)} - x^{(j)} ||_{L^2}, \forall i\neq j=1,\cdots \sampleSize$

This criterion is to be maximized.

In practice, the $\phi_p$ criterion is used instead of mindist and writes:

$\phi_p((\vect{x}^1, \dots, \vect{x}^\sampleSize)) = \left( \sum_{1\leq i < j \leq \sampleSize} ||x^{(i)} - x^{(j)}||^{-p}_{L^2} \right)^{\frac{1}{p}}$

This is supposed to be more robust. When p tends to infinity, optimizing a design with $\phi_p$ is equivalent to optimizing a design with mindist. This criterion is to be minimized to get an optimal design.

The objective is to generate an LHS design $(\vect{x}^1, \dots, \vect{x}^\sampleSize)$ that minimizes a space filling criterion $\phi$ (or maximizes mindist). For that purpose, two techniques are implemented and presented hereafter.

Monte Carlo¶

This problem can be approximated by a Monte Carlo algorithm: a fixed number of designs are generated, and the optimal one is kept. This algorithm is trivial and available in MonteCarloLHS. One of the major drawbacks of Monte Carlo sampling is the CPU time consumption, because the number of generated designs must be high.

Simulated Annealing¶

An alternate solution is to use an adapted simulated annealing method, available in SimulatedAnnealingLHS, which we will now describe. Starting from an LHS design, a new design is obtained by permuting one random coordinate of two randomly chosen elements; by construction, this design is also an LHS design. If the new design is better than the previous one, it is kept. If it is worse, it may anyway be kept with some probability, which depends on how these designs compare, but also on a temperature profile T which decreases over time. This means that jumping away from local extrema becomes less probable over time.

It is important to highlight here that this specific permutation has been chosen in this algorithm because it allows highly efficient computations of the criterion during the simulated annealing process. The naive criterion evaluation, as is done in Monte Carlo algorithm, has a complexity of $\mathcal{O}(d\times \sampleSize^2)$ for $C_2$ and $\phi_p$ criteria.

Let us first illustrate with the $C_2$ criterion. We set $z_{ik}=x_{ik}-0.5$ , equation rewrites:

$C_2(X_{d}^\sampleSize)^2 = \left(\frac{13}{12}\right)^{d} +\sum_{i=1}^{\sampleSize}\sum_{j=1}^{\sampleSize} c_{ij}$

with:

(1)¶ $c_{ij}= \,\,\,\left \{ \begin{aligned} &\frac{1}{\sampleSize^2}\prod_{k=1}^{d}\frac{1}{2}(2+|z_{ik}|+|z_{jk}|-|z_{ik}-z_{jk}|)\,\,\,\, \textrm{if}\,\, i\neq j \\ &\frac{1}{\sampleSize^2}\prod_{k=1}^{d}(1+|z_{ik}|)-\frac{2} {\sampleSize}\prod_{k=1}^{d}(1+\frac{1}{2}|z_{ik}|-\frac{1}{2}z_{ik}^2) \,\,\,\,\textrm{otherwise} \\ \end{aligned} \right.$

We set $c^{\prime}$ the elements of a new design $(\vect{x \prime}^1, \dots, \vect{x \prime}^\sampleSize)$ obtained by permuting a coordinate of sample points $i_1$ and $i_2$ . We can see that

(2)¶ $\left \{ \begin{aligned} & c^{\prime}_{ij}=c_{ij} \;\forall i, j \text{ such that } 1\leq i,j\leq \sampleSize,\, i\notin \{i_1,i_2\},\, j\notin \{i_1,i_2\}\\ & c^{\prime}_{i_1i_2}=c_{i_1i_2}\\ & c_{ij}=c_{ji} \;\forall 1\leq i,j\leq \sampleSize \end{aligned} \right.$

and thus, $C_2(X')$ becomes:

$C_2(X^{\prime \sampleSize}_{d})^2 = C_2(X^N_d)^2 + c^{\prime}_{i_1i_1} + c^{\prime}_{i_2i_2} + 2\sum_{\substack{1\leq j\leq \sampleSize\\j\neq i_1,i_2}} (c^{\prime}_{i_1j}+c^{\prime}_{i_2j})\\ {} - c_{i_1i_1} - c_{i_2i_2} - 2\sum_{\substack{1\leq j\leq \sampleSize\\j\neq i_1,i_2}} (c_{i_1j}+c_{i_2j})$

Updating $C_2$ criterion can be performed by a $\mathcal{O}(\sampleSize)$ algorithm, which has a much better complexity than a naive computation.

The same trick can also be applied on $\phi_p$ criterion, because we can write

$\phi_p(X)^p = \sum_{1\leq i < j \leq \sampleSize} ||x^{(i)} - x^{(j)}||^{-p}_{L^2} = \frac{1}{2} \sum_{i=1}^\sampleSize \sum_{\substack{1\leq j\leq \sampleSize\\j\neq i}} ||x^{(i)} - x^{(j)}||^{-p}_{L^2} = \sum_{i=1}^\sampleSize \sum_{j=1}^\sampleSize f_{ij}$

with

$f_{ij}= \,\,\,\left \{ \begin{aligned} & \frac{||x^{(i)} - x^{(j)}||^{-p}_{L^2}}{2}, & i \neq j\\ & 0, & i=j \end{aligned} \right.$

These $f_{ij}$ coefficients satisfy the same conditions, so the same computations give:

$\phi_p(X_{d}^{\prime \sampleSize})^p = \phi_p(X_{d}^\sampleSize)^p + 2\sum_{\substack{1\leq j\leq \sampleSize\\j\neq i_1,i_2}} (f^{\prime}_{i_1j}+f^{\prime}_{i_2j}) - 2\sum_{\substack{1\leq j\leq \sampleSize\\j\neq i_1,i_2}} (f_{i_1j}+f_{i_2j})$

In practice, a marginal transformation is performed to map the initial multivariate distribution into $[0,1]^{\inputDim}$ . Optimization is performed in $[0,1]^{\inputDim}$ and the inverse transformation maps the design into the initial space.

OpenTURNS

An Open source initiative for the Treatment of Uncertainties, Risks'N Statistics

Table of Contents

Previous topic

Next topic

This Page

Optimal LHS design¶

Principles¶

Monte Carlo¶

Simulated Annealing¶