Optimal LHS design¶
Let be a random vector of input parameters. Latin Hypercube Sample (LHS) is a way to distribute N sample points: each parameter range is divided into N equal intervals, and sample points are chosen such that any hyperplane in that dimension contains one and only one sample point.
The goal of this module is to improve standard LHS techniques by minimizing a space filling criterion.
We may notice two types of LHS designs:
- Centered design is obtained by choosing for each point the center of the corresponding cell
- Randomized LHS is obtained by adding random perturbations inside each cell
Let us fix the following properties for the input vector X:
- Its marginals are independent
- Its associated probabilistic measure is
with the uniform distribution.
In practice, we look for a design in the space and we use an inverse iso-probabilistic transformation to get the result in the original domain.
Let be a a space filling criterion, which is a measure of accuracy of an optimal LHS design. Most of these criteria focus on discrepancy, which measures how far a given distribution of points deviates from a perfectly uniform one.
Two space filling criteria are implemented:
- The centered -discrepancy, called and given by:
This discrepancy is to be minimized to get an optimal design.
- The mindist criterion (minimal distance between two points in the design):
This criterion is to be maximized.
- In practice, the criterion is used instead of mindist and writes:
This is supposed to be more robust. When p tends to infinity, optimizing a design with is equivalent to optimizing a design with mindist. This criterion is to be minimized to get an optimal design.
The objective is to generate an LHS design that minimizes a space filling criterion (or maximizes mindist). For that purpose, two techniques are implemented and presented hereafter.
This problem can be approximated by a Monte Carlo algorithm: a fixed number of designs are generated, and the optimal one is kept.
This algorithm is trivial and available in
One of the major drawbacks of Monte Carlo sampling is the CPU time consumption, because the number of generated designs must be high.
An alternate solution is to use an adapted simulated annealing method, available in
SimulatedAnnealingLHS, which we will now describe.
Starting from an LHS design, a new design is obtained by permuting one random coordinate of two randomly chosen elements; by construction, this design is also an LHS design.
If the new design is better than the previous one, it is kept.
If it is worse, it may anyway be kept with some probability, which depends on how these designs compare, but also on a temperature profile T which decreases over time.
This means that jumping away from local extrema becomes less probable over time.
It is important to highlight here that this specific permutation has been chosen in this algorithm because it allows highly efficient computations of criterion during simulated annealing process. he naive criterion evaluation, as is done in Monte Carlo algorithm, has a complexity of for and criteria.
Let us first illustrate with the criterion. We set , equation rewrites:
We set the elements of a new design obtained by permuting a coordinate of sample points and . We can see that
and thus, becomes:
Updating criterion can be performed by a algorithm, which has a much better complexity than a naive computation.\
The same trick can also be applied on criterion, because we can write
These coefficients satisfy the same conditions, so the same computations give:
In practice, a marginal transformation is performed to map the initial multivariate distribution into . Optimization is performed in and the inverse transformation maps the design into the initial space.