LOLAVoronoi

class LOLAVoronoi(*args)

LOLA-Voronoi sequential design algorithm.

Warning

This class is experimental and likely to be modified in future releases. To use it, import the openturns.experimental submodule.

This class implements the sequential design of experiments algorithm described in [crombecq2011], which generates new input samples according to the input measure and the model output. It combines an exploration criterion, based on a Voronoi tessellation that identifies undersampled regions of the input space, with an exploitation criterion based on a measure of the nonlinearity of the model.

Let us define a random vector \vect{X} of dimension d with measure \inputMeasure(\vect{x}), an initial design of \sampleSize points \vect{p}_i, i \in \{1, ..., \sampleSize\}, and a model \model.

The exploration criterion score of the point \vect{p}_i is equal to the probability of its Voronoi cell, which can be written as:

v(\vect{p}_i)
&= \int_{\vect{x} \in \mathcal{V}_i} \inputMeasure(\vect{x}) d\vect{x} \\
&= \Expect{1_{\mathcal{V}_i}(\vect{X})}

for i \in\{1, ..., \sampleSize\} where \sampleSize is the sample size and 1_{\mathcal{V}_i} is the indicator function of the i-th Voronoi cell:

1_{\mathcal{V}_i}(\vect{x}) = 
\begin{cases}
1 & \textrm{ if } \vect{x} \in \mathcal{V}_i \\
0 & \textrm{otherwise.}
\end{cases}

The Voronoi score is approximated using a Monte Carlo experiment.
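As an illustration, this Monte Carlo estimate can be sketched with plain NumPy: draws from the input measure are assigned to the Voronoi cell of their nearest design point, and the cell probabilities are the observed membership frequencies. The design points and the uniform measure on [-1, 1]^2 below are hypothetical, not part of the class API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 10 points p_i in [-1, 1]^2, uniform input measure.
points = rng.uniform(-1.0, 1.0, size=(10, 2))

# Each Monte Carlo draw belongs to the Voronoi cell of its nearest design
# point, so v(p_i) is estimated by the fraction of draws assigned to p_i.
mc = rng.uniform(-1.0, 1.0, size=(10000, 2))
nearest = np.linalg.norm(mc[:, None, :] - points[None, :, :], axis=2).argmin(axis=1)
v = np.bincount(nearest, minlength=len(points)) / len(mc)
print(v)  # estimated cell probabilities, summing to 1
```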

For the nonlinearity criterion, the algorithm only uses the available data: it chooses a set of neighbouring samples N(\vect{p}_i) = \{\vect{p}_{i1}, \hdots, \vect{p}_{im}\}, with m = 2 d, to build an approximation \mat{J}_i of the gradient of the model \model at the point \vect{p}_i.

The LOLA local nonlinearity score of the point \vect{p}_r is the cumulated error between the model and the linear approximation built from its neighbourhood. For each component \model_k of the model, it is given by:

e_k(\vect{p}_r) = \sum_{i=1}^m | \model_k(\vect{p}_{ri}) - (\model_k(\vect{p}_r) + \mat{J}_r(\vect{p}_{ri} - \vect{p}_r)) |

for r \in \{1, ..., \sampleSize\}, where \model(\vect{p}_{r}) + \mat{J}_r (\vect{p}_{ri} - \vect{p}_{r}) is the linear approximation of the model in the neighbourhood of the point \vect{p}_{r}.
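The score can be sketched for a single output with NumPy: fit the gradient by least squares on the neighbourhood offsets, then cumulate the absolute deviations from the resulting linear approximation. The function name, the reference point, and both test models below are illustrative, not part of the class API.

```python
import numpy as np

def lola_score(p_r, y_r, neighbours, y_neighbours):
    # Fit the gradient at p_r by least squares on the neighbourhood, then
    # accumulate the absolute deviations from the linear approximation.
    A = neighbours - p_r                 # m x d offsets p_ri - p_r
    b = y_neighbours - y_r               # model increments
    grad, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.abs(b - A @ grad).sum())

p_r = np.array([0.0, 0.0])
nbs = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])  # m = 2d = 4

f_lin = lambda x: 2.0 * x[..., 0] - x[..., 1]       # linear model
f_quad = lambda x: x[..., 0] ** 2 + x[..., 1] ** 2  # nonlinear model

print(lola_score(p_r, f_lin(p_r), nbs, f_lin(nbs)))    # ~0: linear model is reproduced exactly
print(lola_score(p_r, f_quad(p_r), nbs, f_quad(nbs)))  # 4.0: curvature is detected
```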

In order to aggregate the scores e_k(\vect{p}_r) over the output components k \in \{1, ..., d_Y\}, two different norms can be used:

e(\vect{p}_r) = \frac{1}{d_Y} \sum_{k=1}^{d_Y} e_k(\vect{p}_r)

or

e(\vect{p}_r) = \max_{k=1, \hdots, d_Y} e_k(\vect{p}_r)

Now the two metrics are combined into the hybrid score given by:

h(\vect{p}_i) = \lambda v(\vect{p}_i) + (1 - \lambda) \frac{e(\vect{p}_i)}{\sum_{j=1}^{\sampleSize} e(\vect{p}_j)}

for i \in \{1, ..., \sampleSize\}, where \lambda is a tradeoff coefficient.
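The combination itself is a one-liner; the scores below are hypothetical numbers, chosen only to show how exploitation can outweigh exploration:

```python
import numpy as np

lam = 0.5  # tradeoff coefficient lambda

# Hypothetical scores for 5 design points.
v = np.array([0.30, 0.25, 0.20, 0.15, 0.10])  # Voronoi cell probabilities
e = np.array([0.5, 2.0, 0.1, 4.0, 1.0])       # LOLA nonlinearity scores

# Hybrid score: weighted sum of the exploration score and the normalized
# exploitation score.
h = lam * v + (1.0 - lam) * e / e.sum()
print(h.argmax())  # index of the reference point refined first -> 3
```

Here the fourth point wins despite having a small Voronoi cell, because its nonlinearity score dominates the normalized exploitation term.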

To generate a new point, candidate points are sampled by Monte Carlo close to the reference point \vect{p}_i with the highest hybrid score. The new point is selected among the candidates lying inside the Voronoi cell of \vect{p}_i, as the one farthest away from \vect{p}_i and its neighbours. To generate a batch of n_{new} points, this procedure is repeated on the n_{new} reference points with the highest hybrid scores, taking the Voronoi cells of the new candidates into account.

Parameters:
x : 2-d sequence of float

Initial input sample

y : 2-d sequence of float

Initial output sample

distribution : Distribution

Distribution to generate new input samples

Methods

generate(size)

Generate a new sample.

getClassName()

Accessor to the object's name.

getGenerationIndices()

Accessor to the generation indices.

getHybridScore()

Hybrid score accessor.

getInputSample()

Accessor to the input sample.

getLOLAScore()

LOLA score accessor.

getName()

Accessor to the object's name.

getNeighbourhoodCandidatesNumber()

Neighbourhood candidates number accessor.

getOutputSample()

Accessor to the output sample.

getVoronoiSamplingSize()

Voronoi sampling size accessor.

getVoronoiScore()

Voronoi score accessor.

hasName()

Test if the object is named.

setName(name)

Accessor to the object's name.

setNeighbourhoodCandidatesNumber(...)

Neighbourhood candidates number accessor.

setVoronoiSamplingSize(voronoiSamplingSize)

Voronoi sampling size accessor.

update(x, y)

Update the current sample.

Notes

Various ResourceMap entries allow for a more fine-grained control over the algorithm:

  • The entry LOLAVoronoi-DefaultNeighbourhoodCandidatesNumber defines the number of extra closest neighbours around each reference point to consider when selecting the neighbourhood with the greatest combined score for the LOLA criterion.

  • The entry LOLAVoronoi-DefaultVoronoiSamplingSize controls the sampling size used to estimate the Voronoi cells relative volume or generate new candidates.

  • The entry LOLAVoronoi-MaximumCombinationsNumber controls the maximum number of existing point subsets that will be tested to find the best neighbourhood for the exploitation criterion.

  • The entry LOLAVoronoi-UseTruncatedDistribution controls the way the distribution \inputMeasure(\vect{x}) is truncated to the bounding box B of the Voronoi cell being sampled. If the flag is set to False, the sampling is done by rejection with respect to \inputMeasure(\vect{x}) and B. The advantage is to avoid the expensive computation of the normalization factor of the truncation of \inputMeasure(\vect{x}); the drawback is that the acceptance rate may be low. If the flag is set to True, the truncation of \inputMeasure(\vect{x}) to B is explicitly built and sampled. The advantage is the possibility of using the CDF inversion method to sample this truncated distribution, in which case all the points are accepted. However, the CDF inversion method is only available if the copula of \inputMeasure(\vect{x}) is the independent one; otherwise a rejection method is used, the advantage of this option is lost, and the normalization factor of the truncation has been computed for nothing. As a result, the default value is False.

  • The string entry LOLAVoronoi-NonLinearityAggregationMethod controls how the nonlinearity score is computed across multiple outputs from the gradient vectors: either Maximum (default), which singles out the most nonlinear regions, or Average, which smooths the score but can be a good alternative when there are only a few outputs.

  • The string entry LOLAVoronoi-DecompositionMethod selects the least-squares method to compute the gradient.

  • The float entry LOLAVoronoi-HybridScoreTradeoff is the value of the tradeoff coefficient \lambda.
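These entries can be tuned through ResourceMap before building the algorithm; the keys are the ones documented above, while the values below are purely illustrative:

```python
import openturns as ot

# Illustrative tuning of the documented ResourceMap entries; values are
# examples, not recommendations.
ot.ResourceMap.SetAsUnsignedInteger('LOLAVoronoi-DefaultVoronoiSamplingSize', 2000)
ot.ResourceMap.SetAsString('LOLAVoronoi-NonLinearityAggregationMethod', 'Average')
ot.ResourceMap.SetAsScalar('LOLAVoronoi-HybridScoreTradeoff', 0.5)
ot.ResourceMap.SetAsBool('LOLAVoronoi-UseTruncatedDistribution', False)
```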

Examples

>>> import openturns as ot
>>> import openturns.experimental as otexp
>>> formula = '-4 * exp((-25 / 8) * (a0^2 + a1^2)) + 7 * exp((-125 / 4) * (a0^2 + a1^2))'
>>> f1 = ot.SymbolicFunction(['a0', 'a1'], [formula])
>>> distribution = ot.JointDistribution([ot.Uniform(-1.0, 1.0)] * 2)
>>> x0 = ot.LowDiscrepancyExperiment(ot.HaltonSequence(), distribution, 10).generate()
>>> y0 = f1(x0)
>>> algo = otexp.LOLAVoronoi(x0, y0, distribution)

Now add 2 blocks of 3 points:

>>> for i in range(2):
...     x = algo.generate(3)
...     y = f1(x)
...     algo.update(x, y)
__init__(*args)
generate(size)

Generate a new sample.

Parameters:
size : int

Size of sample to generate

Returns:
x : Sample

New input sample.

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getGenerationIndices()

Accessor to the generation indices.

Returns:
generationIndices : Indices

Indices of last element for each generation. This is updated each time the update() method is evaluated.

getHybridScore()

Hybrid score accessor.

Returns:
hybridScore : Sample

The hybrid score h(p_i) for the current sample.

getInputSample()

Accessor to the input sample.

Returns:
x : Sample

Input sample

getLOLAScore()

LOLA score accessor.

Returns:
lolaScore : Sample

The exploitation score e(p_i) for the current sample. Note that unlike v(p_i) this is scaled in [0, 1].

getName()

Accessor to the object’s name.

Returns:
name : str

The name of the object.

getNeighbourhoodCandidatesNumber()

Neighbourhood candidates number accessor.

Returns:
neighbourhoodCandidatesNumber : int

The number of extra closest neighbours around each reference point to consider when selecting the neighbourhood with the greatest combined score for the LOLA criterion. The default value is given by the LOLAVoronoi-DefaultNeighbourhoodCandidatesNumber entry of ResourceMap.

getOutputSample()

Accessor to the output sample.

Returns:
y : Sample

Output sample

getVoronoiSamplingSize()

Voronoi sampling size accessor.

Returns:
voronoiSamplingSize : int

The sampling size used to estimate the Voronoi cells relative volume or generate new candidates. The default value is set to the LOLAVoronoi-DefaultVoronoiSamplingSize entry from ResourceMap.

getVoronoiScore()

Voronoi score accessor.

Returns:
voronoiScore : Sample

The exploration score v(p_i) for the current sample.

hasName()

Test if the object is named.

Returns:
hasName : bool

True if the name is not empty.

setName(name)

Accessor to the object’s name.

Parameters:
name : str

The name of the object.

setNeighbourhoodCandidatesNumber(neighbourhoodCandidatesNumber)

Neighbourhood candidates number accessor.

Parameters:
neighbourhoodCandidatesNumber : int

The number of extra closest neighbours around each reference point to consider when selecting the neighbourhood with the greatest combined score for the LOLA criterion. The default value is given by the LOLAVoronoi-DefaultNeighbourhoodCandidatesNumber entry of ResourceMap.

setVoronoiSamplingSize(voronoiSamplingSize)

Voronoi sampling size accessor.

Parameters:
voronoiSamplingSize : int

The sampling size used to estimate the Voronoi cells relative volume or generate new candidates. The default value is set to the LOLAVoronoi-DefaultVoronoiSamplingSize entry from ResourceMap.

update(x, y)

Update the current sample.

Adds a new incremental sample (input and output) to the existing sample.

Parameters:
x, y : Sample

New incremental sample.

Examples using the class

LOLA-Voronoi sequential design of experiment