Compare unconditional and conditional histogramsΒΆ

In this example, we compare unconditional and conditional histograms for a simulation. We consider the flooding model. Let be a function which takes four inputs , , and and returns one output .

We first consider the (unconditional) distribution of the input .

Let be a given threshold on the output : we consider the event . Then we consider the conditional distribution of the input given that that is to say .

If these two distributions are significantly different, we conclude that the input has an impact on the event .

In order to approximate the distribution of the output , we perform a Monte-Carlo simulation with size 500. The threshold is chosen as the 90% quantile of the empirical distribution of . In this example, the distribution is aproximated by its empirical histogram (but this could be done with another distribution approximation as well, such as kernel smoothing for example).

```import numpy as np
from openturns.usecases import flood_model
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt

ot.Log.Show(ot.Log.NONE)
```

We use the FloodModel data class that contains all the case parameters.

```fm = flood_model.FloodModel()
```

Create an input sample from the joint distribution defined in the data class. We build an output sample by taking the image by the model.

```size = 500
inputSample = fm.distribution.getSample(size)
inputSample[:5]
```
Q (m3/s) Ks Zv (m) Zm (m) B (m) L (m) Zb (m) Hd (m) 2032.978 28.16431 49.81823 54.44882 298.0983 4997.511 55.27675 3.987806 831.1784 32.06598 49.8578 54.29531 298.3157 4997.297 55.18741 2.030507 1741.776 19.36681 49.08975 55.0745 299.1433 4999.432 55.66693 2.719918 800.476 40.00743 49.16216 55.03673 299.5998 5002.712 55.55715 3.080748 917.9835 38.23018 49.19878 54.97124 302.2765 5008.607 55.36659 3.816204

```outputSample = fm.model(inputSample)
outputSample[:5]
```
H S C 3.470401 -5.975926 1.034781 1.900478 -5.459643 1.140406 3.659279 -5.637822 1.102684 1.492342 -7.983398 0.7745734 1.66541 -8.3186 0.7507753

Merge the input and output samples into a single sample.

```sample = ot.Sample(inputSample)
sample.stack(outputSample)
sample[0:5]
```
Q (m3/s) Ks Zv (m) Zm (m) B (m) L (m) Zb (m) Hd (m) H S C 2032.978 28.16431 49.81823 54.44882 298.0983 4997.511 55.27675 3.987806 3.470401 -5.975926 1.034781 831.1784 32.06598 49.8578 54.29531 298.3157 4997.297 55.18741 2.030507 1.900478 -5.459643 1.140406 1741.776 19.36681 49.08975 55.0745 299.1433 4999.432 55.66693 2.719918 3.659279 -5.637822 1.102684 800.476 40.00743 49.16216 55.03673 299.5998 5002.712 55.55715 3.080748 1.492342 -7.983398 0.7745734 917.9835 38.23018 49.19878 54.97124 302.2765 5008.607 55.36659 3.816204 1.66541 -8.3186 0.7507753

Extract the first column of inputSample into the sample of the flowrates .

```sampleQ = inputSample[:, 0]
```

The next cell defines a function that computes the conditional sample of a component given that the a marginal (defined by its index criteriaComponent) exceeds a given threshold, defined by its quantile level.

```def computeConditionnedSample(
sample, alpha=0.9, criteriaComponent=None, selectedComponent=0
):
"""
Return values from the selectedComponent-th component of the sample.
Selects the values according to the alpha-level quantile of
the criteriaComponent-th component of the sample.
"""
dim = sample.getDimension()
if criteriaComponent is None:
criteriaComponent = dim - 1
sortedSample = sample.sortAccordingToAComponent(criteriaComponent)
quantiles = sortedSample.computeQuantilePerComponent(alpha)
quantileValue = quantiles[criteriaComponent]
sortedSampleCriteria = sortedSample[:, criteriaComponent]
indices = np.where(np.array(sortedSampleCriteria.asPoint()) > quantileValue)[0]
conditionnedSortedSample = sortedSample[int(indices[0]) :, selectedComponent]
return conditionnedSortedSample
```

Create an histogram for the unconditional flowrates.

```numberOfBins = 10
histogram = ot.HistogramFactory().buildAsHistogram(sampleQ, numberOfBins)
```

Extract the sub-sample of the input flowrates Q which leads to large values of the output S.

Search the index of the marginal S in the columns of the sample.

```criteriaComponent = list(sample.getDescription()).index("S")
criteriaComponent
```
```9
```
```alpha = 0.9
selectedComponent = 0
conditionnedSampleQ = computeConditionnedSample(
sample, alpha, criteriaComponent, selectedComponent
)
```

We could as well use:

```# conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ)
```

but this creates an histogram with new classes, corresponding to conditionnedSampleQ. We want to use exactly the same classes as the full sample, so that the two histograms match.

```first = histogram.getFirst()
width = histogram.getWidth()
conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(
conditionnedSampleQ, first, width
)
```

Then creates a graphics with the unconditional and the conditional histograms.

```graph = histogram.drawPDF()
graph.setLegends(["Q"])
#
graphConditionnalQ = conditionnedHistogram.drawPDF()
graphConditionnalQ.setColors(["blue"])
graphConditionnalQ.setLegends([r"\$Q | S > S_{%s}\$" % (alpha)])