Compare unconditional and conditional histograms

In this example, we compare unconditional and conditional histograms for a simulation. We consider the flooding model. Let g be a function which takes four inputs Q, K_s, Z_v and Z_m and returns one output H.

We first consider the (unconditional) distribution of the input Q.

Let t be a given threshold on the output H: we consider the event H>t. Then we consider the conditional distribution of the input Q given that H>t : Q|H>t.

If these two distributions are significantly different, we conclude that the input Q has an impact on the event H>t.
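
The link between the two distributions can be made explicit. Assuming that Q has a probability density f_Q and that the event H>t has positive probability (the notation f_Q is introduced here for illustration only), Bayes' rule gives:

```latex
f_{Q \mid H>t}(q) \;=\; \frac{\mathbb{P}(H>t \mid Q=q)}{\mathbb{P}(H>t)} \, f_Q(q)
```

The two densities therefore coincide when the probability of exceeding t does not depend on the value q taken by Q. Up to sampling noise, a visible difference between the two histograms reflects a dependence of that probability on Q.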

In order to approximate the distribution of the output H, we perform a Monte-Carlo simulation of size 500. The threshold t is chosen as the 90% quantile of the empirical distribution of H. In this example, the distribution is approximated by its empirical histogram (but another approximation of the distribution, such as kernel smoothing, could be used as well).
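
To make the choice of the threshold concrete, here is a minimal, library-independent sketch. It assumes one common definition of the empirical quantile (the order statistic of rank ceil(n * alpha)), which may differ slightly from the one used by OpenTURNS. The function name `empirical_quantile` and the Gaussian placeholder sample are hypothetical and only stand in for the output sample H.

```python
import math
import random


def empirical_quantile(values, alpha):
    """Order statistic of rank ceil(n * alpha) (1-based rank).

    Assumption: this is one common convention, not necessarily
    the one implemented in OpenTURNS.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * alpha))
    return ordered[rank - 1]


rng = random.Random(0)
# Placeholder for the output sample H (hypothetical data).
h = [rng.gauss(0.0, 1.0) for _ in range(500)]

t = empirical_quantile(h, 0.9)
exceed = [x for x in h if x > t]
print(len(exceed))  # number of points with H > t
```

With 500 points and a 90% quantile, roughly one point in ten exceeds the threshold, so the conditional histogram is built from a much smaller sample than the unconditional one.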

[1]:
import openturns as ot

Create the marginal distributions of the parameters.

[2]:
dist_Q = ot.TruncatedDistribution(ot.Gumbel(558., 1013.), 0, ot.TruncatedDistribution.LOWER)
dist_Ks = ot.TruncatedDistribution(ot.Normal(30.0, 7.5), 0, ot.TruncatedDistribution.LOWER)
dist_Zv = ot.Uniform(49.0, 51.0)
dist_Zm = ot.Uniform(54.0, 56.0)
marginals = [dist_Q, dist_Ks, dist_Zv, dist_Zm]

Create the joint probability distribution.

[3]:
distribution = ot.ComposedDistribution(marginals)
distribution.setDescription(['Q', 'Ks', 'Zv', 'Zm'])

Create the model.

[4]:
model = ot.SymbolicFunction(['Q', 'Ks', 'Zv', 'Zm'],
                            ['(Q/(Ks*300.*sqrt((Zm-Zv)/5000)))^(3.0/5.0)'])
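
Written as a formula, the expression passed to `ot.SymbolicFunction` reads as follows. The constants 300 and 5000 are copied verbatim from the code; their physical meaning is not stated in this example.

```latex
H = \left( \frac{Q}{300 \, K_s \sqrt{\dfrac{Z_m - Z_v}{5000}}} \right)^{3/5}
```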

Create a sample.

[5]:
size = 500
inputSample = distribution.getSample(size)
outputSample = model(inputSample)

Merge the input and output samples into a single sample.

[6]:
sample = ot.Sample(size,5)
sample[:,0:4] = inputSample
sample[:,4] = outputSample
sample[0:5,:]
[6]:
    v0                  v1                  v2                  v3                  v4
0   1443.602798325532   30.156613494725274  49.11713595070338   55.59185930777356   2.4439424253360924
1   2174.8898945480146  34.67890291392808   50.764851072298455  55.87647205461956   3.085132426791521
2   626.1023680891167   35.75352992912951   50.03020209989136   54.661879004882564  1.478061905093236
3   325.8123641551359   36.665987740324184  49.026338291130784  55.366752716918725  0.8953760185932061
4   981.3994326290226   41.10229410031924   49.39776320365176   54.84770660838047   1.6954636957219766
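
The last column can be cross-checked against the model formula. The sketch below re-evaluates, in plain Python, the expression passed to `ot.SymbolicFunction` for the first row of the sample printed above. The function name `flood_height` is hypothetical; it is not part of OpenTURNS.

```python
import math


def flood_height(q, ks, zv, zm):
    """Evaluate (Q/(Ks*300*sqrt((Zm-Zv)/5000)))^(3/5) with plain floats."""
    slope = (zm - zv) / 5000.0
    return (q / (ks * 300.0 * math.sqrt(slope))) ** (3.0 / 5.0)


# First row of the merged sample: v0..v3 are Q, Ks, Zv, Zm.
h = flood_height(1443.602798325532, 30.156613494725274,
                 49.11713595070338, 55.59185930777356)
print(h)  # should agree with v4 of that row up to rounding
```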

Extract the first column of inputSample into the sample of the flowrates Q.

[7]:
sampleQ = inputSample[:,0]
[8]:
import numpy as np

def computeConditionnedSample(sample, alpha = 0.9, criteriaComponent = None, selectedComponent = 0):
    '''
    Return values from the selectedComponent-th component of the sample.
    Selects the values according to the alpha-level quantile of
    the criteriaComponent-th component of the sample.
    '''
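    dim = sample.getDimension()
    if criteriaComponent is None:
        # Default: use the last component (the output) as the criterion.
        criteriaComponent = dim - 1
    # Sort the rows by increasing value of the criterion component.
    sortedSample = sample.sortAccordingToAComponent(criteriaComponent)
    # Empirical alpha-level quantile of the criterion component.
    quantiles = sortedSample.computeQuantilePerComponent(alpha)
    quantileValue = quantiles[criteriaComponent]
    sortedSampleCriteria = sortedSample[:,criteriaComponent]
    # The rows strictly above the quantile form the tail of the sorted sample.
    indices = np.where(np.array(sortedSampleCriteria.asPoint())>quantileValue)[0]
    conditionnedSortedSample = sortedSample[int(indices[0]):,selectedComponent]
    return conditionnedSortedSample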
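
The same selection can also be written without sorting the whole sample, by filtering on a threshold. The following is a minimal pure-Python sketch of that idea, not a replacement for the function above: it works on two paired lists rather than on an `ot.Sample`, it assumes the order statistic of rank ceil(n * alpha) as empirical quantile, and the name `conditional_values` as well as the toy model are hypothetical.

```python
import math
import random


def conditional_values(selected, criteria, alpha=0.9):
    """Return the entries of `selected` whose paired `criteria` value is
    strictly above the empirical alpha-quantile of `criteria`."""
    if len(selected) != len(criteria):
        raise ValueError("both lists must have the same length")
    ordered = sorted(criteria)
    rank = max(1, math.ceil(len(ordered) * alpha))
    threshold = ordered[rank - 1]
    # Keep the original order of the observations; no row sorting is needed.
    return [s for s, c in zip(selected, criteria) if c > threshold]


rng = random.Random(1)
q = [rng.uniform(100.0, 3000.0) for _ in range(500)]
ks = [rng.uniform(20.0, 40.0) for _ in range(500)]
# Toy stand-in for the flood model (hypothetical, for illustration only).
h = [(qi / ki) ** 0.6 for qi, ki in zip(q, ks)]

q_given_high_h = conditional_values(q, h, alpha=0.9)
print(len(q_given_high_h), sum(q_given_high_h) / len(q_given_high_h))
```

Unlike the sorted version, this variant returns the selected values in their original order, which does not matter when the result is only used to build a histogram.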

Create a histogram for the unconditional flowrates.

[9]:
numberOfBins = 10
histogram = ot.HistogramFactory().buildAsHistogram(sampleQ,numberOfBins)

Extract the sub-sample of the input flowrates Q which leads to large values of the output H.

[10]:
alpha = 0.9
criteriaComponent = 4
selectedComponent = 0
conditionnedSampleQ = computeConditionnedSample(sample,alpha,criteriaComponent,selectedComponent)

We could also use:

conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ)

but this creates a histogram with new classes, computed from conditionnedSampleQ alone. We want to use exactly the same classes as for the full sample, so that the two histograms can be compared class by class.

[11]:
first = histogram.getFirst()
width = histogram.getWidth()
conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ,first,width)
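
The reason for reusing `first` and `width` can be illustrated without any library. The sketch below assumes equal-width classes spanning the range of the full sample, and a subset drawn from that same sample; `shared_bin_densities` is a hypothetical helper, not an OpenTURNS function.

```python
def shared_bin_densities(full, subset, number_of_bins=10):
    """Density histograms of `full` and `subset` on the classes of `full`.

    Assumption: every value of `subset` lies within the range of `full`.
    """
    lower, upper = min(full), max(full)
    width = (upper - lower) / number_of_bins

    def densities(values):
        counts = [0] * number_of_bins
        for v in values:
            # Clamp the index so that the maximum falls in the last class.
            index = min(int((v - lower) / width), number_of_bins - 1)
            counts[index] += 1
        # Normalize so that the bar areas sum to one.
        return [c / (len(values) * width) for c in counts]

    return lower, width, densities(full), densities(subset)


full = [float(i) for i in range(100)]      # 0, 1, ..., 99
subset = [x for x in full if x >= 90.0]    # the 10 largest values
lower, width, f_full, f_subset = shared_bin_densities(full, subset)
print(f_full[-1], f_subset[-1])
```

Because both lists of bar heights refer to the same classes, they can be compared directly: here the subset puts all of its mass in the last class, while the full sample spreads it evenly.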

Then create a graph with the unconditional and the conditional histograms.

[12]:
graph = histogram.drawPDF()
graph.setLegends(["Q"])
#
graphConditionnalQ = conditionnedHistogram.drawPDF()
graphConditionnalQ.setColors(["blue"])
graphConditionnalQ.setLegends(["Q|H>H_%s" % (alpha)])
graph.add(graphConditionnalQ)
graph
[12]:
[Figure: unconditional histogram of Q and conditional histogram of Q|H>H_0.9]

We see that the two histograms are very different. High values of the input Q often seem to lead to high values of the output H.

We could explore this situation further by comparing the unconditional distribution of Q (which is known in this case) with the conditional distribution of Q|H>t, estimated by kernel smoothing. This would have the advantage of accuracy, since kernel smoothing generally approximates a smooth density more accurately than a histogram does.
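
In OpenTURNS, kernel smoothing is provided by `ot.KernelSmoothing`. To show what such an estimator computes, here is a minimal, library-independent sketch of a Gaussian kernel density estimate. It assumes the normal-reference rule of thumb for the bandwidth (1.06 times the sample standard deviation times n^(-1/5)), which is not necessarily the rule OpenTURNS applies; the function name `gaussian_kde` and the placeholder sample are hypothetical.

```python
import math
import random
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption of this sketch).
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                          for v in values)

    return pdf


rng = random.Random(2)
# Placeholder for the conditional sample of Q (hypothetical data).
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
pdf = gaussian_kde(sample)

# Sanity check: the estimated density should integrate to about 1.
step = 0.01
integral = sum(pdf(-10.0 + i * step) for i in range(2001)) * step
print(integral)
```

The estimate is a smooth curve that does not depend on a choice of classes, but it does depend on the bandwidth, which plays a role similar to the class width of a histogram.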