.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py:
Compare unconditional and conditional histograms
================================================
In this example, we compare unconditional and conditional histograms for a simulation. We consider the :ref:`flooding model`.
Let :math:`g` be a function which takes four inputs :math:`Q`, :math:`K_s`, :math:`Z_v` and :math:`Z_m` and returns one output :math:`H`.
We first consider the (unconditional) distribution of the input :math:`Q`.
Let :math:`t` be a given threshold on the output :math:`H`: we consider the event :math:`H>t`. Then we consider the conditional distribution of the input :math:`Q` given that :math:`H>t`, that is :math:`Q|H>t`.
If these two distributions are significantly different, we conclude that the input :math:`Q` has an impact on the event :math:`H>t`.
In order to approximate the distribution of the output :math:`H`, we perform a Monte-Carlo simulation with size 500. The threshold :math:`t` is chosen as the 90% quantile of the empirical distribution of :math:`H`. In this example, the distribution is approximated by its empirical histogram (although another approximation, such as kernel smoothing, could be used instead).
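For illustration only, the rank-based empirical quantile behind the choice of :math:`t` can be sketched in a few lines of pure Python. The helper name and the rank convention below are ours, not part of the example (OpenTURNS provides this through methods such as `computeQuantilePerComponent`):

```python
def empirical_quantile(values, alpha):
    """Return the alpha-level empirical quantile of a list of numbers.

    Illustrative sketch: uses a simple rank-based convention; real
    libraries offer several interpolation schemes.
    """
    s = sorted(values)
    # clamp the rank so that alpha = 1.0 still yields a valid index
    rank = min(int(alpha * len(s)), len(s) - 1)
    return s[rank]

# e.g. the 90% empirical quantile of 0, 1, ..., 9 is 9
t = empirical_quantile(list(range(10)), 0.9)
```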
.. code-block:: default
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
We use the `FloodModel` data class that contains all the case parameters.
.. code-block:: default
from openturns.usecases import flood_model as flood_model
fm = flood_model.FloodModel()
Create an input sample from the joint `distribution` defined in the data class.
We build an output sample by taking the image by the `model`.
.. code-block:: default
size = 500
inputSample = fm.distribution.getSample(size)
outputSample = fm.model(inputSample)
Merge the input and output samples into a single sample.
.. code-block:: default
sample = ot.Sample(size, 5)
sample[:, 0:4] = inputSample
sample[:, 4] = outputSample
sample[0:5, :]
.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

         | v0       | v1       | v2       | v3       | v4        |
     0   | 2032.978 | 28.16431 | 49.81823 | 54.44882 | -5.224069 |
     1   | 831.1784 | 32.06598 | 49.8578  | 54.29531 | -6.747824 |
     2   | 1741.776 | 19.36681 | 49.08975 | 55.0745  | -5.757122 |
     3   | 800.476  | 40.00743 | 49.16216 | 55.03673 | -7.846938 |
     4   | 917.9835 | 38.23018 | 49.19878 | 54.97124 | -7.629101 |
Extract the first column of `inputSample` into the sample of the flowrates :math:`Q`.
.. code-block:: default
sampleQ = inputSample[:,0]
.. code-block:: default
import numpy as np
def computeConditionnedSample(sample, alpha=0.9, criteriaComponent=None, selectedComponent=0):
    """
    Return the values of the selectedComponent-th component of the sample,
    keeping only the rows where the criteriaComponent-th component exceeds
    its alpha-level empirical quantile.
    """
    dim = sample.getDimension()
    if criteriaComponent is None:
        criteriaComponent = dim - 1
    # Sort the rows by increasing value of the criterion component
    sortedSample = sample.sortAccordingToAComponent(criteriaComponent)
    # Threshold: the alpha-level quantile of the criterion component
    quantiles = sortedSample.computeQuantilePerComponent(alpha)
    quantileValue = quantiles[criteriaComponent]
    sortedSampleCriteria = sortedSample[:, criteriaComponent]
    # Index of the first sorted row exceeding the threshold
    indices = np.where(np.array(sortedSampleCriteria.asPoint()) > quantileValue)[0]
    # Since the sample is sorted, all rows from that index on satisfy the condition
    conditionnedSortedSample = sortedSample[int(indices[0]):, selectedComponent]
    return conditionnedSortedSample
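The logic of this function can be illustrated on a toy list of (q, h) pairs. The pure-Python sketch below is only an illustration of the conditioning step (all names are hypothetical), not part of the OpenTURNS example:

```python
def conditioned_values(pairs, alpha=0.9):
    """Return the q values of the pairs whose h exceeds the
    alpha-level empirical quantile of all h values (illustrative sketch)."""
    hs = sorted(h for _, h in pairs)
    # rank-based empirical quantile, clamped to a valid index
    threshold = hs[min(int(alpha * len(hs)), len(hs) - 1)]
    return [q for q, h in pairs if h > threshold]

# toy data: q = 10, 20, ..., 100 with h = q / 10
toy = [(10 * i, float(i)) for i in range(1, 11)]
# only the q whose h lies above the 80% quantile of h survives
large_q = conditioned_values(toy, alpha=0.8)
```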
Create a histogram for the unconditional flowrates.
.. code-block:: default
numberOfBins = 10
histogram = ot.HistogramFactory().buildAsHistogram(sampleQ, numberOfBins)
Extract the sub-sample of the input flowrates :math:`Q` which leads to large values of the output :math:`H`.
.. code-block:: default
alpha = 0.9
criteriaComponent = 4
selectedComponent = 0
conditionnedSampleQ = computeConditionnedSample(sample,alpha,criteriaComponent,selectedComponent)
We could also use::

    conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ)

but this creates a histogram with new classes, computed from `conditionnedSampleQ`. We want to use exactly the same classes as for the full sample, so that the two histograms match.
.. code-block:: default
first = histogram.getFirst()
width = histogram.getWidth()
conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ,first,width)
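The classes of a histogram are entirely determined by the first edge and the list of class widths: the bin edges are their cumulative sums, so two histograms built from the same `first` and `width` share all their edges. A small sketch, with an illustrative helper name:

```python
def bin_edges(first, widths):
    """Return the histogram bin edges defined by a first edge and a
    list of per-class widths (cumulative sums); illustrative sketch."""
    edges = [first]
    for w in widths:
        edges.append(edges[-1] + w)
    return edges

# the same (first, widths) pair always yields the same class boundaries
edges = bin_edges(0.0, [0.5, 0.5, 1.0])
```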
Then we create a graph with both the unconditional and the conditional histograms.
.. code-block:: default
graph = histogram.drawPDF()
graph.setLegends(["Q"])
#
graphConditionnalQ = conditionnedHistogram.drawPDF()
graphConditionnalQ.setColors(["blue"])
graphConditionnalQ.setLegends([r"$Q|H>H_{%s}$" % (alpha)])
graph.add(graphConditionnalQ)
view = viewer.View(graph)
plt.show()
.. image:: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_compare_unconditional_conditional_histograms_001.png
    :alt: Q PDF
    :class: sphx-glr-single-img
We see that the two histograms are very different: high values of the input :math:`Q` often seem to lead to a high value of the output :math:`H`.
We could explore this situation further by comparing the unconditional distribution of :math:`Q` (which is known in this case) with the conditional distribution of :math:`Q|H>t`, estimated by kernel smoothing. This would be more accurate, since kernel smoothing generally approximates a distribution better than a histogram does.
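As a rough sketch of what kernel smoothing computes (in OpenTURNS this would be done with `ot.KernelSmoothing`, which also selects a bandwidth automatically), a Gaussian kernel density estimate with a fixed, user-chosen bandwidth can be written in pure Python; the helper below is purely illustrative:

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function evaluating the Gaussian kernel density
    estimate of the data points at a given location (illustrative)."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    def pdf(x):
        # sum of Gaussian bumps centered on the data points
        return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2)
                   for d in data) / norm
    return pdf

# density estimate of three toy points, evaluated at their mean
f = gaussian_kde([0.0, 1.0, 2.0], bandwidth=0.5)
value = f(1.0)
```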
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 0.161 seconds)
.. _sphx_glr_download_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py:
.. only:: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_compare_unconditional_conditional_histograms.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_compare_unconditional_conditional_histograms.ipynb `
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery `_