.. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py: Compare unconditional and conditional histograms ================================================ In this example, we compare unconditional and conditional histograms for a simulation. We consider the :ref:`flooding model`. Let :math:`g` be a function which takes four inputs :math:`Q`, :math:`K_s`, :math:`Z_v` and :math:`Z_m` and returns one output :math:`H`. We first consider the (unconditional) distribution of the input :math:`Q`. Let :math:`t` be a given threshold on the output :math:`H`: we consider the event :math:`H>t`. Then we consider the conditional distribution of the input :math:`Q` given that :math:`H>t` : :math:`Q|H>t`. If these two distributions are significantly different, we conclude that the input :math:`Q` has an impact on the event :math:`H>t`. In order to approximate the distribution of the output :math:`H`, we perform a Monte-Carlo simulation with size 500. The threshold :math:`t` is chosen as the 90% quantile of the empirical distribution of :math:`H`. In this example, the distribution is aproximated by its empirical histogram (but this could be done with another distribution approximation as well, such as kernel smoothing for example). .. code-block:: default import openturns as ot import openturns.viewer as viewer from matplotlib import pylab as plt ot.Log.Show(ot.Log.NONE) We use the `FloodModel` data class that contains all the case parameters. .. code-block:: default from openturns.usecases import flood_model as flood_model fm = flood_model.FloodModel() Create an input sample from the joint `distribution` defined in the data class. We build an output sample by taking the image by the `model`. .. code-block:: default size = 500 inputSample = fm.distribution.getSample(size) outputSample = fm.model(inputSample) Merge the input and output samples into a single sample. .. code-block:: default sample = ot.Sample(size,5) sample[:,0:4] = inputSample sample[:,4] = outputSample sample[0:5,:] .. raw:: html
v0v1v2v3v4
02032.97828.1643149.8182354.44882-5.224069
1831.178432.0659849.857854.29531-6.747824
21741.77619.3668149.0897555.0745-5.757122
3800.47640.0074349.1621655.03673-7.846938
4917.983538.2301849.1987854.97124-7.629101


Extract the first column of `inputSample` into the sample of the flowrates :math:`Q`. .. code-block:: default sampleQ = inputSample[:,0] .. code-block:: default import numpy as np def computeConditionnedSample(sample, alpha = 0.9, criteriaComponent = None, selectedComponent = 0): ''' Return values from the selectedComponent-th component of the sample. Selects the values according to the alpha-level quantile of the criteriaComponent-th component of the sample. ''' dim = sample.getDimension() if criteriaComponent is None: criteriaComponent = dim - 1 sortedSample = sample.sortAccordingToAComponent(criteriaComponent) quantiles = sortedSample.computeQuantilePerComponent(alpha) quantileValue = quantiles[criteriaComponent] sortedSampleCriteria = sortedSample[:,criteriaComponent] indices = np.where(np.array(sortedSampleCriteria.asPoint())>quantileValue)[0] conditionnedSortedSample = sortedSample[int(indices[0]):,selectedComponent] return conditionnedSortedSample Create an histogram for the unconditional flowrates. .. code-block:: default numberOfBins = 10 histogram = ot.HistogramFactory().buildAsHistogram(sampleQ,numberOfBins) Extract the sub-sample of the input flowrates Q which leads to large values of the output H. .. code-block:: default alpha = 0.9 criteriaComponent = 4 selectedComponent = 0 conditionnedSampleQ = computeConditionnedSample(sample,alpha,criteriaComponent,selectedComponent) We could as well use: ``` conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ) ``` but this creates an histogram with new classes, corresponding to `conditionnedSampleQ`. We want to use exactly the same classes as the full sample, so that the two histograms match. .. code-block:: default first = histogram.getFirst() width = histogram.getWidth() conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ,first,width) Then creates a graphics with the unconditional and the conditional histograms. .. code-block:: default graph = histogram.drawPDF() graph.setLegends(["Q"]) # graphConditionnalQ = conditionnedHistogram.drawPDF() graphConditionnalQ.setColors(["blue"]) graphConditionnalQ.setLegends([r"$Q|H>H_{%s}$" % (alpha)]) graph.add(graphConditionnalQ) view = viewer.View(graph) plt.show() .. image:: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_compare_unconditional_conditional_histograms_001.png :alt: Q PDF :class: sphx-glr-single-img We see that the two histograms are very different. The high values of the input :math:`Q` seem to often lead to a high value of the output :math:`H`. We could explore this situation further by comparing the unconditional distribution of :math:`Q` (which is known in this case) with the conditonal distribution of :math:`Q|H>t`, estimated by kernel smoothing. This would have the advantage of accuracy, since the kernel smoothing is a more accurate approximation of a distribution than the histogram. .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 0.161 seconds) .. _sphx_glr_download_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_compare_unconditional_conditional_histograms.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_compare_unconditional_conditional_histograms.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_