.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/sample_analysis/plot_compare_unconditional_conditional_histograms.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py:


Compare unconditional and conditional histograms
================================================

.. GENERATED FROM PYTHON SOURCE LINES 6-16

In this example, we compare unconditional and conditional histograms for a simulation. We consider the :ref:`flooding model<use-case-flood-model>`.
Let :math:`g` be a function which takes four inputs :math:`Q`, :math:`K_s`, :math:`Z_v` and :math:`Z_m` and returns one output :math:`H`.

We first consider the (unconditional) distribution of the input :math:`Q`.

Let :math:`t` be a given threshold on the output :math:`H`: we consider the event :math:`H>t`. Then we consider the conditional distribution of the input :math:`Q` given that :math:`H>t`, denoted by :math:`Q|H>t`.

If these two distributions are significantly different, we conclude that the input :math:`Q` has an impact on the event :math:`H>t`.

In order to approximate the distribution of the output :math:`H`, we perform a Monte Carlo simulation with a sample of size 500. The threshold :math:`t` is chosen as the 90% quantile of the empirical distribution of :math:`H`. In this example, the distribution is approximated by its empirical histogram (another approximation, such as kernel smoothing, could be used as well).
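
The comparison can be made explicit with the usual definition of a conditional distribution, restated here with the notation above. The symbols :math:`F_{Q|H>t}`, :math:`n` and :math:`(q_i, h_i)` are introduced only for this illustration and denote the conditional distribution function, the sample size and the simulated input/output pairs.

.. math::

    F_{Q|H>t}(q) = P(Q \leq q \mid H > t) = \frac{P(Q \leq q, \; H > t)}{P(H > t)}

Under these assumptions, a natural empirical counterpart keeps only the points for which the event occurs:

.. math::

    \widehat{F}_{Q|H>t}(q) = \frac{\sum_{i=1}^{n} \mathbf{1}_{\{q_i \leq q, \; h_i > t\}}}{\sum_{i=1}^{n} \mathbf{1}_{\{h_i > t\}}}

When :math:`t` is the 90% empirical quantile and :math:`n = 500`, the denominator is about 50, so the conditional histogram is built from roughly one tenth of the points.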

.. GENERATED FROM PYTHON SOURCE LINES 18-25

.. code-block:: default

    import numpy as np
    from openturns.usecases import flood_model
    import openturns as ot
    import openturns.viewer as viewer
    from matplotlib import pylab as plt
    ot.Log.Show(ot.Log.NONE)


.. GENERATED FROM PYTHON SOURCE LINES 26-27

We use the `FloodModel` data class that contains all the case parameters.

.. GENERATED FROM PYTHON SOURCE LINES 27-30

.. code-block:: default

    fm = flood_model.FloodModel()


.. GENERATED FROM PYTHON SOURCE LINES 31-33

Create an input sample from the joint `distribution` defined in the data class.
We build the output sample by evaluating the `model` on the input sample.

.. GENERATED FROM PYTHON SOURCE LINES 35-39

.. code-block:: default

    size = 500
    inputSample = fm.distribution.getSample(size)
    outputSample = fm.model(inputSample)


.. GENERATED FROM PYTHON SOURCE LINES 40-41

Merge the input and output samples into a single sample.

.. GENERATED FROM PYTHON SOURCE LINES 43-48

.. code-block:: default

    sample = ot.Sample(size, 5)
    sample[:, 0:4] = inputSample
    sample[:, 4] = outputSample
    sample[0:5, :]


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <TABLE><TR><TD></TD><TH>v0</TH><TH>v1</TH><TH>v2</TH><TH>v3</TH><TH>v4</TH></TR>
    <TR><TH>0</TH><TD>2032.978</TD><TD>28.16431</TD><TD>49.81823</TD><TD>54.44882</TD><TD>-5.224069</TD></TR>
    <TR><TH>1</TH><TD>831.1784</TD><TD>32.06598</TD><TD>49.8578</TD><TD>54.29531</TD><TD>-6.747824</TD></TR>
    <TR><TH>2</TH><TD>1741.776</TD><TD>19.36681</TD><TD>49.08975</TD><TD>55.0745</TD><TD>-5.757122</TD></TR>
    <TR><TH>3</TH><TD>800.476</TD><TD>40.00743</TD><TD>49.16216</TD><TD>55.03673</TD><TD>-7.846938</TD></TR>
    <TR><TH>4</TH><TD>917.9835</TD><TD>38.23018</TD><TD>49.19878</TD><TD>54.97124</TD><TD>-7.629101</TD></TR>
    </TABLE>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 49-50

Extract the first column of `inputSample` into the sample of the flowrates :math:`Q`.

.. GENERATED FROM PYTHON SOURCE LINES 52-54

.. code-block:: default

    sampleQ = inputSample[:, 0]


.. GENERATED FROM PYTHON SOURCE LINES 55-77

.. code-block:: default


    def computeConditionnedSample(sample, alpha=0.9, criteriaComponent=None, selectedComponent=0):
        '''
        Return the values of the selectedComponent-th component of the sample
        for the points whose criteriaComponent-th component is strictly
        greater than its alpha-level empirical quantile.

        If criteriaComponent is None, the last component is used.
        '''
        dim = sample.getDimension()
        if criteriaComponent is None:
            criteriaComponent = dim - 1
        # Sort the points by increasing value of the criteria component.
        sortedSample = sample.sortAccordingToAComponent(criteriaComponent)
        # Empirical alpha-level quantile of the criteria component.
        quantiles = sortedSample.computeQuantilePerComponent(alpha)
        quantileValue = quantiles[criteriaComponent]
        # Find the points above the quantile: since the sample is sorted,
        # they form the tail of the sorted sample.
        sortedSampleCriteria = sortedSample[:, criteriaComponent]
        indices = np.where(
            np.array(sortedSampleCriteria.asPoint()) > quantileValue)[0]
        conditionnedSortedSample = sortedSample[int(
            indices[0]):, selectedComponent]
        return conditionnedSortedSample

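The selection performed by ``computeConditionnedSample`` amounts to keeping the rows whose criteria component exceeds its empirical quantile. As a minimal sketch, the same idea can be written with a boolean mask on a plain NumPy array, without sorting. The function ``select_above_quantile`` and the toy data are hypothetical and do not use the OpenTURNS API; the default interpolation of ``np.quantile`` may also differ slightly from the OpenTURNS empirical quantile, so the two thresholds are not guaranteed to be identical.

.. code-block:: python

    import numpy as np


    def select_above_quantile(data, alpha=0.9, criteria_component=-1,
                              selected_component=0):
        """Hypothetical NumPy counterpart of the selection step.

        Keep the rows whose criteria column is strictly greater than its
        alpha-level empirical quantile and return the selected column.
        """
        data = np.asarray(data, dtype=float)
        criteria = data[:, criteria_component]
        threshold = np.quantile(criteria, alpha)
        mask = criteria > threshold
        return data[mask, selected_component]


    # Toy data (an assumption of this sketch): two inputs and one output
    # which mostly depends on the first input.
    rng = np.random.default_rng(42)
    inputs = rng.uniform(size=(200, 2))
    output = inputs[:, 0] + 0.1 * inputs[:, 1]
    data = np.column_stack([inputs, output])

    selected = select_above_quantile(data, alpha=0.9)
    print(selected.size, selected.mean(), inputs[:, 0].mean())

In this toy setting the selected values of the first input are concentrated in its upper range, which is the kind of shift the conditional histogram is meant to reveal.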

.. GENERATED FROM PYTHON SOURCE LINES 78-79

Create a histogram for the unconditional flowrates.

.. GENERATED FROM PYTHON SOURCE LINES 81-84

.. code-block:: default

    numberOfBins = 10
    histogram = ot.HistogramFactory().buildAsHistogram(sampleQ, numberOfBins)


.. GENERATED FROM PYTHON SOURCE LINES 85-86

Extract the sub-sample of the input flowrates :math:`Q` that lead to large values of the output :math:`H`.

.. GENERATED FROM PYTHON SOURCE LINES 88-94

.. code-block:: default

    alpha = 0.9
    criteriaComponent = 4
    selectedComponent = 0
    conditionnedSampleQ = computeConditionnedSample(
        sample, alpha, criteriaComponent, selectedComponent)


.. GENERATED FROM PYTHON SOURCE LINES 95-103

We could as well use:

.. code-block:: default

    conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ)

but this creates a histogram with new bins (classes), computed from
`conditionnedSampleQ`.
We want to use exactly the same bins as for the full sample,
so that the two histograms can be compared directly.
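
The idea of sharing the bins between two histograms can be sketched with NumPy alone. This is only an illustration under stated assumptions: the samples are synthetic, the variable names are hypothetical, and ``np.histogram`` stands in for the OpenTURNS histogram factory.

.. code-block:: python

    import numpy as np

    # Synthetic data (an assumption of this sketch): a full sample and a
    # sub-sample made of its largest values.
    rng = np.random.default_rng(1)
    full_sample = rng.normal(size=500)
    sub_sample = full_sample[full_sample > np.quantile(full_sample, 0.9)]

    # Build the bins on the full sample only.
    number_of_bins = 10
    full_density, edges = np.histogram(
        full_sample, bins=number_of_bins, density=True)

    # Reuse the same bin edges for the sub-sample, so that the bars of the
    # two histograms are aligned.
    sub_density, sub_edges = np.histogram(sub_sample, bins=edges, density=True)

    print(full_density)
    print(sub_density)

Because both histograms are normalized as densities over identical bins, their bar heights can be compared bin by bin.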

.. GENERATED FROM PYTHON SOURCE LINES 105-110

.. code-block:: default

    first = histogram.getFirst()
    width = histogram.getWidth()
    conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(
        conditionnedSampleQ, first, width)


.. GENERATED FROM PYTHON SOURCE LINES 111-112

Then create a graph with the unconditional and the conditional histograms.

.. GENERATED FROM PYTHON SOURCE LINES 114-124

.. code-block:: default

    graph = histogram.drawPDF()
    graph.setLegends(["Q"])
    #
    graphConditionnalQ = conditionnedHistogram.drawPDF()
    graphConditionnalQ.setColors(["blue"])
    graphConditionnalQ.setLegends([r"$Q|H>H_{%s}$" % (alpha)])
    graph.add(graphConditionnalQ)
    view = viewer.View(graph)

    plt.show()


.. image-sg:: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_compare_unconditional_conditional_histograms_001.png
   :alt: Q PDF
   :srcset: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_compare_unconditional_conditional_histograms_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 125-128

We see that the two histograms are very different. High values of the input :math:`Q` often seem to lead to high values of the output :math:`H`.

We could explore this situation further by comparing the unconditional distribution of :math:`Q` (which is known in this case) with the conditional distribution of :math:`Q|H>t`, estimated by kernel smoothing. This could improve accuracy, since kernel smoothing generally approximates a smooth density more accurately than a histogram.

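To give an idea of what kernel smoothing computes, here is a minimal sketch of a Gaussian kernel density estimate written with NumPy. It is not the OpenTURNS implementation (OpenTURNS provides the ``KernelSmoothing`` class for this purpose, which this sketch does not use). The function ``gaussian_kde_pdf``, the synthetic sample and the normal reference rule for the bandwidth are assumptions made for the illustration.

.. code-block:: python

    import numpy as np


    def gaussian_kde_pdf(sample, points):
        """Evaluate a Gaussian kernel density estimate of sample at points."""
        sample = np.asarray(sample, dtype=float)
        points = np.asarray(points, dtype=float)
        n = sample.size
        # Normal reference rule of thumb for the bandwidth
        # (an assumption of this sketch, other rules exist).
        bandwidth = 1.06 * sample.std(ddof=1) * n ** (-1.0 / 5.0)
        # One Gaussian bump per sample point, averaged over the sample.
        z = (points[:, None] - sample[None, :]) / bandwidth
        kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
        return kernel.sum(axis=1) / (n * bandwidth)


    # Synthetic sample standing in for the conditional sample.
    rng = np.random.default_rng(3)
    sample = rng.normal(loc=0.0, scale=1.0, size=500)
    grid = np.linspace(-6.0, 6.0, 601)
    pdf = gaussian_kde_pdf(sample, grid)

    # Sanity check: the estimated density integrates to approximately one.
    integral = float(np.sum(pdf) * (grid[1] - grid[0]))
    print(integral)

Unlike a histogram, the resulting estimate is a smooth function of its argument and does not depend on a choice of bin edges, only on the bandwidth.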

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.218 seconds)


.. _sphx_glr_download_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_compare_unconditional_conditional_histograms.py <plot_compare_unconditional_conditional_histograms.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_compare_unconditional_conditional_histograms.ipynb <plot_compare_unconditional_conditional_histograms.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_