.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py:
Compare unconditional and conditional histograms
================================================
In this example, we compare unconditional and conditional histograms for a simulation. We consider the :ref:`flooding model`.
Let :math:`g` be a function which takes four inputs :math:`Q`, :math:`K_s`, :math:`Z_v` and :math:`Z_m` and returns one output :math:`H`.
We first consider the (unconditional) distribution of the input :math:`Q`.
Let :math:`t` be a given threshold on the output :math:`H`: we consider the event :math:`H>t`. Then we consider the conditional distribution of the input :math:`Q` given that :math:`H>t`, that is :math:`Q|H>t`.
If these two distributions are significantly different, we conclude that the input :math:`Q` has an impact on the event :math:`H>t`.
In order to approximate the distribution of the output :math:`H`, we perform a Monte-Carlo simulation with size 500. The threshold :math:`t` is chosen as the 90% quantile of the empirical distribution of :math:`H`. In this example, the distribution is approximated by its empirical histogram (although another approximation, such as kernel smoothing, could be used instead).
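For illustration only, the rank-based empirical quantile behind the choice of :math:`t` can be sketched in a few lines of pure Python. The helper name and the rank convention below are ours, not part of the example (OpenTURNS provides this through methods such as `computeQuantilePerComponent`):

```python
def empirical_quantile(values, alpha):
    """Return the alpha-level empirical quantile of a list of numbers.

    Illustrative sketch: uses a simple rank-based convention; real
    libraries offer several interpolation schemes.
    """
    s = sorted(values)
    # clamp the rank so that alpha = 1.0 still yields a valid index
    rank = min(int(alpha * len(s)), len(s) - 1)
    return s[rank]

# e.g. the 90% empirical quantile of 0, 1, ..., 9 is 9
t = empirical_quantile(list(range(10)), 0.9)
```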
.. code-block:: default
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
We use the `FloodModel` data class that contains all the case parameters.
.. code-block:: default
from openturns.usecases import flood_model as flood_model
fm = flood_model.FloodModel()
Create an input sample from the joint `distribution` defined in the data class.
We build an output sample by taking the image by the `model`.
.. code-block:: default
size = 500
inputSample = fm.distribution.getSample(size)
outputSample = fm.model(inputSample)
Merge the input and output samples into a single sample.
.. code-block:: default
sample = ot.Sample(size, 5)
sample[:, 0:4] = inputSample
sample[:, 4] = outputSample
sample[0:5, :]
.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

         | v0       | v1       | v2       | v3       | v4        |
     0   | 2032.978 | 28.16431 | 49.81823 | 54.44882 | -5.224069 |
     1   | 831.1784 | 32.06598 | 49.8578  | 54.29531 | -6.747824 |
     2   | 1741.776 | 19.36681 | 49.08975 | 55.0745  | -5.757122 |
     3   | 800.476  | 40.00743 | 49.16216 | 55.03673 | -7.846938 |
     4   | 917.9835 | 38.23018 | 49.19878 | 54.97124 | -7.629101 |
Extract the first column of `inputSample` into the sample of the flowrates :math:`Q`.
.. code-block:: default
sampleQ = inputSample[:,0]
.. code-block:: default
import numpy as np
def computeConditionnedSample(sample, alpha=0.9, criteriaComponent=None, selectedComponent=0):
    """
    Return the values of the selectedComponent-th component of the sample,
    keeping only the rows where the criteriaComponent-th component exceeds
    its alpha-level empirical quantile.
    """
    dim = sample.getDimension()
    if criteriaComponent is None:
        criteriaComponent = dim - 1
    # Sort the rows by increasing value of the criterion component
    sortedSample = sample.sortAccordingToAComponent(criteriaComponent)
    # Threshold: the alpha-level quantile of the criterion component
    quantiles = sortedSample.computeQuantilePerComponent(alpha)
    quantileValue = quantiles[criteriaComponent]
    sortedSampleCriteria = sortedSample[:, criteriaComponent]
    # Index of the first sorted row exceeding the threshold
    indices = np.where(np.array(sortedSampleCriteria.asPoint()) > quantileValue)[0]
    # Since the sample is sorted, all rows from that index on satisfy the condition
    conditionnedSortedSample = sortedSample[int(indices[0]):, selectedComponent]
    return conditionnedSortedSample
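The logic of this function can be illustrated on a toy list of (q, h) pairs. The pure-Python sketch below is only an illustration of the conditioning step (all names are hypothetical), not part of the OpenTURNS example:

```python
def conditioned_values(pairs, alpha=0.9):
    """Return the q values of the pairs whose h exceeds the
    alpha-level empirical quantile of all h values (illustrative sketch)."""
    hs = sorted(h for _, h in pairs)
    # rank-based empirical quantile, clamped to a valid index
    threshold = hs[min(int(alpha * len(hs)), len(hs) - 1)]
    return [q for q, h in pairs if h > threshold]

# toy data: q = 10, 20, ..., 100 with h = q / 10
toy = [(10 * i, float(i)) for i in range(1, 11)]
# only the q whose h lies above the 80% quantile of h survives
large_q = conditioned_values(toy, alpha=0.8)
```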
Create a histogram for the unconditional flowrates.
.. code-block:: default
numberOfBins = 10
histogram = ot.HistogramFactory().buildAsHistogram(sampleQ, numberOfBins)
Extract the sub-sample of the input flowrates :math:`Q` which leads to large values of the output :math:`H`.
.. code-block:: default
alpha = 0.9
criteriaComponent = 4
selectedComponent = 0
conditionnedSampleQ = computeConditionnedSample(sample,alpha,criteriaComponent,selectedComponent)
We could also use::

    conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ)

but this creates a histogram with new classes, computed from `conditionnedSampleQ`. We want to use exactly the same classes as for the full sample, so that the two histograms match.
.. code-block:: default
first = histogram.getFirst()
width = histogram.getWidth()
conditionnedHistogram = ot.HistogramFactory().buildAsHistogram(conditionnedSampleQ,first,width)
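The classes of a histogram are entirely determined by the first edge and the list of class widths: the bin edges are their cumulative sums, so two histograms built from the same `first` and `width` share all their edges. A small sketch, with an illustrative helper name:

```python
def bin_edges(first, widths):
    """Return the histogram bin edges defined by a first edge and a
    list of per-class widths (cumulative sums); illustrative sketch."""
    edges = [first]
    for w in widths:
        edges.append(edges[-1] + w)
    return edges

# the same (first, widths) pair always yields the same class boundaries
edges = bin_edges(0.0, [0.5, 0.5, 1.0])
```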
Then we create a graph with both the unconditional and the conditional histograms.
.. code-block:: default
graph = histogram.drawPDF()
graph.setLegends(["Q"])
#
graphConditionnalQ = conditionnedHistogram.drawPDF()
graphConditionnalQ.setColors(["blue"])
graphConditionnalQ.setLegends([r"$Q|H>H_{%s}$" % (alpha)])
graph.add(graphConditionnalQ)
view = viewer.View(graph)
plt.show()
.. image:: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_compare_unconditional_conditional_histograms_001.png
    :alt: Q PDF
    :class: sphx-glr-single-img
We see that the two histograms are very different: high values of the input :math:`Q` often seem to lead to a high value of the output :math:`H`.
We could explore this situation further by comparing the unconditional distribution of :math:`Q` (which is known in this case) with the conditional distribution of :math:`Q|H>t`, estimated by kernel smoothing. This would be more accurate, since kernel smoothing generally approximates a distribution better than a histogram does.
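As a rough sketch of what kernel smoothing computes (in OpenTURNS this would be done with `ot.KernelSmoothing`, which also selects a bandwidth automatically), a Gaussian kernel density estimate with a fixed, user-chosen bandwidth can be written in pure Python; the helper below is purely illustrative:

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function evaluating the Gaussian kernel density
    estimate of the data points at a given location (illustrative)."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    def pdf(x):
        # sum of Gaussian bumps centered on the data points
        return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2)
                   for d in data) / norm
    return pdf

# density estimate of three toy points, evaluated at their mean
f = gaussian_kde([0.0, 1.0, 2.0], bandwidth=0.5)
value = f(1.0)
```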
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 0.161 seconds)
.. _sphx_glr_download_auto_data_analysis_sample_analysis_plot_compare_unconditional_conditional_histograms.py:
.. only:: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_compare_unconditional_conditional_histograms.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_compare_unconditional_conditional_histograms.ipynb `
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery `_