.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_calibration_least_squares_and_gaussian_calibration_plot_calibration_flooding.py>`     to download the full example code
    .. rst-class:: sphx-glr-example-title

    .. _sphx_glr_auto_calibration_least_squares_and_gaussian_calibration_plot_calibration_flooding.py:


Calibration of the flooding model
=================================

In this example we are interested in the calibration of the :ref:`flooding model <use-case-flood-model>`.

Parameters to calibrate
-----------------------

The vector of parameters to calibrate is: 

.. math::
   \theta = (K_s,Z_v,Z_m).


The variables to calibrate are :math:`(K_s,Z_v,Z_m)` and are set to the following values:

.. math::
   K_s = 30, \qquad Z_v = 50, \qquad Z_m = 55.


Observations
------------

In this section, we describe the statistical model associated with the :math:`n` observations.
The errors of the water heights are associated with a gaussian distribution with a zero mean and a standard variation equal to:

.. math::
   \sigma=0.1.


Therefore, the observed water heights are:

.. math::
   H_i = G(Q_i,K_s,Z_v,Z_m) + \epsilon_i


for :math:`i=1,...,n` where

.. math::
   \epsilon \sim \mathcal{N}(0,\sigma^2)


and we make the hypothesis that the observation errors are independent.
We consider a sample size equal to:

.. math::
   n=100.


The observations are the couples :math:`\{(Q_i,H_i)\}_{i=1,...,n}`, i.e. each observation is a couple made of the flowrate and the corresponding river height.

Analysis
--------

In this model, the variables :math:`Z_m` and :math:`Z_v` are not identifiables, since only the difference :math:`Z_m-Z_v` matters. Hence, calibrating this model requires some regularization.

Generate the observations
-------------------------


.. code-block:: default

    import numpy as np
    import openturns as ot
    ot.ResourceMap.SetAsUnsignedInteger('Normal-SmallDimension', 1)
    import openturns.viewer as viewer
    from matplotlib import pylab as plt
    ot.Log.Show(ot.Log.NONE)


We load the flooding use case :


.. code-block:: default

    from openturns.usecases import flood_model as flood_model
    fm = flood_model.FloodModel()


We define the model :math:`g` which has 4 inputs and one output H.

The nonlinear least squares does not take into account for bounds in the parameters. Therefore, we ensure that the output is computed whatever the inputs. The model fails into two situations:

* if :math:`K_s<0`,
* if :math:`Z_v-Z_m<0`.

In these cases, we return an infinite number, so that the optimization algorithm does not get trapped.


.. code-block:: default

    def functionFlooding(X) :
        L = 5.0e3
        B = 300.0
        Q, K_s, Z_v, Z_m = X
        alpha = (Z_m - Z_v)/L
        if alpha < 0.0 or K_s <= 0.0:
            H = np.inf
        else:
            H = (Q/(K_s*B*np.sqrt(alpha)))**(3.0/5.0)
        return [H]


.. code-block:: default

    g = ot.PythonFunction(4, 1, functionFlooding) 
    g = ot.MemoizeFunction(g)
    g.setOutputDescription(["H (m)"])


We load the input distribution for :math:`Q` :


.. code-block:: default

    Q = fm.Q
    Q


.. raw:: html

    <p>TruncatedDistribution(Gumbel(beta = 558, gamma = 1013), bounds = [0, (19000.8) +inf[)</p>
    <br />
    <br />

Set the parameters to be calibrated.


.. code-block:: default

    K_s = ot.Dirac(30.0)
    Z_v = ot.Dirac(50.0)
    Z_m = ot.Dirac(55.0)
    K_s.setDescription(["Ks (m^(1/3)/s)"])
    Z_v.setDescription(["Zv (m)"])
    Z_m.setDescription(["Zm (m)"])


Create the joint input distribution.


.. code-block:: default

    inputRandomVector = ot.ComposedDistribution([Q, K_s, Z_v, Z_m])


Create a Monte-Carlo sample of the output H.


.. code-block:: default

    nbobs = 100
    inputSample = inputRandomVector.getSample(nbobs)
    outputH = g(inputSample)


Observe the distribution of the output H.


.. code-block:: default

    graph = ot.HistogramFactory().build(outputH).drawPDF()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_001.png
    :alt: H (m) PDF
    :class: sphx-glr-single-img


Generate the observation noise and add it to the output of the model.


.. code-block:: default

    sigmaObservationNoiseH = 0.1 # (m)
    noiseH = ot.Normal(0.,sigmaObservationNoiseH)
    sampleNoiseH = noiseH.getSample(nbobs)
    Hobs = outputH + sampleNoiseH


Plot the Y observations versus the X observations.


.. code-block:: default

    Qobs = inputSample[:,0]


.. code-block:: default

    graph = ot.Graph("Observations","Q (m3/s)","H (m)",True)
    cloud = ot.Cloud(Qobs,Hobs)
    graph.add(cloud)
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_002.png
    :alt: Observations
    :class: sphx-glr-single-img


Setting the calibration parameters
----------------------------------

Define the value of the reference values of the :math:`\theta` parameter. In the bayesian framework, this is called the mean of the *prior* gaussian distribution. In the data assimilation framework, this is called the *background*.


.. code-block:: default

    KsInitial = 20.
    ZvInitial = 49.
    ZmInitial = 51.
    thetaPrior = ot.Point([KsInitial,ZvInitial,ZmInitial])


The following statement create the calibrated function from the model. The calibrated parameters Ks, Zv, Zm are at indices 1, 2, 3 in the inputs arguments of the model.


.. code-block:: default

    calibratedIndices = [1,2,3]
    mycf = ot.ParametricFunction(g, calibratedIndices, thetaPrior)


Calibration with linear least squares
-------------------------------------

The `LinearLeastSquaresCalibration` class performs the linear least squares calibration by linearizing the model in the neighbourhood of the reference point.


.. code-block:: default

    algo = ot.LinearLeastSquaresCalibration(mycf, Qobs, Hobs, thetaPrior, "SVD")


The `run` method computes the solution of the problem.


.. code-block:: default

    algo.run()


.. code-block:: default

    calibrationResult = algo.getResult()


The `getParameterMAP` method returns the maximum of the posterior distribution of :math:`\theta`.


.. code-block:: default

    thetaStar = calibrationResult.getParameterMAP()
    thetaStar


.. raw:: html

    <p>[4.88644e+08,2.08618e+23,2.08618e+23]</p>
    <br />
    <br />

In this case, we see that there seems to be a great distance from the reference value of :math:`\theta` to the optimum: the values seem too large in magnitude. The value of the optimum :math:`K_s` is nonpositive. In fact, there is an identification problem because the Jacobian matrix is rank-degenerate.

Diagnostic of the identification issue
--------------------------------------

In this section, we show how to diagnose the identification problem.

The `getParameterPosterior` method returns the posterior gaussian distribution of :math:`\theta`.


.. code-block:: default

    distributionPosterior = calibrationResult.getParameterPosterior()
    distributionPosterior


.. raw:: html

    <p>Normal(mu = [4.88644e+08,2.08618e+23,2.08618e+23], sigma = [2.54995e+26,2.00465e+32,2.00465e+32], R = [[  1            9.40306e-26 -9.40306e-26 ]<br>
     [  9.40306e-26  1            1           ]<br>
     [ -9.40306e-26  1            1           ]])</p>
    <br />
    <br />

We see that there is a large covariance matrix diagonal. 

Let us compute a 95% confidence interval for the solution :math:`\theta^\star`.


.. code-block:: default

    distributionPosterior.computeBilateralConfidenceIntervalWithMarginalProbability(0.95)[0]


.. raw:: html

    <p>[-5.70302e+26, 5.70302e+26]<br>
    [-4.48343e+32, 4.48343e+32]<br>
    [-4.48343e+32, 4.48343e+32]</p>
    <br />
    <br />

The confidence interval is *very* large.


.. code-block:: default

    mycf.setParameter(thetaPrior)
    thetaDim = thetaPrior.getDimension()
    jacobianMatrix = ot.Matrix(nbobs,thetaDim)
    for i in range(nbobs):
        jacobianMatrix[i,:] = mycf.parameterGradient(Qobs[i]).transpose()
    jacobianMatrix[0:5,:]


.. raw:: html

    <p>5x3<br>
    [[ -0.185062  0.92531  -0.92531  ]<br>
     [ -0.135774  0.678871 -0.678871 ]<br>
     [ -0.102219  0.511093 -0.511093 ]<br>
     [ -0.15354   0.767699 -0.767699 ]<br>
     [ -0.149497  0.747487 -0.747487 ]]</p>
    <br />
    <br />


.. code-block:: default

    jacobianMatrix.computeSingularValues()


.. raw:: html

    <p>[8.99149,1.82595e-10,1.42612e-25]</p>
    <br />
    <br />

We can see that there are two singular values which are relatively close to zero. 

This explains why the Jacobian matrix is close to being rank-degenerate.

Conclusion of the linear least squares calibration
--------------------------------------------------

There are several methods to solve the problem.

* Given that the problem is not identifiable, we can use some regularization method. Two methods are provided in the library: the gaussian linear least squares `GaussianLinearCalibration` and the gaussian non linear least squares `GaussianNonlinearCalibration`.
* We can change the problem, replacing it with a problem which is identifiable. In the flooding model, replacing :math:`Z_v-Z_m` with :math:`\Delta Z` allows to solve the issue.

Calibration with non linear least squares
-----------------------------------------

The `NonLinearLeastSquaresCalibration` class performs the non linear least squares calibration by minimizing the squared euclidian norm between the predictions and the observations.


.. code-block:: default

    algo = ot.NonLinearLeastSquaresCalibration(mycf, Qobs, Hobs, thetaPrior)


The `run` method computes the solution of the problem.


.. code-block:: default

    algo.run()


.. code-block:: default

    calibrationResult = algo.getResult()


Analysis of the results
-----------------------

The `getParameterMAP` method returns the maximum of the posterior distribution of :math:`\theta`.


.. code-block:: default

    thetaMAP = calibrationResult.getParameterMAP()
    thetaMAP


.. raw:: html

    <p>[27.5672,47.0914,52.9086]</p>
    <br />
    <br />

We can compute a 95% confidence interval of the parameter :math:`\theta^\star`. 

This confidence interval is based on bootstrap, based on a sample size equal to 100 (as long as the value of the `ResourceMap` key "NonLinearLeastSquaresCalibration-BootstrapSize" is unchanged). This confidence interval reflects the sensitivity of the optimum to the variability in the observations.


.. code-block:: default

    thetaPosterior = calibrationResult.getParameterPosterior()
    thetaPosterior.computeBilateralConfidenceIntervalWithMarginalProbability(0.95)[0]


.. raw:: html

    <p>[27.3828, 27.7132]<br>
    [47.0298, 47.1693]<br>
    [52.8307, 52.9702]</p>
    <br />
    <br />

In this case, the value of the parameter :math:`K_s` is quite accurately computed, but there is a relatively large uncertainty on the values of :math:`Z_v` and :math:`Z_m`.


.. code-block:: default

    graph = calibrationResult.drawObservationsVsInputs()
    graph.setLegendPosition("topleft")
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_003.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


We see that there is a good fit after calibration, since the predictions after calibration (i.e. the green crosses) are close to the observations (i.e. the blue crosses).


.. code-block:: default

    graph = calibrationResult.drawObservationsVsPredictions()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_004.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


We see that there is a much better fit after calibration, since the predictions are close to the diagonal of the graphics.


.. code-block:: default

    observationError = calibrationResult.getObservationsError()
    observationError


.. raw:: html

    <p>Normal(mu = -0.00344428, sigma = 0.111195)</p>
    <br />
    <br />

We can see that the observation error is close to have a zero mean and a standard deviation equal to 0.1.


.. code-block:: default

    graph = calibrationResult.drawResiduals()
    graph.setLegendPosition("topleft")
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_005.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


The analysis of the residuals shows that the distribution is centered on zero and symmetric. This indicates that the calibration performed well. 

Moreover, the distribution of the residuals is close to being gaussian.


.. code-block:: default

    graph = calibrationResult.drawParameterDistributions()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_006.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


Gaussian linear calibration
---------------------------

The standard deviation of the observations.


.. code-block:: default

    sigmaH = 0.5 # (m^2)


Define the covariance matrix of the output Y of the model.


.. code-block:: default

    errorCovariance = ot.CovarianceMatrix(1)
    errorCovariance[0,0] = sigmaH**2


Define the covariance matrix of the parameters :math:`\theta` to calibrate.


.. code-block:: default

    sigmaKs = 5.
    sigmaZv = 1.
    sigmaZm = 1.


.. code-block:: default

    sigma = ot.CovarianceMatrix(3)
    sigma[0,0] = sigmaKs**2
    sigma[1,1] = sigmaZv**2
    sigma[2,2] = sigmaZm**2
    sigma


.. raw:: html

    <p>[[ 25  0  0 ]<br>
     [  0  1  0 ]<br>
     [  0  0  1 ]]</p>
    <br />
    <br />

The `GaussianLinearCalibration` class performs the gaussian linear calibration by linearizing the model in the neighbourhood of the prior.


.. code-block:: default

    algo = ot.GaussianLinearCalibration(mycf, Qobs, Hobs, thetaPrior, sigma, errorCovariance,"SVD")


The `run` method computes the solution of the problem.


.. code-block:: default

    algo.run()


.. code-block:: default

    calibrationResult = algo.getResult()


Analysis of the results
-----------------------

The `getParameterMAP` method returns the maximum of the posterior distribution of :math:`\theta`.


.. code-block:: default

    thetaStar = calibrationResult.getParameterMAP()
    thetaStar


.. raw:: html

    <p>[24.4485,48.1103,51.8897]</p>
    <br />
    <br />


.. code-block:: default

    graph = calibrationResult.drawObservationsVsInputs()
    graph.setLegendPosition("topleft")
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_007.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


We see that the output of the model after calibration is closer to the observations. However, there is still a distance from the outputs of the model to the observations. This indicates that the calibration cannot be performed with this method.


.. code-block:: default

    graph = calibrationResult.drawObservationsVsPredictions()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_008.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


In this case, the fit is better after calibration, but not perfect. Indeed, the cloud of points after calibration is not centered on the diagonal. 


.. code-block:: default

    graph = calibrationResult.drawResiduals()
    graph.setLegendPosition("topleft")
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_009.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


We see that the distribution of the residual is not centered on zero: the mean residual is approximately -0.5, which implies that the predictions are, on average, smaller than the observations. This is a proof that the calibration cannot be performed with this method in this particular case.

The `getParameterPosterior` method returns the posterior gaussian distribution of :math:`\theta`.


.. code-block:: default

    distributionPosterior = calibrationResult.getParameterPosterior()
    distributionPosterior


.. raw:: html

    <p>Normal(mu = [24.4485,48.1103,51.8897], sigma = [4.08462,0.816925,0.816925], R = [[  1         0.498428 -0.498428 ]<br>
     [  0.498428  1         0.498428 ]<br>
     [ -0.498428  0.498428  1        ]])</p>
    <br />
    <br />

We can compute a 95% confidence interval of the parameter :math:`\theta^\star`.


.. code-block:: default

    distributionPosterior.computeBilateralConfidenceIntervalWithMarginalProbability(0.95)[0]


.. raw:: html

    <p>[14.8749, 34.0221]<br>
    [46.1956, 50.025]<br>
    [49.975, 53.8044]</p>
    <br />
    <br />

We see that there is a large uncertainty on the value of the parameter :math:`K_s` which can be as small as 14 and as large as 34. 

We can compare the prior and posterior distributions of the marginals of :math:`\theta`. 


.. code-block:: default

    graph = calibrationResult.drawParameterDistributions()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_010.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


The two distributions are different, which shows that the calibration is sensible to the observations (if the observations were not sensible, the two distributions were superimposed). Moreover, the two distributions are quite close, which implies that the prior distribution has played a roled in the calibration (otherwise the two distributions would be completely different, indicating that only the observations were taken into account). 

Gaussian nonlinear calibration
------------------------------

The `GaussianNonLinearCalibration` class performs the gaussian nonlinear calibration.


.. code-block:: default

    algo = ot.GaussianNonLinearCalibration(mycf, Qobs, Hobs, thetaPrior, sigma, errorCovariance)


The `run` method computes the solution of the problem.


.. code-block:: default

    algo.run()


.. code-block:: default

    calibrationResult = algo.getResult()


Analysis of the results
-----------------------

The `getParameterMAP` method returns the maximum of the posterior distribution of :math:`\theta`.


.. code-block:: default

    thetaStar = calibrationResult.getParameterMAP()
    thetaStar


.. raw:: html

    <p>[30.307,47.6581,52.3419]</p>
    <br />
    <br />


.. code-block:: default

    graph = calibrationResult.drawObservationsVsInputs()
    graph.setLegendPosition("topleft")
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_011.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


We see that the output of the model after calibration is in the middle of the observations: the calibration seems correct.


.. code-block:: default

    graph = calibrationResult.drawObservationsVsPredictions()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_012.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


The fit is excellent after calibration. Indeed, the cloud of points after calibration is on the diagonal. 


.. code-block:: default

    graph = calibrationResult.drawResiduals()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_013.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


We see that the histogram of the residual is centered on zero. This is a proof that the calibration did perform correctly.

The `getParameterPosterior` method returns the posterior gaussian distribution of :math:`\theta`.


.. code-block:: default

    distributionPosterior = calibrationResult.getParameterPosterior()


We can compute a 95% confidence interval of the parameter :math:`\theta^\star`.


.. code-block:: default

    distributionPosterior.computeBilateralConfidenceIntervalWithMarginalProbability(0.95)[0]


.. raw:: html

    <p>[30.2569, 30.8322]<br>
    [47.5907, 47.6644]<br>
    [52.3356, 52.4093]</p>
    <br />
    <br />

We see that there is a small uncertainty on the value of all parameters.

We can compare the prior and posterior distributions of the marginals of :math:`\theta`. 


.. code-block:: default

    graph = calibrationResult.drawParameterDistributions()
    view = viewer.View(graph)


.. image:: /auto_calibration/least_squares_and_gaussian_calibration/images/sphx_glr_plot_calibration_flooding_014.png
    :alt: plot calibration flooding
    :class: sphx-glr-single-img


The two distributions are very different, with a spiky posterior distribution. This shows that the calibration is very sensible to the observations. 

Tuning the posterior distribution estimation
--------------------------------------------

The "GaussianNonLinearCalibration-BootstrapSize" key controls the posterior distribution estimation.

* If "GaussianNonLinearCalibration-BootstrapSize" > 0 (by default it is equal to 100), then a bootstrap resample algorithm is used to see the dispersion of the MAP estimator. This allows to see the variability of the estimator with respect to the finite observation sample.
* If "GaussianNonLinearCalibration-BootstrapSize" is zero, then the gaussian linear calibration estimator is used (i.e. the `GaussianLinearCalibration` class) at the optimum. This is called the Laplace approximation. 

We must configure the key before creating the object (otherwise changing the parameter does not change the result). 


.. code-block:: default

    ot.ResourceMap_SetAsUnsignedInteger("GaussianNonLinearCalibration-BootstrapSize",0) 


.. code-block:: default

    algo = ot.GaussianNonLinearCalibration(mycf, Qobs, Hobs, thetaPrior, sigma, errorCovariance)


.. code-block:: default

    algo.run()


.. code-block:: default

    calibrationResult = algo.getResult()


.. code-block:: default

    graph = calibrationResult.drawParameterDistributions()

    plt.show()


As we can see, this does not change much the posterior distribution, which remains spiky. 


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  5.580 seconds)


.. _sphx_glr_download_auto_calibration_least_squares_and_gaussian_calibration_plot_calibration_flooding.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_calibration_flooding.py <plot_calibration_flooding.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_calibration_flooding.ipynb <plot_calibration_flooding.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_