.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_meta_modeling/polynomial_chaos_metamodel/plot_chaos_beam_sensitivity_degree.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_meta_modeling_polynomial_chaos_metamodel_plot_chaos_beam_sensitivity_degree.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_meta_modeling_polynomial_chaos_metamodel_plot_chaos_beam_sensitivity_degree.py:


Polynomial chaos is sensitive to the degree
===========================================

.. GENERATED FROM PYTHON SOURCE LINES 7-15

Introduction
------------

In this example, we observe the sensitivity of the polynomial chaos expansion to the total degree of the polynomial.
More precisely, we observe how this impacts the :math:`R^2` predictivity coefficient.

We consider the example of the cantilever beam. We create a sparse polynomial chaos with a linear enumeration rule
and the family of orthogonal polynomials corresponding to each input variable.

.. GENERATED FROM PYTHON SOURCE LINES 18-23

.. code-block:: Python

    import openturns as ot
    import numpy as np
    import openturns.viewer
    import matplotlib.pyplot as plt


.. GENERATED FROM PYTHON SOURCE LINES 24-25

The following parameter value leads to fast simulations.

.. GENERATED FROM PYTHON SOURCE LINES 27-29

.. code-block:: Python

    maxDegree = 4


.. GENERATED FROM PYTHON SOURCE LINES 30-31

For real tests, we suggest using the following parameter value:

.. GENERATED FROM PYTHON SOURCE LINES 33-34

`maxDegree` = 7

.. GENERATED FROM PYTHON SOURCE LINES 36-37

Let us define the parameters of the cantilever beam problem.

.. GENERATED FROM PYTHON SOURCE LINES 39-65

.. code-block:: Python

    dist_E = ot.Beta(0.9, 2.2, 2.8e7, 4.8e7)
    dist_E.setDescription(["E"])
    F_para = ot.LogNormalMuSigma(3.0e4, 9.0e3, 15.0e3)  # in N
    dist_F = ot.ParametrizedDistribution(F_para)
    dist_F.setDescription(["F"])
    dist_L = ot.Uniform(250.0, 260.0)  # in cm
    dist_L.setDescription(["L"])
    dist_I = ot.Beta(2.5, 1.5, 310.0, 450.0)  # in cm^4
    dist_I.setDescription(["I"])

    myDistribution = ot.JointDistribution([dist_E, dist_F, dist_L, dist_I])

    dim_input = 4  # dimension of the input
    dim_output = 1  # dimension of the output


    def function_beam(X):
        E, F, L, II = X
        Y = F * L**3 / (3 * E * II)
        return [Y]


    g = ot.PythonFunction(dim_input, dim_output, function_beam)
    g.setInputDescription(myDistribution.getDescription())


.. GENERATED FROM PYTHON SOURCE LINES 66-67

The following function creates a sparse polynomial chaos with a given total degree.

.. GENERATED FROM PYTHON SOURCE LINES 70-113

.. code-block:: Python

    def ComputeSparseLeastSquaresChaos(
        inputTrain, outputTrain, multivariateBasis, totalDegree, myDistribution
    ):
        """
        Create a sparse polynomial chaos based on least squares.

        * Uses the enumerate rule in multivariateBasis.
        * Uses the LeastSquaresStrategy to compute the coefficients based on
          least squares.
        * Uses LeastSquaresMetaModelSelectionFactory to use the LARS selection method.
        * Uses FixedStrategy in order to keep all the coefficients that the
          LARS method selected.

        Parameters
        ----------
        inputTrain : ot.Sample
            The input design of experiments.
        outputTrain : ot.Sample
            The output design of experiments.
        multivariateBasis : ot.Basis
            The multivariate chaos basis.
        totalDegree : int
            The total degree of the chaos polynomial.
        myDistribution : ot.Distribution.
            The distribution of the input variable.
        Returns
        -------
        result : ot.PolynomialChaosResult
            The estimated polynomial chaos.
        """
        selectionAlgorithm = ot.LeastSquaresMetaModelSelectionFactory()
        projectionStrategy = ot.LeastSquaresStrategy(selectionAlgorithm)
        enumfunc = multivariateBasis.getEnumerateFunction()
        basisSize = enumfunc.getBasisSizeFromTotalDegree(totalDegree)
        adaptiveStrategy = ot.FixedStrategy(multivariateBasis, basisSize)
        chaosalgo = ot.FunctionalChaosAlgorithm(
            inputTrain, outputTrain, myDistribution, adaptiveStrategy, projectionStrategy
        )
        chaosalgo.run()
        result = chaosalgo.getResult()
        return result


.. GENERATED FROM PYTHON SOURCE LINES 114-118

The following function computes the sparsity rate of the polynomial chaos.
To do this, we compute the number of coefficients in the decomposition assuming a linear enumeration rule and a fixed truncation.
The sparsity rate is the complement of the ratio between the number of coefficients
selected from LARS and the total number of coefficients in the full polynomial basis.

.. GENERATED FROM PYTHON SOURCE LINES 121-134

.. code-block:: Python

    def computeSparsityRate(multivariateBasis, totalDegree, chaosResult):
        """Compute the sparsity rate, assuming a FixedStrategy."""
        # Get basisSize, the number of candidate coefficients
        enumfunc = multivariateBasis.getEnumerateFunction()
        basisSize = enumfunc.getStrataCumulatedCardinal(totalDegree)
        # Get number of coefficients in the selection
        indices = chaosResult.getIndices()
        nbcoeffs = indices.getSize()
        # Compute rate
        sparsityRate = 1.0 - nbcoeffs / basisSize
        return sparsityRate


.. GENERATED FROM PYTHON SOURCE LINES 135-136

The following functions compute and plot the :math:`R^2` predictivity coefficients within the validation plot.

.. GENERATED FROM PYTHON SOURCE LINES 139-149

.. code-block:: Python

    def computeR2Chaos(chaosResult, inputTest, outputTest):
        """Compute the R2 of a chaos."""
        metamodel = chaosResult.getMetaModel()
        metamodelPredictions = metamodel(inputTest)
        val = ot.MetaModelValidation(outputTest, metamodelPredictions)
        r2Score = val.computeR2Score()[0]
        r2Score = max(r2Score, 0.0)  # We are not lucky every day.
        return r2Score


.. GENERATED FROM PYTHON SOURCE LINES 150-168

.. code-block:: Python

    def printChaosStats(multivariateBasis, chaosResult, inputTest, outputTest, totalDegree):
        """Print statistics of a chaos."""
        sparsityRate = computeSparsityRate(multivariateBasis, totalDegree, chaosResult)
        r2Score = computeR2Chaos(chaosResult, inputTest, outputTest)
        metamodel = chaosResult.getMetaModel()
        metamodelPredictions = metamodel(inputTest)
        val = ot.MetaModelValidation(outputTest, metamodelPredictions)
        graph = val.drawValidation().getGraph(0, 0)
        legend1 = "D=%d, R2=%.2f%%" % (totalDegree, 100 * r2Score)
        graph.setLegends(["", legend1])
        graph.setLegendPosition("upper left")
        print(
            "Degree=%d, R2=%.2f%%, Sparsity=%.2f%%"
            % (totalDegree, 100 * r2Score, 100 * sparsityRate)
        )
        return graph


.. GENERATED FROM PYTHON SOURCE LINES 169-173

.. code-block:: Python

    multivariateBasis = ot.OrthogonalProductPolynomialFactory(
        [dist_E, dist_F, dist_L, dist_I]
    )


.. GENERATED FROM PYTHON SOURCE LINES 174-176

.. code-block:: Python

    N = 20  # size of the train design


.. GENERATED FROM PYTHON SOURCE LINES 177-179

.. code-block:: Python

    n_valid = 1000  # size of the test design


.. GENERATED FROM PYTHON SOURCE LINES 180-181

The seed is selected to get *interesting* results.

.. GENERATED FROM PYTHON SOURCE LINES 183-186

.. code-block:: Python

    magicSeed = 43  # 127 is funny too
    ot.RandomGenerator.SetSeed(magicSeed)


.. GENERATED FROM PYTHON SOURCE LINES 187-203

.. code-block:: Python

    inputTrain = myDistribution.getSample(N)
    outputTrain = g(inputTrain)
    inputTest = myDistribution.getSample(n_valid)
    outputTest = g(inputTest)
    fig = plt.figure(figsize=(25, 4))
    for totalDegree in range(1, maxDegree + 1):
        chaosResult = ComputeSparseLeastSquaresChaos(
            inputTrain, outputTrain, multivariateBasis, totalDegree, myDistribution
        )
        graph = printChaosStats(
            multivariateBasis, chaosResult, inputTest, outputTest, totalDegree
        )
        ax = fig.add_subplot(1, maxDegree, totalDegree)
        _ = ot.viewer.View(graph, figure=fig, axes=[ax])
        plt.suptitle("Metamodel validation")


.. image-sg:: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_001.svg
   :alt: Metamodel validation
   :srcset: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_001.svg
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Degree=1, R2=96.50%, Sparsity=20.00%
    Degree=2, R2=93.57%, Sparsity=53.33%
    Degree=3, R2=85.53%, Sparsity=80.00%
    Degree=4, R2=85.54%, Sparsity=88.57%


.. GENERATED FROM PYTHON SOURCE LINES 204-211

We see that when the degree of the polynomial increases, the :math:`R^2` coefficient decreases.
We also see that the sparsity rate increases: while the basis size grows rapidly with the degree, the algorithm selects a smaller fraction of this basis.
This shows that the algorithm performs its task of selecting relevant coefficients.
However, this selection does not seem to be sufficient to mitigate the large number of coefficients.

Of course, this example is designed to make a predictivity decrease gradually.
We are going to see that this situation is actually easy to reproduce.

.. GENERATED FROM PYTHON SOURCE LINES 213-215

Distribution of the predictivity coefficient
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 217-218

Let us repeat the following experiment to see the variability of the :math:`R^2` coefficient.

.. GENERATED FROM PYTHON SOURCE LINES 221-244

.. code-block:: Python

    def computeSampleR2(N, n_valid, numberAttempts, maxDegree):
        """For a given sample size N, for degree from 1 to maxDegree,
        repeat the following experiment numberAttempts times:
        create a sparse least squares chaos and compute the R2
        using n_valid points.
        """
        r2Sample = ot.Sample(numberAttempts, maxDegree)
        for totalDegree in range(1, maxDegree + 1):
            print("Degree = %d" % (totalDegree))
            for i in range(numberAttempts):
                inputTrain = myDistribution.getSample(N)
                outputTrain = g(inputTrain)
                inputTest = myDistribution.getSample(n_valid)
                outputTest = g(inputTest)
                chaosResult = ComputeSparseLeastSquaresChaos(
                    inputTrain, outputTrain, multivariateBasis, totalDegree, myDistribution
                )
                r2Sample[i, totalDegree - 1] = computeR2Chaos(
                    chaosResult, inputTest, outputTest
                )
        return r2Sample


.. GENERATED FROM PYTHON SOURCE LINES 245-246

The following function uses a boxplot to see the distribution of the :math:`R^2` coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 249-259

.. code-block:: Python

    def plotR2Boxplots(r2Sample, N):
        data = np.array(r2Sample)
        plt.figure()
        plt.boxplot(data)
        plt.title("N=%d" % (N))
        plt.xlabel("Degree")
        plt.ylabel("R2 (%)")
        return


.. GENERATED FROM PYTHON SOURCE LINES 260-261

Each experiment is repeated several times.

.. GENERATED FROM PYTHON SOURCE LINES 263-265

.. code-block:: Python

    numberAttempts = 50  # Number of repetitions


.. GENERATED FROM PYTHON SOURCE LINES 266-270

.. code-block:: Python

    N = 20  # size of the train design
    r2Sample = computeSampleR2(N, n_valid, numberAttempts, maxDegree)
    plotR2Boxplots(r2Sample, N)


.. image-sg:: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_002.svg
   :alt: N=20
   :srcset: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_002.svg
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Degree = 1
    Degree = 2
    Degree = 3
    Degree = 4


.. GENERATED FROM PYTHON SOURCE LINES 271-272

We see that when the size of the design of experiments is as small as 20, it is more appropriate to use a very low degree polynomial. Here 1 performs best and 4 is risky.

.. GENERATED FROM PYTHON SOURCE LINES 274-278

.. code-block:: Python

    N = 30  # size of the train design
    r2Sample = computeSampleR2(N, n_valid, numberAttempts, maxDegree)
    plotR2Boxplots(r2Sample, N)


.. image-sg:: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_003.svg
   :alt: N=30
   :srcset: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_003.svg
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Degree = 1
    Degree = 2
    Degree = 3
    Degree = 4


.. GENERATED FROM PYTHON SOURCE LINES 279-280

With a 30-point design set, a polynomial degree of 2 is usually advisable.

.. GENERATED FROM PYTHON SOURCE LINES 282-288

.. code-block:: Python

    N = 50  # size of the train design
    r2Sample = computeSampleR2(N, n_valid, numberAttempts, maxDegree)
    plotR2Boxplots(r2Sample, N)

    plt.show()


.. image-sg:: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_004.svg
   :alt: N=50
   :srcset: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_beam_sensitivity_degree_004.svg
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Degree = 1
    Degree = 2
    Degree = 3
    Degree = 4


.. GENERATED FROM PYTHON SOURCE LINES 289-290

When the sample size increases, the :math:`R^2` computation becomes less sensitive to the polynomial degree.

.. GENERATED FROM PYTHON SOURCE LINES 292-310

Conclusion
----------

We observe that on the cantilever beam example, to use a polynomial total
degree equal to 4, we need a sample size at least equal to 50 to get a
satisfactory and reproducible :math:`R^2` .
When the degree is equal to 4, if the sample size is small, then depending on the
particular sample, the predictivity coefficient can be very low (i.e., less than 0.5).
With a sample size as small as 20, a polynomial degree of 1 is safer.
However the limited sample size may have an impact on other statistics that
could be derived from a metamodel calculated on such a small training sample.

References
----------

* "Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expansions and Gaussian Processes", Loïc Le Gratiet, Stefano Marelli, Bruno Sudret,
  Handbook of Uncertainty Quantification, 2017, Springer International Publishing.


.. _sphx_glr_download_auto_meta_modeling_polynomial_chaos_metamodel_plot_chaos_beam_sensitivity_degree.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_chaos_beam_sensitivity_degree.ipynb <plot_chaos_beam_sensitivity_degree.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_chaos_beam_sensitivity_degree.py <plot_chaos_beam_sensitivity_degree.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_chaos_beam_sensitivity_degree.zip <plot_chaos_beam_sensitivity_degree.zip>`