.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_meta_modeling/general_purpose_metamodels/plot_linear_model.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py:


Create a linear model
=====================

In this example we create a surrogate model using a linear model approximation
with the :class:`~openturns.LinearModelAlgorithm` class.
We show how the :class:`~openturns.LinearModelAnalysis` class
can be used to produce the statistical analysis of the least squares
regression model.

.. GENERATED FROM PYTHON SOURCE LINES 11-42

Introduction
~~~~~~~~~~~~

The following 2-dimensional function is used in this example:

.. math::

    \model(x,y) = 3 + 2x - y

for any :math:`x, y \in \Rset`.

Notice that this model is linear:

.. math::

    \model(x,y) = \beta_1 + \beta_2 x + \beta_3 y

where :math:`\beta_1 = 3`, :math:`\beta_2 = 2` and :math:`\beta_3 = -1`.

We consider noisy observations of the output:

.. math::

    Y = \model(x,y) + \epsilon

where :math:`\epsilon \sim \cN(0, \sigma^2)` and :math:`\sigma > 0` is the
standard deviation.
In our example, we use :math:`\sigma = 0.1`.

Finally, we use :math:`n = 1000` independent observations of the model.

.. GENERATED FROM PYTHON SOURCE LINES 44-47

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer

.. GENERATED FROM PYTHON SOURCE LINES 48-52

Simulate the data set
~~~~~~~~~~~~~~~~~~~~~

We first generate the data and we add noise to the output observations:

.. GENERATED FROM PYTHON SOURCE LINES 54-63

.. code-block:: Python

    ot.RandomGenerator.SetSeed(0)
    distribution = ot.Normal(2)
    distribution.setDescription(["x", "y"])
    sampleSize = 1000
    func = ot.SymbolicFunction(["x", "y"], ["3 + 2 * x - y"])
    input_sample = distribution.getSample(sampleSize)
    epsilon = ot.Normal(0, 0.1).getSample(sampleSize)
    output_sample = func(input_sample) + epsilon

.. GENERATED FROM PYTHON SOURCE LINES 64-70

Linear regression
~~~~~~~~~~~~~~~~~

Let us run the linear model algorithm using the
:class:`~openturns.LinearModelAlgorithm` class and get the associated
results:

.. GENERATED FROM PYTHON SOURCE LINES 72-75

.. code-block:: Python

    algo = ot.LinearModelAlgorithm(input_sample, output_sample)
    result = algo.getResult()
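The estimated coefficients can be compared directly with the true values
:math:`(\beta_1, \beta_2, \beta_3) = (3, 2, -1)` used to generate the data.
Below is a quick check added for illustration, using the coefficients
accessor of :class:`~openturns.LinearModelResult`:

.. code-block:: Python

    # The coefficients follow the order of the basis: intercept, x, y.
    # They should be close to the true values [3, 2, -1].
    print(result.getCoefficients())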
.. GENERATED FROM PYTHON SOURCE LINES 76-81

Residuals analysis
~~~~~~~~~~~~~~~~~~

We can now analyse the residuals of the regression on the training data.
For clarity, only the first 5 residual values are printed.

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python

    residuals = result.getSampleResiduals()
    residuals[:5]

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

            y0
    0   -0.005331569
    1   -0.06377852
    2   -0.06898465
    3    0.03092278
    4   -0.1095064
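As a cross-check (a sketch added for illustration, not part of the original
example), the residuals can be recomputed by hand from the fitted metamodel,
since a residual is simply the observed output minus the fitted output:

.. code-block:: Python

    # Recompute the residuals manually: epsilon_i = y_i - yhat_i.
    metamodel = result.getMetaModel()
    manual_residuals = output_sample - metamodel(input_sample)
    print(manual_residuals[:5])  # should match result.getSampleResiduals()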
.. GENERATED FROM PYTHON SOURCE LINES 87-88

Alternatively, the `standardized` or `studentized` residuals can be used:

.. GENERATED FROM PYTHON SOURCE LINES 90-93

.. code-block:: Python

    stdresiduals = result.getStandardizedResiduals()
    stdresiduals[:5]
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

            v0
    0   -0.05415404
    1   -0.6476807
    2   -0.7017064
    3    0.314112
    4   -1.111888
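The standardized residuals rescale each raw residual by its estimated
standard deviation, which involves the leverage :math:`h_{ii}` of the
corresponding point. The following sketch assumes the usual definition
:math:`r_i = \epsilon_i / (\hat{\sigma} \sqrt{1 - h_{ii}})`, where
:math:`\hat{\sigma}` is the estimated noise standard deviation:

.. code-block:: Python

    import math

    # Leverages are the diagonal entries of the hat matrix.
    leverages = result.getLeverages()
    sigma_hat = result.getNoiseDistribution().getStandardDeviation()[0]
    r0 = residuals[0, 0] / (sigma_hat * math.sqrt(1.0 - leverages[0]))
    print(r0)  # should match stdresiduals[0, 0]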
.. GENERATED FROM PYTHON SOURCE LINES 94-96

Similarly, we can also obtain the underlying distribution characterizing
the residuals:

.. GENERATED FROM PYTHON SOURCE LINES 98-101

.. code-block:: Python

    result.getNoiseDistribution()
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Normal
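Since the data were simulated with a noise standard deviation of
:math:`\sigma = 0.1`, the estimated standard deviation of this distribution
should be close to 0.1. A quick check added for illustration:

.. code-block:: Python

    # The true noise standard deviation used in the simulation is 0.1.
    print(result.getNoiseDistribution().getStandardDeviation())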
.. GENERATED FROM PYTHON SOURCE LINES 102-107

Analysis of the results
~~~~~~~~~~~~~~~~~~~~~~~

In order to post-process the linear regression results, the
:class:`~openturns.LinearModelAnalysis` class can be used:

.. GENERATED FROM PYTHON SOURCE LINES 109-112

.. code-block:: Python

    analysis = ot.LinearModelAnalysis(result)
    analysis
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Basis
    Basis( [[x,y]->[1],[x,y]->[x],[x,y]->[y]] )

    Coefficients
    Index  Function     Estimate      Std Error    t value      Pr(>|t|)
    0      [x,y]->[1]    3.0047e+00   3.1192e-03    9.6327e+02  0.0
    1      [x,y]->[x]    1.9999e+00   3.1646e-03    6.3197e+02  0.0
    2      [x,y]->[y]   -9.9783e-01   3.1193e-03   -3.1989e+02  0.0

    Normality test of the residuals
    Normality test        p-value
    Anderson-Darling      0.0865
    ChiSquared            0.0831
    Kolmogorov-Smirnov    0.3898
    Cramer-Von Mises      0.0499
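The statistics printed in this summary can also be retrieved
programmatically before interpreting them. The following sketch assumes
the standard accessors of :class:`~openturns.LinearModelAnalysis`:

.. code-block:: Python

    # p-values of the Student tests on the individual coefficients.
    print(analysis.getCoefficientsPValues())
    # p-value of the Fisher test (are all coefficients simultaneously zero?).
    print(analysis.getFisherPValue())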
.. GENERATED FROM PYTHON SOURCE LINES 113-135

The results seem to indicate that the linear model is satisfactory.

- The basis uses the three functions :math:`1` (i.e. the intercept),
  :math:`x` and :math:`y`.
- Each row of the table of coefficients tests if one single coefficient is
  zero. The probability of observing a large value of the t-statistic is
  close to zero for all coefficients. Hence, we can reject the hypothesis
  that any coefficient is zero. In other words, all the coefficients are
  significantly nonzero.
- The R2 score is close to 1. Furthermore, the adjusted R2 value, which
  takes into account the size of the data set and the number of
  coefficients, is similar to the unadjusted R2 score. Most of the variance
  is explained by the linear model.
- The F-test tests if all coefficients are simultaneously zero. The
  `Fisher-Snedecor` p-value is lower than 1%. Hence, there is at least one
  nonzero coefficient.
- The normality test checks that the residuals have a normal distribution.
  The normality assumption can be rejected if the p-value is close to zero.
  Three of the four p-values are larger than 0.05 and the Cramer-Von Mises
  p-value (0.0499) is borderline: the normality assumption of the residuals
  is not rejected.

.. GENERATED FROM PYTHON SOURCE LINES 137-141

Graphical analyses
~~~~~~~~~~~~~~~~~~

Let us compare model and fitted values:

.. GENERATED FROM PYTHON SOURCE LINES 143-146

.. code-block:: Python

    graph = analysis.drawModelVsFitted()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_001.png
   :alt: Model vs Fitted
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 147-149

The previous figure seems to indicate that the linearity hypothesis
is appropriate.

.. GENERATED FROM PYTHON SOURCE LINES 151-152

Residuals can be plotted against the fitted values.

.. GENERATED FROM PYTHON SOURCE LINES 154-157

.. code-block:: Python

    graph = analysis.drawResidualsVsFitted()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_002.png
   :alt: Residuals vs Fitted
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 158-161

.. code-block:: Python

    graph = analysis.drawScaleLocation()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_003.png
   :alt: Scale-Location
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 162-165

.. code-block:: Python

    graph = analysis.drawQQplot()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_004.png
   :alt: Normal Q-Q
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 166-171

In this case, the two distributions are very close: there is no obvious
outlier.

Cook's distance measures the impact of every individual data point on the
linear regression, and can be plotted as follows:

.. GENERATED FROM PYTHON SOURCE LINES 173-176

.. code-block:: Python

    graph = analysis.drawCookDistance()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_005.png
   :alt: Cook's distance
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 177-183

This graph shows us the index of the points with disproportionate influence.

One of the components of the computation of Cook's distance at a given
point is that point's *leverage*. High-leverage points are far from their
closest neighbors, so the fitted linear regression model must pass close
to them.
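A sanity check on the leverages, added for illustration: the diagonal
entries :math:`h_{ii}` of the hat matrix sum to the number of estimated
coefficients, which is 3 in this example.

.. code-block:: Python

    # The trace of the hat matrix equals the number of coefficients.
    leverages = result.getLeverages()
    print(sum(leverages))  # should be close to 3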
.. GENERATED FROM PYTHON SOURCE LINES 183-188

.. code-block:: Python

    # sphinx_gallery_thumbnail_number = 6
    graph = analysis.drawResidualsVsLeverages()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_006.png
   :alt: Residuals vs Leverage
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_006.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 189-194

In this case, there seems to be no obvious influential outlier characterized
by large leverage and residual values.

Similarly, we can also plot Cook's distances as a function of the sample
leverages:

.. GENERATED FROM PYTHON SOURCE LINES 196-199

.. code-block:: Python

    graph = analysis.drawCookVsLeverages()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_007.png
   :alt: Cook's dist vs Leverage h[ii]/(1-h[ii])
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_007.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 200-202

Finally, we give the 95% confidence interval of each estimated coefficient:

.. GENERATED FROM PYTHON SOURCE LINES 204-208

.. code-block:: Python

    alpha = 0.95
    interval = analysis.getCoefficientsConfidenceInterval(alpha)
    print("confidence intervals with level=%1.2f: " % (alpha))
    print("%s" % (interval))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    confidence intervals with level=0.95:
    [2.99855, 3.01079]
    [1.99374, 2.00616]
    [-1.00395, -0.991707]

.. _sphx_glr_download_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_linear_model.ipynb <plot_linear_model.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_linear_model.py <plot_linear_model.py>`