.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_meta_modeling/general_purpose_metamodels/plot_linear_model.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py:


Create a linear model
=====================

In this example we create a surrogate model using a linear model approximation
with the :class:`~openturns.LinearModelAlgorithm` class.
We show how the :class:`~openturns.LinearModelAnalysis` class
can be used to produce the statistical analysis of the least squares
regression model.

.. GENERATED FROM PYTHON SOURCE LINES 12-43

Introduction
~~~~~~~~~~~~

The following 2-d function is used in this example:

.. math::

    \model(x, y) = 3 + 2x - y

for any :math:`x, y \in \Rset`.

Notice that this model is linear:

.. math::

    \model(x, y) = \beta_1 + \beta_2 x + \beta_3 y

where :math:`\beta_1 = 3`, :math:`\beta_2 = 2` and :math:`\beta_3 = -1`.

We consider noisy observations of the output:

.. math::

    Y = \model(x, y) + \epsilon

where :math:`\epsilon \sim \cN(0, \sigma^2)` and :math:`\sigma > 0`
is the standard deviation of the noise.
In our example, we use :math:`\sigma = 0.1`.

Finally, we use :math:`n = 1000` independent observations of the model.

.. GENERATED FROM PYTHON SOURCE LINES 45-48

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer

.. GENERATED FROM PYTHON SOURCE LINES 49-53

Simulate the data set
~~~~~~~~~~~~~~~~~~~~~

We first generate the data and we add noise to the output observations:

.. GENERATED FROM PYTHON SOURCE LINES 55-64

.. code-block:: Python

    ot.RandomGenerator.SetSeed(0)
    distribution = ot.Normal(2)
    distribution.setDescription(["x", "y"])
    sampleSize = 1000
    func = ot.SymbolicFunction(["x", "y"], ["3 + 2 * x - y"])
    input_sample = distribution.getSample(sampleSize)
    epsilon = ot.Normal(0, 0.1).getSample(sampleSize)
    output_sample = func(input_sample) + epsilon

.. GENERATED FROM PYTHON SOURCE LINES 65-70

Linear regression
~~~~~~~~~~~~~~~~~

Let us run the linear model algorithm using the
:class:`~openturns.LinearModelAlgorithm` class and get the associated
results:

.. GENERATED FROM PYTHON SOURCE LINES 72-75

.. code-block:: Python

    algo = ot.LinearModelAlgorithm(input_sample, output_sample)
    result = algo.getResult()
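Since the data were simulated from a model whose true coefficients
:math:`(\beta_1, \beta_2, \beta_3) = (3, 2, -1)` are known, a quick sanity
check is to compare them with the estimates. The following is a minimal
sketch (not part of the generated example), assuming
:class:`~openturns.LinearModelResult` exposes `getCoefficients()`:

.. code-block:: Python

    # Sketch: compare the estimated coefficients with the true values.
    # getCoefficients() is assumed to return the estimates in the basis
    # order (intercept, x, y).
    beta_true = [3.0, 2.0, -1.0]
    beta_hat = result.getCoefficients()
    for i in range(len(beta_true)):
        print("beta_%d: true = %g, estimated = %g" % (i + 1, beta_true[i], beta_hat[i]))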
.. GENERATED FROM PYTHON SOURCE LINES 76-81

Residuals analysis
~~~~~~~~~~~~~~~~~~

We can now analyze the residuals of the regression on the training data.
For clarity, only the first 5 residual values are printed.

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python

    residuals = result.getSampleResiduals()
    residuals[:5]

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

         y0
    0    -0.005331569
    1    -0.06377852
    2    -0.06898465
    3     0.03092278
    4    -0.1095064
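Beyond the first few values, the residuals can be summarized numerically:
for a good fit they should be centered close to zero, with a standard
deviation close to the true noise level :math:`\sigma = 0.1`. A minimal
sketch using :class:`~openturns.Sample` accessors (the return type of
`computeStandardDeviation` may vary across OpenTURNS versions):

.. code-block:: Python

    # Sketch: summary statistics of the residuals. A well-behaved fit
    # gives a mean near 0 and a standard deviation near sigma = 0.1.
    print("mean of residuals:", residuals.computeMean()[0])
    print("std of residuals :", residuals.computeStandardDeviation()[0])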
.. GENERATED FROM PYTHON SOURCE LINES 87-88

Alternatively, the `standardized` or `studentized` residuals can be used:

.. GENERATED FROM PYTHON SOURCE LINES 90-93

.. code-block:: Python

    stdresiduals = result.getStandardizedResiduals()
    stdresiduals[:5]
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

         v0
    0    -0.05415404
    1    -0.6476807
    2    -0.7017064
    3     0.314112
    4    -1.111888
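A common rule of thumb flags observations whose standardized residual
exceeds 2 in absolute value as candidate outliers. A minimal sketch in
plain Python (the threshold of 2 is a convention, not an OpenTURNS output):

.. code-block:: Python

    # Sketch: flag observations with |standardized residual| > 2.
    # With Gaussian noise, roughly 5% of the points are expected beyond
    # this threshold, so isolated flags are not alarming by themselves.
    flagged = [i for i in range(len(stdresiduals)) if abs(stdresiduals[i, 0]) > 2.0]
    print("number of flagged points:", len(flagged))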
.. GENERATED FROM PYTHON SOURCE LINES 94-96

Similarly, we can also obtain the underlying distribution characterizing
the residuals:

.. GENERATED FROM PYTHON SOURCE LINES 98-101

.. code-block:: Python

    result.getNoiseDistribution()
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Normal
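The fitted noise distribution can be compared with the true noise model
:math:`\cN(0, 0.1^2)` used to simulate the observations. A minimal sketch
using standard :class:`~openturns.Distribution` accessors:

.. code-block:: Python

    # Sketch: compare the estimated noise standard deviation with the
    # true value sigma = 0.1 used to generate the data.
    noise = result.getNoiseDistribution()
    print("estimated sigma:", noise.getStandardDeviation()[0])
    print("true sigma     :", 0.1)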
.. GENERATED FROM PYTHON SOURCE LINES 102-107

Analysis of the results
~~~~~~~~~~~~~~~~~~~~~~~

In order to post-process the linear regression results, the
:class:`~openturns.LinearModelAnalysis` class can be used:

.. GENERATED FROM PYTHON SOURCE LINES 109-112

.. code-block:: Python

    analysis = ot.LinearModelAnalysis(result)
    analysis
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Basis
    Basis( [[x,y]->[1],[x,y]->[x],[x,y]->[y]] )

    Coefficients
    Index  Function      Estimate      Std Error    t value       Pr(>|t|)
    0      [x,y]->[1]    3.0047e+00    3.1192e-03   9.6327e+02    0.0
    1      [x,y]->[x]    1.9999e+00    3.1646e-03   6.3197e+02    0.0
    2      [x,y]->[y]    -9.9783e-01   3.1193e-03   -3.1989e+02   0.0

    Normality test of the residuals
    Normality test        p-value
    Anderson-Darling      0.0865
    ChiSquared            0.0831
    Kolmogorov-Smirnov    0.3898
    Cramer-Von Mises      0.0499
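The discussion below also refers to the :math:`R^2` score reported by the
analysis. As a cross-check, :math:`R^2 = 1 - SS_{res} / SS_{tot}` can be
recomputed by hand; a minimal sketch using only quantities already
available in this example:

.. code-block:: Python

    # Sketch: recompute the R2 score, R2 = 1 - SS_res / SS_tot.
    ss_res = sum(r[0] ** 2 for r in result.getSampleResiduals())
    y_mean = output_sample.computeMean()[0]
    ss_tot = sum((y[0] - y_mean) ** 2 for y in output_sample)
    print("R2 =", 1.0 - ss_res / ss_tot)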
.. GENERATED FROM PYTHON SOURCE LINES 113-135

The results seem to indicate that the linear model is satisfactory.

- The basis uses the three functions :math:`1` (i.e. the intercept),
  :math:`x` and :math:`y`.
- Each row of the table of coefficients tests whether a single coefficient
  is zero. The probability of observing a large value of the t-statistic is
  close to zero for all coefficients. Hence, we can reject the hypothesis
  that any coefficient is zero. In other words, all the coefficients are
  significantly nonzero.
- The :math:`R^2` score is close to 1. Furthermore, the adjusted
  :math:`R^2` value, which takes into account the size of the data set and
  the number of coefficients, is similar to the unadjusted :math:`R^2`
  score. Most of the variance is explained by the linear model.
- The F-test tests whether all coefficients are simultaneously zero. The
  `Fisher-Snedecor` p-value is lower than 1%. Hence, there is at least one
  nonzero coefficient.
- The normality tests check whether the residuals have a normal
  distribution. The normality assumption can be rejected if the p-value is
  close to zero. Here the p-values are larger than 0.05 (the Cramer-von
  Mises p-value is borderline at 0.0499): the normality assumption of the
  residuals is not rejected.

.. GENERATED FROM PYTHON SOURCE LINES 137-141

Graphical analyses
~~~~~~~~~~~~~~~~~~

Let us compare model and fitted values:

.. GENERATED FROM PYTHON SOURCE LINES 143-146

.. code-block:: Python

    graph = analysis.drawModelVsFitted()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_001.png
   :alt: Model vs Fitted
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 147-149

The previous figure seems to indicate that the linearity hypothesis
is appropriate.

.. GENERATED FROM PYTHON SOURCE LINES 151-152

Residuals can be plotted against the fitted values.

.. GENERATED FROM PYTHON SOURCE LINES 154-157

.. code-block:: Python

    graph = analysis.drawResidualsVsFitted()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_002.png
   :alt: Residuals vs Fitted
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 158-161

.. code-block:: Python

    graph = analysis.drawScaleLocation()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_003.png
   :alt: Scale-Location
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 162-165

.. code-block:: Python

    graph = analysis.drawQQplot()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_004.png
   :alt: Normal Q-Q
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 166-171

In this case, the two distributions are very close: there is no obvious
outlier.

Cook's distance measures the impact of every individual data point on the
linear regression, and can be plotted as follows:

.. GENERATED FROM PYTHON SOURCE LINES 173-176

.. code-block:: Python

    graph = analysis.drawCookDistance()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_005.png
   :alt: Cook's distance
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_005.png
   :class: sphx-glr-single-img
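The distances shown in this plot can also be inspected numerically, for
instance to list the most influential observations. A minimal sketch,
assuming the result class exposes `getCooksDistances()` (check the API of
your OpenTURNS version):

.. code-block:: Python

    # Sketch: indices of the largest Cook's distances.
    # getCooksDistances() is assumed to return one value per observation.
    cook = result.getCooksDistances()
    ranked = sorted(range(len(cook)), key=lambda i: -cook[i])
    print("5 most influential points:", ranked[:5])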
.. GENERATED FROM PYTHON SOURCE LINES 177-183

The Cook's distance plot shows the indices of the points with
disproportionate influence.

One of the components of the computation of Cook's distance at a given
point is that point's *leverage*. High-leverage points are far from their
closest neighbors, so the fitted linear regression model must pass close
to them.

.. GENERATED FROM PYTHON SOURCE LINES 183-188

.. code-block:: Python

    # sphinx_gallery_thumbnail_number = 6
    graph = analysis.drawResidualsVsLeverages()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_006.png
   :alt: Residuals vs Leverage
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_006.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 189-194

In this case, there seems to be no obvious influential outlier
characterized by large leverage and residual values.

Similarly, we can also plot Cook's distances as a function of the sample
leverages:

.. GENERATED FROM PYTHON SOURCE LINES 196-199

.. code-block:: Python

    graph = analysis.drawCookVsLeverages()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_007.png
   :alt: Cook's dist vs Leverage h[ii]/(1-h[ii])
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_007.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 200-202

Finally, we give the 95% confidence interval of each estimated coefficient:

.. GENERATED FROM PYTHON SOURCE LINES 204-208

.. code-block:: Python

    alpha = 0.95
    interval = analysis.getCoefficientsConfidenceInterval(alpha)
    print("confidence intervals with level=%1.2f: " % (alpha))
    print("%s" % (interval))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    confidence intervals with level=0.95:
    [2.99855, 3.01079]
    [1.99374, 2.00616]
    [-1.00395, -0.991707]


.. _sphx_glr_download_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_linear_model.ipynb <plot_linear_model.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_linear_model.py <plot_linear_model.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_linear_model.zip <plot_linear_model.zip>`