.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_meta_modeling/general_purpose_metamodels/plot_linear_model.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_meta_modeling_general_purpose_metamodels_plot_linear_model.py:


Create a linear model
=====================

In this example we create a surrogate model using a linear model approximation.

.. GENERATED FROM PYTHON SOURCE LINES 7-10

The following 2-dimensional function is used in this example:
:math:`h(x, y) = 2x - y + 3 + 0.05 \sin(0.8x)`.

.. GENERATED FROM PYTHON SOURCE LINES 12-15

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as viewer

.. GENERATED FROM PYTHON SOURCE LINES 16-20

Generation of the data set
--------------------------

We first generate the data and add noise to the output observations:

.. GENERATED FROM PYTHON SOURCE LINES 22-30

.. code-block:: Python

    ot.RandomGenerator.SetSeed(0)
    distribution = ot.Normal(2)
    distribution.setDescription(["x", "y"])
    func = ot.SymbolicFunction(["x", "y"], ["2 * x - y + 3 + 0.05 * sin(0.8*x)"])
    input_sample = distribution.getSample(30)
    epsilon = ot.Normal(0, 0.1).getSample(30)
    output_sample = func(input_sample) + epsilon

.. GENERATED FROM PYTHON SOURCE LINES 31-35

Linear regression
-----------------

Let us run the linear model algorithm using the `LinearModelAlgorithm` class
and get the associated results:

.. GENERATED FROM PYTHON SOURCE LINES 37-40

.. code-block:: Python

    algo = ot.LinearModelAlgorithm(input_sample, output_sample)
    result = algo.getResult()
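Since the goal of this example is a surrogate model, note that the fitted
metamodel itself can be retrieved from the result. The following is a minimal
sketch (not part of the generated script; the test point is arbitrary) that
evaluates the surrogate and the true function at the same point:

.. code-block:: Python

    # Retrieve the fitted surrogate and compare it with the true function
    # at an arbitrary test point.
    metamodel = result.getMetaModel()
    test_point = [1.0, 2.0]
    print("surrogate:", metamodel(test_point))
    print("true     :", func(test_point))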
.. GENERATED FROM PYTHON SOURCE LINES 41-46

Residuals analysis
------------------

We can now analyse the residuals of the regression on the training data.
For clarity, only the first 5 residual values are printed.

.. GENERATED FROM PYTHON SOURCE LINES 48-51

.. code-block:: Python

    residuals = result.getSampleResiduals()
    print(residuals[:5])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        [ y0         ]
    0 : [  0.186748  ]
    1 : [ -0.117266  ]
    2 : [ -0.039708  ]
    3 : [  0.10813   ]
    4 : [ -0.0673202 ]

.. GENERATED FROM PYTHON SOURCE LINES 52-53

Alternatively, the `standardized` or `studentized` residuals can be used:

.. GENERATED FROM PYTHON SOURCE LINES 55-58

.. code-block:: Python

    stdresiduals = result.getStandardizedResiduals()
    print(stdresiduals[:5])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        [ v0        ]
    0 : [  1.80775  ]
    1 : [ -1.10842  ]
    2 : [ -0.402104 ]
    3 : [  1.03274  ]
    4 : [ -0.633913 ]

.. GENERATED FROM PYTHON SOURCE LINES 59-60

Similarly, we can also obtain the underlying distribution characterizing the residuals:

.. GENERATED FROM PYTHON SOURCE LINES 62-65

.. code-block:: Python

    print(result.getNoiseDistribution())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Normal(mu = 0, sigma = 0.110481)

.. GENERATED FROM PYTHON SOURCE LINES 66-70

ANOVA table
-----------

In order to post-process the linear regression results, the `LinearModelAnalysis`
class can be used:

.. GENERATED FROM PYTHON SOURCE LINES 72-75

.. code-block:: Python

    analysis = ot.LinearModelAnalysis(result)
    print(analysis)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Basis( [[x,y]->[1],[x,y]->[x],[x,y]->[y]] )

    Coefficients:
               | Estimate  | Std Error | t value  | Pr(>|t|)    |
    --------------------------------------------------------------------
    [x,y]->[1] |  2.99847  | 0.0204173 |  146.859 | 9.82341e-41 |
    [x,y]->[x] |  2.02079  | 0.0210897 |  95.8186 | 9.76973e-36 |
    [x,y]->[y] | -0.994327 | 0.0215911 | -46.0527 | 3.35854e-27 |
    --------------------------------------------------------------------

    Residual standard error: 0.11048 on 27 degrees of freedom
    F-statistic: 5566.3 , p-value: 0
    ---------------------------------
    Multiple R-squared  | 0.997581 |
    Adjusted R-squared  | 0.997401 |
    ---------------------------------

    ---------------------------------
    Normality test      | p-value  |
    ---------------------------------
    Anderson-Darling    | 0.456553 |
    Cramer-Von Mises    | 0.367709 |
    Chi-Squared         | 0.669183 |
    Kolmogorov-Smirnov  | 0.578427 |
    ---------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 76-80

The results seem to indicate that the linear hypothesis can be accepted. Indeed, the
`R-Squared` value is nearly `1`. Furthermore, the adjusted value, which takes into
account the size of the data set and the number of parameters in the model, is
similar to `R-Squared`. We can also notice that the `Fisher-Snedecor` and `Student`
p-values detailed above are lower than 1%. This further supports the quality of the
linear model.
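As a cross-check (this block is not part of the generated script), the
`Multiple R-squared` value reported above can be recomputed by hand from the
training residuals, using only basic `Sample` operations:

.. code-block:: Python

    # R-squared = 1 - SS_res / SS_tot, computed from the training residuals.
    residuals = result.getSampleResiduals()
    ss_res = sum(r[0] ** 2 for r in residuals)
    mean_output = output_sample.computeMean()[0]
    ss_tot = sum((y[0] - mean_output) ** 2 for y in output_sample)
    print("R-squared:", 1.0 - ss_res / ss_tot)  # close to 0.997581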
.. GENERATED FROM PYTHON SOURCE LINES 82-86

Graphical analyses
------------------

Let us compare the model and the fitted values:

.. GENERATED FROM PYTHON SOURCE LINES 88-91

.. code-block:: Python

    graph = analysis.drawModelVsFitted()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_001.png
   :alt: Model vs Fitted
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 92-93

The previous figure seems to indicate that the linearity hypothesis is reasonable.

.. GENERATED FROM PYTHON SOURCE LINES 95-96

Residuals can be plotted against the fitted values.

.. GENERATED FROM PYTHON SOURCE LINES 98-101

.. code-block:: Python

    graph = analysis.drawResidualsVsFitted()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_002.png
   :alt: Residuals vs Fitted
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 102-105

.. code-block:: Python

    graph = analysis.drawScaleLocation()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_003.png
   :alt: Scale-Location
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 106-109

.. code-block:: Python

    graph = analysis.drawQQplot()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_004.png
   :alt: Normal Q-Q
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 110-113

In this case, the two distributions are very close: there is no obvious outlier.

Cook's distance measures the impact of every individual data point on the linear
regression, and can be plotted as follows:

.. GENERATED FROM PYTHON SOURCE LINES 115-118

.. code-block:: Python

    graph = analysis.drawCookDistance()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_005.png
   :alt: Cook's distance
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 119-123

This graph shows us the index of the points with disproportionate influence.
One of the components of the computation of Cook's distance at a given point is
that point's *leverage*. High-leverage points lie far from the bulk of the input
sample, so the fitted linear regression model must pass close to them.

.. GENERATED FROM PYTHON SOURCE LINES 125-128

.. code-block:: Python

    graph = analysis.drawResidualsVsLeverages()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_006.png
   :alt: Residuals vs Leverage
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_006.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 129-132

In this case, there seems to be no obvious influential outlier characterized by
both large leverage and large residual values.

Similarly, we can plot Cook's distances as a function of the sample leverages:

.. GENERATED FROM PYTHON SOURCE LINES 134-137

.. code-block:: Python

    graph = analysis.drawCookVsLeverages()
    view = viewer.View(graph)

.. image-sg:: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_007.png
   :alt: Cook's dist vs Leverage h[ii]/(1-h[ii])
   :srcset: /auto_meta_modeling/general_purpose_metamodels/images/sphx_glr_plot_linear_model_007.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 138-139

Finally, we give the 95% confidence interval for each estimated coefficient:

.. GENERATED FROM PYTHON SOURCE LINES 141-145

.. code-block:: Python

    alpha = 0.95
    interval = analysis.getCoefficientsConfidenceInterval(alpha)
    print("confidence intervals with level=%1.2f : " % (alpha))
    print("%s" % (interval))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    confidence intervals with level=0.95 :
    [2.95657, 3.04036]
    [1.97751, 2.06406]
    [-1.03863, -0.950026]
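As a final check of the surrogate, its accuracy can be assessed on data that
were not used for training. The sketch below (not part of the generated script)
assumes that fresh observations can be drawn from the same input distribution;
the test sample size of 100 is arbitrary:

.. code-block:: Python

    # Out-of-sample check: compare surrogate predictions with the noise-free
    # function on an independent test sample and compute a validation R-squared.
    metamodel = result.getMetaModel()
    test_input = distribution.getSample(100)
    test_output = func(test_input)
    predictions = metamodel(test_input)
    ss_res = sum((test_output[i, 0] - predictions[i, 0]) ** 2 for i in range(100))
    test_mean = test_output.computeMean()[0]
    ss_tot = sum((test_output[i, 0] - test_mean) ** 2 for i in range(100))
    print("validation R-squared:", 1.0 - ss_res / ss_tot)

A value close to 1 confirms that the linear surrogate generalizes well beyond
the training sample.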