.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here ` to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_auto_data_analysis_manage_data_and_samples_plot_linear_regression.py:
Build and validate a linear model
=================================
In this example we are going to build a linear regression model and validate it numerically and graphically.
The linear model links a scalar variable :math:`Y` to an n-dimensional one :math:`\underline{X} = (X_i)_{i \leq n}`, as follows:
.. math::
\tilde{Y} = a_0 + \sum_{i=1}^n a_i X_i + \varepsilon
where :math:`\varepsilon` is the residual, assumed to follow the Normal(0.0, 1.0) distribution.
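As a minimal sketch of this model when :math:`\underline{X}` is of dimension 1, the snippet below simulates data and recovers :math:`a_0` and :math:`a_1` with the closed-form ordinary least squares estimates. It uses only the Python standard library rather than OpenTURNS; the helper ``fit_simple_linear_model`` and the values ``true_a0`` and ``true_a1`` are hypothetical names chosen for illustration.

```python
import random


def fit_simple_linear_model(xs, ys):
    """Ordinary least squares for y = a0 + a1 * x (one-dimensional X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


rng = random.Random(0)
true_a0, true_a1 = 0.5, 3.0  # illustrative values, not taken from a real data set
# random.triangular takes (low, high, mode)
xs = [rng.triangular(1.0, 10.0, 5.0) for _ in range(1000)]
# Y = a0 + a1 * X + epsilon, with epsilon following Normal(0, 1)
ys = [true_a0 + true_a1 * x + rng.gauss(0.0, 1.0) for x in xs]

a0, a1 = fit_simple_linear_model(xs, ys)
print("a0 =", a0, "a1 =", a1)
```

The estimates should land close to the illustrative values, with the slope usually estimated more precisely than the intercept.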
The linear model may be validated graphically if :math:`\underline{X}` is of dimension 1, by drawing on the same graph the cloud :math:`(X_i, Y_i)` and the fitted regression line.
The linear model may also be validated numerically with several tests:
- LinearModelFisher: tests the nullity of the coefficients of the linear regression model, using the Fisher distribution,
- LinearModelResidualMean: tests, under the hypothesis of a Gaussian sample, whether the mean of the residuals is equal to zero. It is based on the Student test (equality of means for two Gaussian samples).
The hypothesis on the residuals (centered Gaussian distribution) may be validated:
- graphically if :math:`\underline{X}` is of dimension 1, by drawing the pairs of consecutive residuals (:math:`\varepsilon_i, \varepsilon_{i+1}`), where the residual :math:`\varepsilon_i` is evaluated on the samples :math:`(X, Y)`,
- numerically with the LinearModelResidualMean test described above.
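The pairs of consecutive residuals can be sketched without OpenTURNS as follows. This is only an illustration on simulated residuals: ``lagged_pairs`` and ``lag1_autocorrelation`` are hypothetical helpers, and the lag-1 autocorrelation is an extra numerical summary, not something computed by the example script. For independent residuals the cloud of pairs shows no structure and this coefficient is close to zero.

```python
import random


def lagged_pairs(residuals):
    """Return the pairs (e_i, e_{i+1}) of consecutive residuals."""
    return list(zip(residuals[:-1], residuals[1:]))


def lag1_autocorrelation(residuals):
    """Empirical lag-1 autocorrelation of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum(
        (residuals[i] - mean) * (residuals[i + 1] - mean) for i in range(n - 1)
    )
    den = sum((r - mean) ** 2 for r in residuals)
    return num / den


rng = random.Random(1)
# Simulated independent, centered Gaussian residuals (assumption for the sketch)
residuals = [rng.gauss(0.0, 1.0) for _ in range(1000)]

pairs = lagged_pairs(residuals)
rho = lag1_autocorrelation(residuals)
print("number of pairs =", len(pairs))
print("lag-1 autocorrelation =", rho)
```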
.. code-block:: default
from __future__ import print_function
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
Generate the X and Y samples
.. code-block:: default
N = 1000
Xsample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)
Ysample = Xsample * 3.0 + ot.Normal(0.5, 1.0).getSample(N)
Generate a particular scalar X sample
.. code-block:: default
particularXSample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)
Create the linear model from the X and Y samples
.. code-block:: default
result = ot.LinearModelAlgorithm(Xsample, Ysample).getResult()
# Get the coefficients ai
print("coefficients of the linear regression model = ", result.getCoefficients())
# Get the confidence intervals of the ai coefficients
print("confidence intervals of the coefficients = ", ot.LinearModelAnalysis(result).getCoefficientsConfidenceInterval(0.9))
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
coefficients of the linear regression model = [0.620986,2.98488]
confidence intervals of the coefficients = [0.464408, 0.777565]
[2.95727, 3.0125]
Validate the model with a visual test
.. code-block:: default
graph = ot.VisualTest.DrawLinearModel(Xsample, Ysample, result)
view = viewer.View(graph)
.. image:: /auto_data_analysis/manage_data_and_samples/images/sphx_glr_plot_linear_regression_001.png
:alt: Linear model visual test
:class: sphx-glr-single-img
Draw the graph of the residual values
.. code-block:: default
graph = ot.VisualTest.DrawLinearModelResidual(Xsample, Ysample, result)
view = viewer.View(graph)
.. image:: /auto_data_analysis/manage_data_and_samples/images/sphx_glr_plot_linear_regression_002.png
:alt: residual(i) versus residual(i-1)
:class: sphx-glr-single-img
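Test the nullity of the linear regression coefficients with the Fisher test. The null hypothesis is that the coefficients are zero, so a p-value below the threshold (a failed test) indicates a significant linear dependence.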
.. code-block:: default
resultLinearModelFisher = ot.LinearModelTest.LinearModelFisher(Xsample, Ysample,
result, 0.10)
print("Test Success ? ", resultLinearModelFisher.getBinaryQualityMeasure())
print("p-value of the LinearModelFisher Test = ", resultLinearModelFisher.getPValue())
print("p-value threshold = ", resultLinearModelFisher.getThreshold())
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Test Success ? False
p-value of the LinearModelFisher Test = 0.0
p-value threshold = 0.1
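To give an idea of the statistic behind a Fisher test of this kind, here is a minimal standard-library sketch for a single regressor. It computes the classical ratio of explained variance to residual variance; the helper ``fisher_statistic`` is hypothetical, and the exact hypothesis and computation used by ``LinearModelFisher`` may differ. The p-value is not computed because the Fisher distribution is not available in the standard library.

```python
import random


def fisher_statistic(xs, ys):
    """F statistic of the simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    fitted = [a0 + a1 * x for x in xs]
    # Sum of squares explained by the model (1 degree of freedom)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    # Residual sum of squares (n - 2 degrees of freedom)
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return explained / (residual / (n - 2))


rng = random.Random(2)
xs = [rng.triangular(1.0, 10.0, 5.0) for _ in range(1000)]
# A sample with a strong linear dependence, similar to the one in this example
ys_linear = [3.0 * x + rng.gauss(0.5, 1.0) for x in xs]
# A sample that does not depend on X at all
ys_noise = [rng.gauss(0.5, 1.0) for _ in xs]

f_linear = fisher_statistic(xs, ys_linear)
f_noise = fisher_statistic(xs, ys_noise)
print("F with linear dependence =", f_linear)
print("F without dependence =", f_noise)
```

A very large statistic, as obtained for the linearly dependent sample, corresponds to a p-value that is numerically zero.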
Check, under the hypothesis of a Gaussian sample, whether the mean of the residuals is equal to zero
.. code-block:: default
resultLinearModelResidualMean = ot.LinearModelTest.LinearModelResidualMean(Xsample, Ysample,
result, 0.10)
print("Test Success ? ", resultLinearModelResidualMean.getBinaryQualityMeasure())
print("p-value of the LinearModelResidualMean Test = ", resultLinearModelResidualMean.getPValue())
print("p-value threshold = ", resultLinearModelResidualMean.getThreshold())
plt.show()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Test Success ? True
p-value of the LinearModelResidualMean Test = 0.9999999999998087
p-value threshold = 0.1
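A p-value this close to 1 is expected. When a model with an intercept is fitted by ordinary least squares, the residuals sum to zero by construction, so a Student statistic on their mean is zero up to rounding errors. The sketch below checks this with the standard library only; it assumes a one-sample Student statistic, which may not match the internals of ``LinearModelResidualMean``.

```python
import math
import random

rng = random.Random(3)
n = 1000
xs = [rng.triangular(1.0, 10.0, 5.0) for _ in range(n)]
ys = [3.0 * x + rng.gauss(0.5, 1.0) for x in xs]

# Closed-form ordinary least squares fit of y = a0 + a1 * x
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
a1 = sxy / sxx
a0 = mean_y - a1 * mean_x

# Residuals of the fitted model
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
mean_res = sum(residuals) / n
std_res = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / (n - 1))

# One-sample Student statistic for the hypothesis "mean of residuals = 0"
t_stat = mean_res / (std_res / math.sqrt(n))
print("mean of the residuals =", mean_res)
print("Student statistic =", t_stat)
```

This test is therefore mostly informative for models fitted without an intercept or evaluated on data not used for the fit.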
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 0.162 seconds)
.. _sphx_glr_download_auto_data_analysis_manage_data_and_samples_plot_linear_regression.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: plot_linear_regression.py `
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: plot_linear_regression.ipynb `
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery `_