Build and validate a linear modelΒΆ

In this example we are going to build a linear regression model and validate it numerically and graphically.

The linear model between links a scalar variable Y and to an n-dimensional one \underline{X} = (X_i)_{i \leq n}, as follows:

\tilde{Y} = a_0 + \sum_{i=1}^n a_i X_i + \varepsilon

where \varepsilon is the residual, supposed to follow the Normal(0.0, 1.0) distribution.

The linear model may be validated graphically if \underline{X} is of dimension 1, by drawing on the same graph the cloud $(X_i, Y_i).

The linear model also be validate numerically with several tests:

  • LinearModelRSquared: tests the quality of the linear regression model. It evaluates the indicator R^2 (regression variance analysis) and compares it to a level,
  • LinearModelRAdjustedSquared: tests the quality of the linear regression model. It evaluates the indicator R^2 adjusted (regression variance analysis) and compares it to a level,
  • LinearModelFisher: tests the nullity of the regression linear model coefficients (Fisher distribution used),
  • LinearModelResidualMean: tests, under the hypothesis of a gaussian sample, if the mean of the residual is equal to zero. It is based on the Student test (equality of mean for two gaussian samples).

The hypothesis on the residuals (centered gaussian distribution) may be validated:

  • graphically if \underline{X} is of dimension 1, by drawing the residual couples (\varepsilon_i, \varepsilon_{i+1}), where the residual \varepsilon_i is evaluated on the samples (X, Y).
  • numerically with the LinearModelResidualMean Test which tests, under the hypothesis of a gaussian sample, if the mean of the residual is equal to zero. It is based on the Student test (equality of mean for two gaussian samples).
In [1]:
from __future__ import print_function
import openturns as ot
In [2]:
# Generate X,Y samples
N = 1000
Xsample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)
Ysample = Xsample * 3.0 + ot.Normal(0.5, 1.0).getSample(N)
In [3]:
# Generate a particular scalar sampleX
particularXSample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)
In [4]:
# Create the linear model from Y,X samples
linearRegressionModel = ot.LinearModelFactory().build(Xsample, Ysample, 0.9)

# Get the coefficients ai
print("coefficients of the linear regression model = ", linearRegressionModel.getRegression())

# Get the confidence intervals of the ai coefficients
print("confidence intervals of the coefficients = ", linearRegressionModel.getConfidenceIntervals())

coefficients of the linear regression model =  [0.592409,2.98159]
confidence intervals of the coefficients =  [0.470224, 0.714594]
[2.95996, 3.00321]
In [5]:
# Validate the model with a visual test
ot.VisualTest.DrawLinearModel(Xsample, Ysample, linearRegressionModel)
Out[5]:
../../_images/examples_data_analysis_linear_regression_6_0.svg
In [6]:
# Draw the graph of the residual values
ot.VisualTest.DrawLinearModelResidual(Xsample, Ysample, linearRegressionModel)
Out[6]:
../../_images/examples_data_analysis_linear_regression_7_0.svg
In [7]:
# Check the quality of the linear regression model through the R^2 indicator
resultLinearModelRSquared = ot.LinearModelTest.LinearModelRSquared(Xsample, Ysample, linearRegressionModel, 0.90)
print("Test Success ? ", resultLinearModelRSquared.getBinaryQualityMeasure())
print("p-value of the LinearModelRSquared Test = ", resultLinearModelRSquared.getPValue())# R^2 value
print("p-value threshold = ", resultLinearModelRSquared.getThreshold())# R^2 @ level
Test Success ?  True
p-value of the LinearModelRSquared Test =  0.969066896984889
p-value threshold =  0.9
In [8]:
# Check the quality of the linear regression model through the adjusted R^2 indicator
resultLinearModelAdjustedRSquared = ot.LinearModelTest.LinearModelAdjustedRSquared(Xsample, Ysample, linearRegressionModel, 0.90)
print("Test Success ? ", resultLinearModelAdjustedRSquared.getBinaryQualityMeasure())
print("p-value of the LinearModelRSquared Test = ", resultLinearModelAdjustedRSquared.getPValue())# R^2 value
print("p-value threshold = ", resultLinearModelAdjustedRSquared.getThreshold())# R^2 @ level
Test Success ?  True
p-value of the LinearModelRSquared Test =  0.9690359018916874
p-value threshold =  0.9
In [9]:
# Check the nullity of the regression linear model coefficients
resultLinearModelFisher = ot.LinearModelTest.LinearModelFisher(Xsample, Ysample, linearRegressionModel, 0.90)
print("Test Success ? ", resultLinearModelFisher.getBinaryQualityMeasure())
print("p-value of the LinearModelRSquared Test = ", resultLinearModelFisher.getPValue())
print("p-value threshold = ", resultLinearModelFisher.getThreshold())
Test Success ?  False
p-value of the LinearModelRSquared Test =  1.0
p-value threshold =  0.09999999999999998
In [10]:
# Check, under the hypothesis of a gaussian sample, if the mean of the residual is equal to zero
resultLinearModelResidualMean = ot.LinearModelTest.LinearModelResidualMean(Xsample, Ysample, linearRegressionModel, 0.90)
print("Test Success ? ", resultLinearModelResidualMean.getBinaryQualityMeasure())
print("p-value of the LinearModelRSquared Test = ", resultLinearModelResidualMean.getPValue())
print("p-value threshold = ", resultLinearModelResidualMean.getThreshold())
Test Success ?  True
p-value of the LinearModelRSquared Test =  0.9999999999997247
p-value threshold =  0.09999999999999998