LinearModelAnalysis

class LinearModelAnalysis(*args)

Analyse a linear model.

Parameters:
linearModelResult : LinearModelResult

A linear model result.

Methods

drawCookDistance()

Accessor to plot of Cook's distances versus row labels.

drawCookVsLeverages()

Accessor to plot of Cook's distances versus leverage/(1-leverage).

drawModelVsFitted()

Accessor to plot of model versus fitted values.

drawQQplot()

Accessor to plot a Normal quantiles-quantiles plot of standardized residuals.

drawResidualsVsFitted()

Accessor to plot of residuals versus fitted values.

drawResidualsVsLeverages()

Accessor to plot of residuals versus leverages that adds bands corresponding to Cook's distances of 0.5 and 1.

drawScaleLocation()

Accessor to a Scale-Location plot of sqrt(abs(standardized residuals)) versus fitted values.

getClassName()

Accessor to the object's name.

getCoefficientsConfidenceInterval([level])

Accessor to the confidence interval of level \alpha for the coefficients of the linear expansion.

getCoefficientsPValues()

Accessor to the p-values of the coefficients.

getCoefficientsTScores()

Accessor to the coefficients of linear expansion over their standard error.

getFisherPValue()

Accessor to the Fisher p-values.

getFisherScore()

Accessor to the Fisher statistics.

getLinearModelResult()

Accessor to the linear model result.

getName()

Accessor to the object's name.

getNormalityTestCramerVonMises()

Performs Cramer-Von Mises test.

getNormalityTestResultAndersonDarling()

Performs Anderson-Darling test.

getNormalityTestResultChiSquared()

Performs Chi-Square test.

getNormalityTestResultKolmogorovSmirnov()

Performs Kolmogorov test.

getResidualsStandardError()

Accessor to the standard error of the residuals.

hasName()

Test if the object is named.

setName(name)

Accessor to the object's name.

Notes

This class relies on a linear model result structure and analyses the results.

By default, on graphs, labels of the 3 most significant points are displayed. This number can be changed by modifying the ResourceMap key (LinearModelAnalysis-IdentifiersNumber).
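
For instance, a minimal sketch (using the ResourceMap key named above, assuming it is stored as an unsigned integer entry) that raises this number to 5:

>>> import openturns as ot
>>> ot.ResourceMap.SetAsUnsignedInteger('LinearModelAnalysis-IdentifiersNumber', 5)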

The class has a pretty-print method which is triggered by the print() function. It prints a summary of the regression results, with a focus on the properties of a satisfactory regression model.

The basics on regression theory are presented in Regression analysis. The goodness of fit tests for normality are presented in Graphical goodness-of-fit tests, Chi-squared test, The Kolmogorov-Smirnov goodness of fit test for continuous data, Cramer-Von Mises test and Anderson-Darling test.

Examples

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> distribution = ot.Normal()
>>> Xsample = distribution.getSample(30)
>>> func = ot.SymbolicFunction(['x'], ['2 * x + 1'])
>>> Ysample = func(Xsample) + ot.Normal().getSample(30)
>>> algo = ot.LinearModelAlgorithm(Ysample, Xsample)
>>> algo.run()
>>> result = algo.getResult()
>>> analysis = ot.LinearModelAnalysis(result)
>>> # print(analysis)  # Pretty-print
__init__(*args)
drawCookDistance()

Accessor to plot of Cook’s distances versus row labels.

Returns:
graph : Graph

Notes

The graph plots the Cook's distance of each experience i, defined in (2). The Cook's distance measures the impact of every individual data point on the linear regression. See [rawlings2001] (section 11.2.1, Cook's D page 362) for more details.
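
The same viewing pattern applies to all the draw* methods below; a minimal sketch, reusing the analysis object built in the Examples section (openturns.viewer relies on matplotlib):

>>> from openturns.viewer import View
>>> graph = analysis.drawCookDistance()
>>> view = View(graph)  # opens a matplotlib figure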

drawCookVsLeverages()

Accessor to plot of Cook’s distances versus leverage/(1-leverage).

Returns:
graph : Graph

Notes

This graph plots the Cook's distance defined in (2) against the ratio \ell_i/(1-\ell_i), where \ell_i is the leverage of experience i defined in (7).

drawModelVsFitted()

Accessor to plot of model versus fitted values.

Returns:
graph : Graph

Notes

The graph plots the sample (Y_i, \hat{Y}_i) where Y_i is the real value of experience i and \hat{Y}_i is the value fitted by the linear model, defined in (2) or (4).

drawQQplot()

Accessor to plot a Normal quantiles-quantiles plot of standardized residuals.

Returns:
graph : Graph

Notes

The graph plots the empirical quantiles of the standardized residuals defined in (9) versus the quantiles of the Normal distribution with zero mean and unit variance.

drawResidualsVsFitted()

Accessor to plot of residuals versus fitted values.

Returns:
graph : Graph

Notes

The graph plots the sample (\hat{Y}_i, \varepsilon_i) where \varepsilon_i is the residual of experience i defined in (5) and \hat{Y}_i is the value fitted by the linear model, defined in (2) or (4).

drawResidualsVsLeverages()

Accessor to plot of residuals versus leverages that adds bands corresponding to Cook’s distances of 0.5 and 1.

Returns:
graph : Graph

Notes

This graph plots the residuals \varepsilon_i defined in (5) and the leverage \ell_i of experience i defined in (7).

drawScaleLocation()

Accessor to a Scale-Location plot of sqrt(abs(standardized residuals)) versus fitted values.

Returns:
graph : Graph

Notes

The graph plots the sample (\hat{Y}_i, \sqrt{|\varepsilon_i^{st}|}) where \varepsilon_i^{st} is the standardized residual of experience i defined in (9) and \hat{Y}_i is the value fitted by the linear model, defined in (2) or (4).

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getCoefficientsConfidenceInterval(level=0.95)

Accessor to the confidence interval of level \alpha for the coefficients of the linear expansion.

Parameters:
level : float, 0 \leq \alpha \leq 1

The confidence level \alpha.

Returns:
confidenceInterval : Interval

The confidence interval.

Notes

Under the Gaussian assumption of the error, the confidence interval of the coefficient a_k of level \alpha is defined by:

\left[\hat{a}_k \pm \sqrt{\left((\mat{\Psi}^t\mat{\Psi})^{-1} \right)_{k+1, k+1}}\hat{\sigma} t_{(1+\alpha)/2}\right]

where:

  • t_{(1+\alpha)/2} is the quantile of order (1+\alpha)/2 of the Student(dof) distribution,

  • with dof the degrees of freedom defined in (3) or (4),

  • \mat{\Psi} the design matrix defined in (5) or (6).

The interval returned is multivariate and contains the intervals of all the coefficients.

If the residuals are not Gaussian, this confidence interval is not appropriate and should not be used.
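
A minimal sketch, reusing the analysis object from the Examples section; the bounds of the returned Interval hold one entry per coefficient:

>>> interval = analysis.getCoefficientsConfidenceInterval(0.95)
>>> lower = interval.getLowerBound()  # Point, one entry per coefficient
>>> upper = interval.getUpperBound()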

getCoefficientsPValues()

Accessor to the p-values of the coefficients.

Returns:
pValues : Point

Student P-values of the coefficient estimates.

Notes

The T-test checks if the coefficient \hat{a}_k is statistically different from zero and is used under the Gaussian assumption of the error \varepsilon.

The p-value of each coefficient estimate is computed from the t-scores defined in (1) with respect to the Student distribution with dof degrees of freedom defined in (3) or (4).

These p-values are used under the Gaussian assumption of the error \varepsilon. If the p-value is close to zero, we can reject the hypothesis that this coefficient is zero.

If the residuals are not Gaussian, this test is not appropriate and should not be used.
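
For instance, a sketch that flags the coefficients significant at the 5% level, reusing the analysis object from the Examples section:

>>> pValues = analysis.getCoefficientsPValues()
>>> significant = [k for k, p in enumerate(pValues) if p < 0.05]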

getCoefficientsTScores()

Accessor to the coefficients of linear expansion over their standard error.

Returns:
tScores : Point

The Student score of each coefficient estimate \hat{\vect{a}}.

Notes

The T-test checks if the coefficient \hat{a}_k is statistically different from zero and is used under the Gaussian assumption of the error \varepsilon. See [rawlings2001] (section 4.5.2 Special cases of the general form page 122) for more details.

For each coefficient estimate \hat{a}_k, the Student score t_k is computed as:

(1)  t_k = \dfrac{\hat{a}_k}{\sigma(a_k)}

where \sigma(a_k) is the standard deviation of the distribution of the estimator \hat{a}_k defined in (1).
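
As a consistency check, a two-sided p-value can be recovered from a t-score through the Student distribution; this sketch assumes a getDegreesOfFreedom() accessor on the result object built in the Examples section:

>>> tScores = analysis.getCoefficientsTScores()
>>> dof = result.getDegreesOfFreedom()  # assumed accessor
>>> pValue = 2.0 * ot.Student(dof).computeComplementaryCDF(abs(tScores[0]))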

getFisherPValue()

Accessor to the Fisher p-values.

Returns:
fisherPValue : float

Fisher P-value of the model estimate.

Notes

The F-test tests if all the coefficients are simultaneously equal to zero and is used under the Gaussian assumption of the error \varepsilon.

The Fisher p-value of the model is computed from the Fisher score defined in (2) with respect to the FisherSnedecor distribution parameterized by (dofM, dof) where:

  • dofM is the degrees of freedom of the model, equal to the number of coefficients to estimate (p+1 or p'). If the basis contains an intercept, then we subtract 1 from dofM.

  • dof is the degrees of freedom defined in (3) or (4).

This p-value is used under the Gaussian assumption of the error \varepsilon. It tests if all the coefficients are statistically useful to the model. If the p-value is close to zero, then we can reject this hypothesis: there is at least one nonzero coefficient.

If the residuals are not Gaussian, this test is not appropriate and should not be used.
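
As a consistency check using the Examples above (one input plus an intercept and 30 points, hence dofM = 1 and dof = 28), the p-value can be recovered from the Fisher score through the complementary CDF of the FisherSnedecor distribution:

>>> f = analysis.getFisherScore()
>>> pValue = ot.FisherSnedecor(1.0, 28.0).computeComplementaryCDF(f)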

getFisherScore()

Accessor to the Fisher statistics.

Returns:
fisherScore : float

The Fisher score of the model.

Notes

The Fisher-test tests if all the coefficients are simultaneously equal to zero and is used under the Gaussian assumption of the error \varepsilon.

The Fisher score is computed as follows. Let dofM be the degrees of freedom of the model, equal to the number of coefficients to estimate (p+1 or p'). If the basis contains an intercept, then we subtract 1 from dofM.

Let dof be the degrees of freedom defined in (3) or (4).

Let SSR be the Sum of Squared Residuals (sometimes called SSE, for Sum of Squared Errors), defined by:

SSR = \sum_{i=1}^\sampleSize \varepsilon_i^2

Let SST be the Sum of Squared Total defined by:

SST = \left\{ \begin{array}{ll}
    \sum_{i=1}^\sampleSize (Y_i - \bar{Y})^2 & \mbox{if the basis contains an intercept,} \\
    \sum_{i=1}^\sampleSize Y_i^2             & \mbox{otherwise,}
\end{array} \right.

where \bar{Y} = \dfrac{1}{\sampleSize} \sum_{i=1}^\sampleSize Y_i.

We denote by SSM the Sum of Squared Model defined by:

SSM = SST - SSR

Then, the Fisher score f is defined by:

(2)  f = \dfrac{SSM/dofM}{SSR/dof}
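
As an illustration of this computation, a sketch under the assumption that the result object from the Examples section exposes the residual sample through getSampleResiduals() (and with an intercept in the basis):

>>> residuals = result.getSampleResiduals()  # assumed accessor
>>> SSR = sum(r[0] ** 2 for r in residuals)
>>> yBar = Ysample.computeMean()[0]
>>> SST = sum((y[0] - yBar) ** 2 for y in Ysample)
>>> SSM = SST - SSR  # then f = (SSM / dofM) / (SSR / dof)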

getLinearModelResult()

Accessor to the linear model result.

Returns:
linearModelResult : LinearModelResult

The linear model result which has been passed to the constructor.

getName()

Accessor to the object’s name.

Returns:
name : str

The name of the object.

getNormalityTestCramerVonMises()

Performs Cramer-Von Mises test.

Returns:
testResult : TestResult

Test result class.

Notes

We check if the residuals are Gaussian thanks to CramerVonMisesNormal().

getNormalityTestResultAndersonDarling()

Performs Anderson-Darling test.

Returns:
testResult : TestResult

Test result class.

Notes

We check if the residuals are Gaussian thanks to AndersonDarlingNormal().

getNormalityTestResultChiSquared()

Performs Chi-Square test.

Returns:
testResult : TestResult

Test result class.

Notes

The Chi-Square test is a goodness of fit test whose objective is to check the normality assumption (null hypothesis) of the residuals (and thus of the model).

Usually, the Chi-Square test applies to discrete distributions. Here we rely on ChiSquared() to check normality.

getNormalityTestResultKolmogorovSmirnov()

Performs Kolmogorov test.

Returns:
testResult : TestResult

Test result class.

Notes

We check if the residuals are Gaussian thanks to Kolmogorov().
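
A minimal sketch that runs the four normality tests on the residuals and prints their p-values, reusing the analysis object from the Examples section:

>>> for testResult in [analysis.getNormalityTestCramerVonMises(),
...                    analysis.getNormalityTestResultAndersonDarling(),
...                    analysis.getNormalityTestResultChiSquared(),
...                    analysis.getNormalityTestResultKolmogorovSmirnov()]:
...     print(testResult.getPValue())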

getResidualsStandardError()

Accessor to the standard error of the residuals.

Returns:
stdError : float

The residuals standard deviation estimate.

Notes

The standard error is also called the root mean squared error or the standard error of regression. It is the residual standard deviation \hat{\sigma} defined in (8), computed from the unbiased estimate of the residual variance.
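
As a consistency check, the accessor should agree with the square root of the unbiased residual variance; this sketch assumes getSampleResiduals() and getDegreesOfFreedom() accessors on the result object from the Examples section:

>>> import math
>>> residuals = result.getSampleResiduals()  # assumed accessor
>>> dof = result.getDegreesOfFreedom()       # assumed accessor
>>> sigmaHat = math.sqrt(sum(r[0] ** 2 for r in residuals) / dof)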

hasName()

Test if the object is named.

Returns:
hasName : bool

True if the name is not empty.

setName(name)

Accessor to the object’s name.

Parameters:
name : str

The name of the object.

Examples using the class

Build and validate a linear model

Distribution of estimators in linear regression

Create a linear model

Perform stepwise regression