LinearModelAnalysis¶
- class LinearModelAnalysis(*args)¶
- Analyse a linear model.
- Parameters:
- linearModelResult : LinearModelResult
- A linear model result.
- Methods
- drawCookDistance(): Accessor to plot of Cook's distances versus row labels.
- drawCookVsLeverages(): Accessor to plot of Cook's distances versus leverage/(1-leverage).
- drawModelVsFitted(): Accessor to plot of model versus fitted values.
- drawQQplot(): Accessor to plot a Normal quantiles-quantiles plot of standardized residuals.
- drawResidualsVsFitted(): Accessor to plot of residuals versus fitted values.
- drawResidualsVsLeverages(): Accessor to plot of residuals versus leverages that adds bands corresponding to Cook's distances of 0.5 and 1.
- drawScaleLocation(): Accessor to a Scale-Location plot of sqrt(abs(standardized residuals)) versus fitted values.
- getClassName(): Accessor to the object's class name.
- getCoefficientsConfidenceInterval([level]): Accessor to the confidence interval of level $\alpha$ for the coefficients of the linear expansion.
- getCoefficientsPValues(): Accessor to the p-values of the coefficients.
- getCoefficientsTScores(): Accessor to the coefficients of the linear expansion over their standard error.
- getFisherPValue(): Accessor to the Fisher p-value.
- getFisherScore(): Accessor to the Fisher statistic.
- getLinearModelResult(): Accessor to the linear model result.
- getName(): Accessor to the object's name.
- getNormalityTestCramerVonMises(): Performs the Cramer-Von Mises test.
- getNormalityTestResultAndersonDarling(): Performs the Anderson-Darling test.
- getNormalityTestResultChiSquared(): Performs the Chi-Square test.
- getNormalityTestResultKolmogorovSmirnov(): Performs the Kolmogorov test.
- getResidualsStandardError(): Accessor to the standard error of the residuals.
- hasName(): Test if the object is named.
- setName(name): Accessor to the object's name.
- See also
- LinearModelAlgorithm, LinearModelResult
- Notes
- This class relies on a linear model result structure and analyses its results.
- By default, on graphs, the labels of the 3 most significant points are displayed. This number can be changed by modifying the ResourceMap key LinearModelAnalysis-IdentifiersNumber, as shown in the snippet after these notes.
- The class has a pretty-print method which is triggered by the print() function. It prints the following results, which focus on the properties of a satisfactory regression model.
- Each row of the table of coefficients tests whether one single coefficient is zero. For a single coefficient, if the p-value of the T-test is close to zero, we can reject the hypothesis that this coefficient is zero. See getCoefficientsTScores() to get the scores and getCoefficientsPValues() to get the related p-values.
- The $R^2$ score measures how close the predicted output values are to the observed values. If $R^2$ is close to 1 (e.g. larger than 0.95), then the predictions are accurate on average. See getRSquared(). Furthermore, the adjusted $R^2$ value, denoted by $R^2_{ad}$, takes into account the data set size and the number of parameters. See getAdjustedRSquared().
- The Fisher-test tests if all the coefficients are simultaneously zero. If the p-value is close to zero, then we can reject this hypothesis: there is at least one nonzero coefficient. See getFisherScore() to get the score and getFisherPValue() to get the related p-value.
- The normality tests check if the residuals are Gaussian. The normality assumption can be accepted (or, more precisely, cannot be rejected) if the p-value is larger than a threshold (e.g. 0.05). See - getNormalityTestCramerVonMises(),- getNormalityTestResultAndersonDarling(),- getNormalityTestResultChiSquared()and- getNormalityTestResultKolmogorovSmirnov().
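- For instance, the number of labeled points mentioned in the notes above can be changed before drawing (a minimal sketch; the key name is quoted in the notes, the value 5 is an arbitrary choice):

>>> import openturns as ot
>>> ot.ResourceMap.SetAsUnsignedInteger('LinearModelAnalysis-IdentifiersNumber', 5)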
- The basics on regression theory are presented in Regression analysis. The goodness of fit tests for normality are presented in Graphical goodness-of-fit tests, Chi-squared test, The Kolmogorov-Smirnov goodness of fit test for continuous data, Cramer-Von Mises test and Anderson-Darling test.
- Examples

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> distribution = ot.Normal()
>>> Xsample = distribution.getSample(30)
>>> func = ot.SymbolicFunction(['x'], ['2 * x + 1'])
>>> Ysample = func(Xsample) + ot.Normal().getSample(30)
>>> algo = ot.LinearModelAlgorithm(Xsample, Ysample)  # (input sample, output sample)
>>> algo.run()
>>> result = algo.getResult()
>>> analysis = ot.LinearModelAnalysis(result)
>>> # print(analysis) # Pretty-print

- __init__(*args)¶
 - drawCookDistance()¶
- Accessor to plot of Cook's distances versus row labels.
- Returns:
- graph : Graph
- Notes
- The graph plots the Cook's distance $D_i$ of each observation $i$, defined in (2). The Cook's distance measures the impact of every individual data point on the linear regression. See [rawlings2001] (section 11.2.1, Cook's D, page 362) for more details.
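- A short usage sketch, assuming the analysis object built in the example above and an environment where openturns.viewer can render matplotlib figures:

>>> graph = analysis.drawCookDistance()
>>> from openturns.viewer import View
>>> View(graph).show()

- The same pattern applies to the other drawing accessors below.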
 - drawCookVsLeverages()¶
- Accessor to plot of Cook's distances versus leverage/(1-leverage).
- Returns:
- graph : Graph
- Notes
- This graph plots the Cook's distance defined in (2) against the ratio $\ell_i / (1 - \ell_i)$, where $\ell_i$ is the leverage of observation $i$ defined in (7).
 - drawModelVsFitted()¶
- Accessor to plot of model versus fitted values.
- Returns:
- graph : Graph
- Notes
- The graph plots the sample $(\hat{y}_i, y_i)$, where $y_i$ is the observed value of observation $i$ and $\hat{y}_i$ is the value fitted by the linear model, defined in (2) or (4).
 - drawQQplot()¶
- Accessor to plot a Normal quantiles-quantiles plot of standardized residuals.
- Returns:
- graph : Graph
- Notes
- The graph plots the empirical quantiles of the standardized residuals defined in (9) versus the quantiles of the Normal distribution with zero mean and unit variance.
 - drawResidualsVsFitted()¶
- Accessor to plot of residuals versus fitted values.
- Returns:
- graph : Graph
- Notes
- The graph plots the sample $(\hat{y}_i, \varepsilon_i)$, where $\varepsilon_i$ is the residual of observation $i$ defined in (5) and $\hat{y}_i$ is the value fitted by the linear model, defined in (2) or (4).
 - drawResidualsVsLeverages()¶
- Accessor to plot of residuals versus leverages that adds bands corresponding to Cook's distances of 0.5 and 1.
- Returns:
- graph : Graph
- Notes
- This graph plots the residuals $\varepsilon_i$ defined in (5) against the leverage $\ell_i$ of observation $i$ defined in (7).
 - drawScaleLocation()¶
- Accessor to a Scale-Location plot of sqrt(abs(standardized residuals)) versus fitted values.
- Returns:
- graph : Graph
- Notes
- The graph plots the sample $(\hat{y}_i, \sqrt{|\tilde{\varepsilon}_i|})$, where $\tilde{\varepsilon}_i$ is the standardized residual of observation $i$ defined in (9) and $\hat{y}_i$ is the value fitted by the linear model, defined in (2) or (4).
 - getClassName()¶
- Accessor to the object's class name.
- Returns:
- class_name : str
- The object class name (object.__class__.__name__).
 
 
 - getCoefficientsConfidenceInterval(level=0.95)¶
- Accessor to the confidence interval of level $\alpha$ for the coefficients of the linear expansion.
- Parameters:
- level : float
- The confidence level $\alpha$.
- Returns:
- confidenceInterval : Interval
- The confidence interval.
- Notes
- Under the Gaussian assumption of the error, the confidence interval of the coefficient $a_k$ of level $\alpha$ is defined by:
$$\left[\hat{a}_k - q_{(1+\alpha)/2} \, \hat{\sigma}(\hat{a}_k),\; \hat{a}_k + q_{(1+\alpha)/2} \, \hat{\sigma}(\hat{a}_k)\right]$$
- where:
- $q_{(1+\alpha)/2}$ is the quantile of order $(1+\alpha)/2$ of the Student($\nu$) distribution,
- $\hat{\sigma}(\hat{a}_k)$ is the standard deviation of the estimator $\hat{a}_k$.
- The interval returned is multivariate and contains the intervals of all the coefficients.
- If the residuals are not Gaussian, this test is not appropriate and should not be used.
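- A small usage sketch, assuming the analysis object from the example above; the 0.90 level is an arbitrary choice:

>>> interval = analysis.getCoefficientsConfidenceInterval(0.9)
>>> lower = interval.getLowerBound()
>>> upper = interval.getUpperBound()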
 - getCoefficientsPValues()¶
- Accessor to the p-values of the coefficients.
- Returns:
- pValues : Point
- Student p-values of the coefficient estimates.
- Notes
- The T-test checks whether the coefficient $\hat{a}_k$ is statistically different from zero and is used under the Gaussian assumption of the error $\varepsilon$.
- The p-value of each coefficient estimate is computed from the t-scores defined in (1) with respect to the Student distribution with $\nu$ degrees of freedom defined in (4) or (3).
- These p-values are used under the Gaussian assumption of the error $\varepsilon$. If the p-value is close to zero, we can reject the hypothesis that this coefficient is zero.
- If the residuals are not Gaussian, this test is not appropriate and should not be used.
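- A sketch that flags the coefficients whose p-value falls below a conventional 0.05 threshold (the threshold is an arbitrary choice, not part of the API), assuming the analysis object from the example above:

>>> pValues = analysis.getCoefficientsPValues()
>>> significant = [pValues[k] < 0.05 for k in range(pValues.getDimension())]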
 - getCoefficientsTScores()¶
- Accessor to the coefficients of the linear expansion over their standard error.
- Returns:
- tScores : Point
- The Student score of each coefficient estimate $\hat{a}_k$.
- Notes
- The T-test checks whether the coefficient $\hat{a}_k$ is statistically different from zero and is used under the Gaussian assumption of the error $\varepsilon$. See [rawlings2001] (section 4.5.2, Special cases of the general form, page 122) for more details.
- For each coefficient estimate $\hat{a}_k$, the Student score $t_k$ is computed as:
$$t_k = \frac{\hat{a}_k}{\hat{\sigma}(\hat{a}_k)} \quad (1)$$
- where $\hat{\sigma}(\hat{a}_k)$ is the standard deviation of the distribution of the estimator $\hat{a}_k$ defined in (1).
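- The relation between t-scores and p-values can be checked by hand. A hedged sketch, assuming the objects from the example above and that the result exposes getDegreesOfFreedom():

>>> import openturns as ot
>>> tScores = analysis.getCoefficientsTScores()
>>> nu = result.getDegreesOfFreedom()
>>> # two-sided p-value of the first coefficient from the Student(nu) distribution
>>> pValue = 2.0 * ot.Student(nu).computeComplementaryCDF(abs(tScores[0]))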
 - getFisherPValue()¶
- Accessor to the Fisher p-value.
- Returns:
- fisherPValue : float
- Fisher p-value of the model estimate.
- Notes
- The F-test tests if all the coefficients are simultaneously equal to zero and is used under the Gaussian assumption of the error $\varepsilon$.
- The Fisher p-value is computed from the Fisher score defined in (2) with respect to the FisherSnedecor distribution parameterized by $(d_M, \nu)$ where:
- $d_M$ is the degrees of freedom of the model, equal to the number of coefficients to estimate ($p$ or $p+1$); if the basis contains an intercept, we subtract 1 from $d_M$,
- $\nu$ is the degrees of freedom defined in (3) or (4).
- This p-value is used under the Gaussian assumption of the error $\varepsilon$. It tests if all the coefficients are statistically useful to the model. If the p-value is close to zero, then we can reject this hypothesis: there is at least one nonzero coefficient.
- If the residuals are not Gaussian, this test is not appropriate and should not be used.
 - getFisherScore()¶
- Accessor to the Fisher statistic.
- Returns:
- fisherScore : float
- The Fisher score of the model.
- Notes
- The Fisher-test tests if all the coefficients are simultaneously equal to zero and is used under the Gaussian assumption of the error $\varepsilon$.
- The Fisher score is computed as follows. Let $d_M$ be the degrees of freedom of the model, equal to the number of coefficients to estimate ($p$ or $p+1$); if the basis contains an intercept, we subtract 1 from $d_M$. Let $\nu$ be the degrees of freedom defined in (3) or (4).
- Let SSR be the Sum of Squared Residuals (sometimes called SSE, Sum of Squared Errors), defined by:
$$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
- Let SST be the Sum of Squared Total, defined by:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
- where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$.
- We denote by SSM the Sum of Squared Model, defined by:
$$SSM = SST - SSR$$
- Then, the Fisher score $f$ is defined by:
$$f = \frac{SSM / d_M}{SSR / \nu} \quad (2)$$
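- The score can be reproduced from these definitions. A rough verification sketch, assuming the example setup above, a basis containing an intercept, and the accessors getSampleResiduals() and getDegreesOfFreedom() on the result:

>>> residuals = result.getSampleResiduals()
>>> n = residuals.getSize()
>>> ssr = sum(residuals[i, 0] ** 2 for i in range(n))
>>> ybar = Ysample.computeMean()[0]
>>> sst = sum((Ysample[i, 0] - ybar) ** 2 for i in range(n))
>>> ssm = sst - ssr
>>> dM = result.getCoefficients().getDimension() - 1  # minus 1 for the intercept
>>> f = (ssm / dM) / (ssr / result.getDegreesOfFreedom())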
 - getLinearModelResult()¶
- Accessor to the linear model result.
- Returns:
- linearModelResult : LinearModelResult
- The linear model result which has been passed to the constructor.
 
 - getName()¶
- Accessor to the object's name.
- Returns:
- name : str
- The name of the object.
 
 
 - getNormalityTestCramerVonMises()¶
- Performs the Cramer-Von Mises test.
- Returns:
- testResult : TestResult
- Test result class.
- Notes
- We check whether the residuals are Gaussian using CramerVonMisesNormal().
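- A small usage sketch, assuming the analysis object from the example above; TestResult exposes the p-value and the binary decision:

>>> testResult = analysis.getNormalityTestCramerVonMises()
>>> pValue = testResult.getPValue()
>>> accepted = testResult.getBinaryQualityMeasure()  # True: normality not rejected

- The same pattern applies to the other normality tests below.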
 - getNormalityTestResultAndersonDarling()¶
- Performs the Anderson-Darling test.
- Returns:
- testResult : TestResult
- Test result class.
- Notes
- We check whether the residuals are Gaussian using AndersonDarlingNormal().
 - getNormalityTestResultChiSquared()¶
- Performs the Chi-Square test.
- Returns:
- testResult : TestResult
- Test result class.
- Notes
- The Chi-Square test is a goodness of fit test whose objective is to check the normality assumption (null hypothesis) of the residuals (and thus of the model).
- Usually, the Chi-Square test applies to discrete distributions. Here we rely on ChiSquared() to check normality.
 - getNormalityTestResultKolmogorovSmirnov()¶
- Performs the Kolmogorov test.
- Returns:
- testResult : TestResult
- Test result class.
- Notes
- We check whether the residuals are Gaussian using Kolmogorov().
 - getResidualsStandardError()¶
- Accessor to the standard error of the residuals.
- Returns:
- stdError : float
- The residuals standard deviation estimate.
- Notes
- The standard error is also called the root mean squared error or the standard error of regression. It is the residual standard deviation $\hat{\sigma}$ defined in (8), computed from the unbiased estimate of the residual variance.
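- This estimate can be cross-checked against the residuals. A hedged sketch, assuming the example setup above and the accessors getSampleResiduals() and getDegreesOfFreedom() on the result:

>>> import math
>>> sigma = analysis.getResidualsStandardError()
>>> residuals = result.getSampleResiduals()
>>> ssr = sum(residuals[i, 0] ** 2 for i in range(residuals.getSize()))
>>> sigmaManual = math.sqrt(ssr / result.getDegreesOfFreedom())  # should match sigma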
 - hasName()¶
- Test if the object is named.
- Returns:
- hasName : bool
- True if the name is not empty.
 
 
 - setName(name)¶
- Accessor to the object's name.
- Parameters:
- name : str
- The name of the object.
 
 
 