LinearModelAlgorithm

class LinearModelAlgorithm(*args)

Class used to create a linear regression model.

Parameters:
XSample : 2-d sequence of float

The input samples of a model.

YSample : 2-d sequence of float

The output samples of a model; must be of dimension 1.

basis : Basis

Optional. The \phi functional basis.

Methods

BuildDistribution(inputSample)

Recover the distribution, with metamodel performance in mind.

getBasis()

Accessor to the input basis.

getClassName()

Accessor to the object's name.

getDistribution()

Accessor to the joint probability density function of the physical input vector.

getInputSample()

Accessor to the input sample.

getName()

Accessor to the object's name.

getOutputSample()

Accessor to the output sample.

getResult()

Accessor to the result of the algorithm.

getWeights()

Return the weights of the input sample.

hasName()

Test if the object is named.

run()

Compute the response surfaces.

setDistribution(distribution)

Accessor to the joint probability density function of the physical input vector.

setName(name)

Accessor to the object's name.

Notes

This class fits a linear regression model between a scalar variable Y and p scalar regressors (X_1, \dots, X_p). The model is estimated from \sampleSize experiments that provide an output sample of Y together with the corresponding input sample of the regressors \vect{X} = (X_1, \dots, X_p).

Let \vect{Y} = (Y_1, \dots, Y_\sampleSize) be the output sample and (\vect{X}^1, \dots, \vect{X}^\sampleSize) the input sample, where \vect{X}^i = (X_1^i, \dots, X_p^i).

The linear model can be defined with or without a functional basis. If no basis is specified, the model is:

(1) Y = a_0 + \sum_{k=1}^{p} a_k X_k + \epsilon

where a_0, a_1, \dots, a_p \in \Rset are scalar coefficients and \epsilon is a random variable with zero mean and constant variance \sigma^2, independent of the coefficients a_k.

Let \hat{Y}_i, for 1 \leq i \leq \sampleSize, be the fitted values, defined by:

(2) \hat{Y}_i = \hat{a}_0 + \sum_{k=1}^{p} \hat{a}_k X_k^i

where \hat{\vect{a}} = (\hat{a}_0, \dots, \hat{a}_p) is the estimate of \vect{a}.
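As a short sketch of (1)-(2): after the fit, the estimated coefficients (\hat{a}_0, \dots, \hat{a}_p) can be read from the result via getCoefficients(); the toy sample below is purely illustrative.

>>> import openturns as ot
>>> X = ot.Normal(3).getSample(50)  # 3 regressors, 50 points
>>> Y = ot.SymbolicFunction(['x1', 'x2', 'x3'], ['2 + x1 - 0.5 * x3'])(X)
>>> algo = ot.LinearModelAlgorithm(X, Y)
>>> algo.run()
>>> coefficients = algo.getResult().getCoefficients()  # (a0_hat, ..., ap_hat)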

If a functional basis is specified, let p' be its dimension and \phi_j : \Rset^{p} \rightarrow \Rset for j \in \{1, ..., p'\} be the j-th basis function. The linear model is:

(3) Y = \sum_{j=1}^{p'} a_j \phi_j(\vect{X}) + \epsilon

where \vect{a} = (a_1, \dots, a_{p'}) and \epsilon have the same properties as in the previous case.

The fitted values \hat{Y}_i for 1 \leq i \leq \sampleSize are defined by:

(4) \hat{Y}_i = \sum_{j=1}^{p'} \hat{a}_j \phi_j(\vect{X}^i)

where \hat{\vect{a}} = (\hat{a}_1, \dots, \hat{a}_{p'}) is the estimate of \vect{a}.

We define the i-th residual by:

(5) \varepsilon_i = Y_i - \hat{Y}_i
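Continuing the sketch above, the residuals of (5) can be recovered from the result; getSampleResiduals() and getStandardizedResiduals() are LinearModelResult accessors.

>>> result = algo.getResult()
>>> residuals = result.getSampleResiduals()            # the epsilon_i of (5)
>>> standardized = result.getStandardizedResiduals()   # scaled for diagnostics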

The algorithm estimates the coefficients \vect{a} as well as the variance \sigma^2. The coefficients are evaluated using a linear least squares method, by default the QR method. The user may also choose SVD or Cholesky by setting the LinearModelAlgorithm-DecompositionMethod key of the ResourceMap (a one-line sketch follows the guidelines below). Here are a few guidelines for choosing the appropriate decomposition method:

  • The Cholesky method can be safely used if the functional basis is orthogonal and the sample is drawn from the corresponding distribution, because this ensures that the columns of the design matrix are asymptotically orthogonal as the sample size increases. In this case, evaluating the Gram matrix does not increase the condition number.

  • Selecting the decomposition method can also be based on the sample size.

Please read the Build() help page for details on this topic.
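For instance, a one-line sketch to select SVD instead of the default QR; the key name is taken from the notes above, and ResourceMap settings are typically read when the algorithm is created.

>>> import openturns as ot
>>> ot.ResourceMap.SetAsString('LinearModelAlgorithm-DecompositionMethod', 'SVD')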

The LinearModelAnalysis class can be used for a detailed analysis of the linear model result.

No scaling is involved in this method; scaling the data, if needed, is the responsibility of the user. Scaling may be useful if, for example, the inputs have very different magnitudes and a linear model (without functional basis) is fitted with the Cholesky decomposition applied to the associated Gram matrix: in that case, the Cholesky method may fail to produce accurate results.
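A minimal user-side scaling sketch, assuming numpy is available and inputSample, outputSample as in the Examples below; the standardization itself is not part of the algorithm.

>>> import numpy as np
>>> X = np.array(inputSample)                  # Sample -> numpy array
>>> Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # center and scale each regressor
>>> algo = ot.LinearModelAlgorithm(ot.Sample(Xs), outputSample)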

Examples

>>> import openturns as ot
>>> func = ot.SymbolicFunction(
...     ['x1', 'x2', 'x3'],
...     ['x1 + x2 + sin(x2 * 2 * pi_)/5 + 1e-3 * x3^2']
... )
>>> dimension = 3
>>> distribution = ot.JointDistribution([ot.Normal()] * dimension)
>>> inputSample = distribution.getSample(20)
>>> outputSample = func(inputSample)
>>> algo = ot.LinearModelAlgorithm(inputSample, outputSample)
>>> algo.run()
>>> result = algo.getResult()
>>> design = result.getDesign()
>>> gram = design.computeGram()
>>> leverages = result.getLeverages()

In order to access the projection matrix, we build the least squares method.

>>> lsMethod = result.buildMethod()
>>> projectionMatrix = lsMethod.getH()
__init__(*args)
static BuildDistribution(inputSample)

Recover the distribution, with metamodel performance in mind.

For each marginal, the best 1-d continuous parametric model is selected; if none is valid, a nonparametric model is used as a fallback.

The selection is done as follows:

  • We start with a list of all parametric models (all factories)

  • For each model, we estimate its parameters if feasible.

  • We then check whether the model is valid, i.e. whether its Kolmogorov p-value exceeds the threshold fixed by the MetaModelAlgorithm-PValueThreshold ResourceMap key (default: 5%).

  • We sort all valid models and return the one with the optimal criterion.

For the last step, the criterion may be BIC, AIC or AICC; it is specified through the MetaModelAlgorithm-ModelSelectionCriterion ResourceMap key (default: BIC). Note that if there is no valid candidate, a nonparametric model is estimated (KernelSmoothing or Histogram); the MetaModelAlgorithm-NonParametricModel ResourceMap key selects the preferred one (default: Histogram).

Once each marginal is estimated, the Spearman independence test is applied to each pair of components to decide whether to use an independent copula; in case of dependence, a NormalCopula is used.

Parameters:
sample : Sample

Input sample.

Returns:
distribution : Distribution

Input distribution.
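A minimal usage sketch: since BuildDistribution is static, it can be called directly on the class. The sample below is purely illustrative.

>>> import openturns as ot
>>> sample = ot.Normal(2).getSample(100)
>>> distribution = ot.LinearModelAlgorithm.BuildDistribution(sample)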

getBasis()

Accessor to the input basis.

Returns:
basis : Basis

The basis of the regression model.

Notes

If a functional basis has been provided in the constructor, then we get it back: (\phi_k)_{1 \leq k \leq p'}. Otherwise, the functional basis is composed of the projections \phi_k : \Rset^p \rightarrow \Rset such that \phi_k(\vect{x}) = x_k for 1 \leq k \leq p, completed with the constant function: \phi_0 : \vect{x} \rightarrow 1.
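A short sketch of the default case, assuming the 3-d inputSample and outputSample from the Examples above; with no basis given in the constructor, the basis holds the constant plus one projection per input, as described in the notes.

>>> algo = ot.LinearModelAlgorithm(inputSample, outputSample)
>>> basis = algo.getBasis()
>>> basis.getSize()  # p + 1 = 4 functions for a 3-d input
4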

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getDistribution()

Accessor to the joint probability density function of the physical input vector.

Returns:
distribution : Distribution

Joint probability density function of the physical input vector.

getInputSample()

Accessor to the input sample.

Returns:
inputSample : Sample

The input sample used to fit the model.

getName()

Accessor to the object’s name.

Returns:
name : str

The name of the object.

getOutputSample()

Accessor to the output sample.

Returns:
outputSample : Sample

The output sample used to fit the model.

getResult()

Accessor to the result of the algorithm.

Returns:
result : LinearModelResult

All the results of the algorithm.

getWeights()

Return the weights of the input sample.

Returns:
weights : sequence of float

The weights of the points in the input sample.

hasName()

Test if the object is named.

Returns:
hasName : bool

True if the name is not empty.

run()

Compute the response surfaces.

Notes

It computes the response surfaces and creates a MetaModelResult structure containing all the results.
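A sketch of the typical call sequence, assuming inputSample and outputSample as in the Examples above:

>>> algo = ot.LinearModelAlgorithm(inputSample, outputSample)
>>> algo.run()                  # fit the model
>>> result = algo.getResult()   # LinearModelResult holding all outputs
>>> metamodel = result.getMetaModel()
>>> fitted = metamodel(inputSample)  # the fitted values \hat{Y}_i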

setDistribution(distribution)

Accessor to the joint probability density function of the physical input vector.

Parameters:
distribution : Distribution

Joint probability density function of the physical input vector.

setName(name)

Accessor to the object’s name.

Parameters:
name : str

The name of the object.

Examples using the class

Build and validate a linear model

Distribution of estimators in linear regression

Create a linear model

Perform stepwise regression