# Sparse least squares polynomial metamodel

*Sparse least squares* schemes may be employed (they are also known as *variable selection* techniques in statistics). These methods are aimed at identifying a small set of significant coefficients in the model response approximation. Eventually, a *sparse* polynomial response surface, i.e. a polynomial which contains only a small number of non-zero coefficients, is obtained by means of a reduced number of possibly costly model evaluations. In the sequel, we focus on sparse regression schemes based on $L_1$-penalization.

Such schemes provide a *collection* of less and less sparse approximations. Eventually, an optimal approximation may be retained with regard to a given criterion.

**Context**

Consider the mathematical model $h$ of a physical system depending on $n_X$ input parameters $\boldsymbol{x} = \left(x_1, \dots, x_{n_X}\right)^{\mathsf{T}}$. Note that these input variables are assumed to be deterministic in this section. The model response may be approximated by a polynomial as follows:

$$
h(\boldsymbol{x}) \; \approx \; \hat{h}(\boldsymbol{x}) \; = \; \sum_{j=0}^{P-1} a_j \, \psi_j(\boldsymbol{x}) \tag{1}
$$
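As a concrete illustration of (1), a sparse polynomial surrogate can be evaluated in a few lines; the one-dimensional monomial basis $\psi_j(x) = x^j$ used below is a hypothetical choice, not one prescribed by the text:

```python
# Evaluate h_hat(x) = sum_j a_j * psi_j(x) for a 1D monomial basis
# psi_j(x) = x**j (an illustrative choice of basis).
def surrogate(a, x):
    return sum(a_j * x ** j for j, a_j in enumerate(a))

# A sparse coefficient set: psi_1 is discarded (a_1 = 0).
a = [1.0, 0.0, 2.0]
print(surrogate(a, 3.0))  # 1 + 0*3 + 2*9 = 19.0
```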

**LASSO**

The *LASSO* (for *Least absolute shrinkage and selection operator*) method is based on a $L_1$-penalized regression as follows:

$$
\min_{\boldsymbol{a}} \; \sum_{i=1}^{N} \left( h\!\left(\boldsymbol{x}^{(i)}\right) - \boldsymbol{a}^{\mathsf{T}} \boldsymbol{\psi}\!\left(\boldsymbol{x}^{(i)}\right) \right)^{2} \; + \; C \, \|\boldsymbol{a}\|_1 \tag{2}
$$

where $\|\boldsymbol{a}\|_1 = \sum_{j=0}^{P-1} |a_j|$ and $C$ is a non-negative constant. A nice feature of LASSO is that it provides a *sparse* metamodel, i.e. it discards insignificant variables from the set of predictors. The larger the value of the tuning parameter $C$, the sparser the obtained response surface.
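Problem (2) can be solved numerically in several ways; the sketch below uses cyclic coordinate descent with soft-thresholding, which makes the sparsity mechanism explicit (the function name, the simulated data, and the use of NumPy are illustrative assumptions, not part of the method described above):

```python
import numpy as np

def lasso_cd(Psi, y, C, n_sweeps=500):
    """Cyclic coordinate descent for min_a 0.5*||y - Psi a||^2 + C*||a||_1."""
    n, p = Psi.shape
    a = np.zeros(p)
    col_sq = np.sum(Psi ** 2, axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual excluding the contribution of predictor j.
            r_j = y - Psi @ a + Psi[:, j] * a[j]
            rho = Psi[:, j] @ r_j
            # Soft-thresholding: insignificant coefficients become exactly 0.
            a[j] = np.sign(rho) * max(abs(rho) - C, 0.0) / col_sq[j]
    return a

rng = np.random.default_rng(0)
Psi = rng.standard_normal((50, 5))
y = Psi @ np.array([2.0, 0.0, -1.5, 0.0, 0.5]) + 0.01 * rng.standard_normal(50)
print(np.round(lasso_cd(Psi, y, C=10.0), 2))  # small coefficients shrunk toward 0
```

With $C = 0$ the iteration reduces to ordinary least squares, while very large $C$ drives every coefficient to zero, matching the remark above that the metamodel is all the sparser for high $C$.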

**Forward stagewise regression**

Another procedure, known as *forward stagewise regression*, appears to
be different from LASSO, but turns out to provide similar results. The
procedure first picks the predictor that is most correlated with the
vector of observations. However, the value of the corresponding
coefficient is only increased by a small amount. Then the predictor with
largest correlation with the current residual (possibly the same term as
in the previous step) is picked, and so on. Let us introduce the vector
notation:

$$
\boldsymbol{\psi}_j \; = \; \left( \psi_j\!\left(\boldsymbol{x}^{(1)}\right), \dots, \psi_j\!\left(\boldsymbol{x}^{(N)}\right) \right)^{\mathsf{T}}, \qquad \boldsymbol{y} \; = \; \left( y^{(1)}, \dots, y^{(N)} \right)^{\mathsf{T}}
$$

The forward stagewise algorithm is outlined below:

1. Start with $\boldsymbol{R} = \boldsymbol{y}$ and $a_0 = \dots = a_{P-1} = 0$.

2. Find the predictor $\boldsymbol{\psi}_j$ that is most correlated with $\boldsymbol{R}$.

3. Update $a_j = a_j + \delta_j$, where $\delta_j = \varepsilon \cdot \operatorname{sign}\!\left(\boldsymbol{\psi}_j^{\mathsf{T}} \boldsymbol{R}\right)$ and $\varepsilon$ is a small positive constant.

4. Update $\boldsymbol{R} = \boldsymbol{R} - \delta_j \boldsymbol{\psi}_j$ and repeat Steps 2 and 3 until no predictor has any correlation with $\boldsymbol{R}$.
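The steps above can be sketched in a few lines of NumPy; the step size $\varepsilon$, the stopping threshold, and the data are illustrative assumptions:

```python
import numpy as np

def forward_stagewise(Psi, y, eps=1e-3, tol=0.1, max_iter=200_000):
    """Forward stagewise regression: repeatedly nudge the coefficient of the
    predictor most correlated with the current residual by +/- eps."""
    n, p = Psi.shape
    a = np.zeros(p)          # Step 1: all coefficients start at zero...
    R = y.copy()             # ...and the residual starts at the observations
    for _ in range(max_iter):
        c = Psi.T @ R                    # correlations with current residual
        j = int(np.argmax(np.abs(c)))    # Step 2: most correlated predictor
        if abs(c[j]) < tol:              # Step 4: stop when no correlation left
            break
        delta = eps * np.sign(c[j])      # Step 3: small update of a_j
        a[j] += delta
        R -= delta * Psi[:, j]           # Step 4: update the residual
    return a
```

Run until the stopping criterion fires, the result is close to the ordinary least-squares solution; the interest of the procedure lies in the intermediate (sparse) stages of the path.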

**Least Angle Regression**

*Least Angle Regression* (LAR) may be viewed as a version of forward
stagewise that uses mathematical derivations to speed up the
computations. Indeed, instead of taking many small steps with the basis
term most correlated with the current residual $\boldsymbol{R}$, the
related coefficient is directly increased up to the point where some
other basis predictor has as much correlation with
$\boldsymbol{R}$. Then the new predictor is entered, and the
procedure is continued. The LARS algorithm is detailed below:

1. Initialize the coefficients to $a_0 = \dots = a_{P-1} = 0$. Set the initial residual equal to the vector of observations $\boldsymbol{y}$.

2. Find the vector $\boldsymbol{\psi}_j$ which is most correlated with the current residual.

3. Move $a_j$ from 0 toward the least-square coefficient of the current residual on $\boldsymbol{\psi}_j$, until some other predictor $\boldsymbol{\psi}_k$ has as much correlation with the current residual as does $\boldsymbol{\psi}_j$.

4. Move $\{a_j, a_k\}$ jointly in the direction defined by their joint least-square coefficient of the current residual on $\{\boldsymbol{\psi}_j, \boldsymbol{\psi}_k\}$, until some other predictor has as much correlation with the current residual.

5. Continue this way until $m = \min(P, N-1)$ predictors have been entered.
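As a concrete sketch of these steps, the function below implements basic LARS with NumPy: at each step one predictor enters the active set and the active coefficients move along the equiangular direction until the next predictor ties in correlation. The names and data are illustrative, and a production implementation would handle degenerate designs more carefully:

```python
import numpy as np

def lars(Psi, y, n_steps):
    """Basic Least Angle Regression: one predictor enters per step; active
    coefficients move along the equiangular direction until the next tie."""
    n, p = Psi.shape
    a = np.zeros(p)
    mu = np.zeros(n)                 # current fit; the residual is y - mu
    active = []
    for _ in range(n_steps):
        c = Psi.T @ (y - mu)         # correlations with the current residual
        inactive = [k for k in range(p) if k not in active]
        active.append(inactive[int(np.argmax(np.abs(c[inactive])))])
        C = np.max(np.abs(c[active]))          # shared correlation level
        s = np.sign(c[active])
        Xa = Psi[:, active] * s                # sign-adjusted active predictors
        G = Xa.T @ Xa
        g = np.linalg.solve(G, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())
        w = A * g
        u = Xa @ w                             # equiangular direction
        if len(active) == p:
            gamma = C / A                      # last step: reach least squares
        else:
            aa = Psi.T @ u
            cand = []
            for k in range(p):
                if k in active:
                    continue
                # Step length at which predictor k ties in correlation.
                for g_ in ((C - c[k]) / (A - aa[k]), (C + c[k]) / (A + aa[k])):
                    if g_ > 1e-12:
                        cand.append(g_)
            gamma = min(cand) if cand else C / A
        mu += gamma * u
        a[active] += gamma * s * w
    return a, active
```

By construction, when all predictors have been entered the fit coincides with the ordinary least-squares solution, while the earlier steps of the path provide the sparse intermediate metamodels.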

It is worth noting that each LARS step moves the *active* coefficients toward their least-square value. It corresponds to an updating of the form $\hat{\boldsymbol{a}}^{(k+1)} = \hat{\boldsymbol{a}}^{(k)} + \gamma^{(k)} \tilde{\boldsymbol{w}}^{(k)}$. Vector $\tilde{\boldsymbol{w}}^{(k)}$ and coefficient $\gamma^{(k)}$ are referred to as the LARS *descent direction* and *step*, respectively. Both quantities may be derived algebraically.

As one approximation is obtained at each iteration, the procedure eventually provides a *collection* of metamodels.

**LASSO and Forward Stagewise as variants of LARS**

It has been shown that, with only one modification, the LARS procedure provides in one shot the entire path of LASSO solution coefficients as the tuning parameter $C$ in (2) is increased from 0 up to a maximum value. The modified algorithm is as follows:

1. Run the LARS procedure from Steps 1 to 4.

2. If a non-zero coefficient hits zero, discard it from the current metamodel and recompute the current joint least-square direction.

3. Continue this way until $m = \min(P, N-1)$ predictors have been entered.

A similar modification of the original procedure yields the forward stagewise solutions, which is why the algorithm is often referred to as LARS (with *S* referring to both *Stagewise* and *LASSO*).

**Selection of the optimal LARS solution**

The LAR-based approaches (i.e. original LAR, LASSO and forward
stagewise) provide a *collection* of less and less sparse polynomial
chaos (PC) approximations. The accuracy of each PC metamodel may be
assessed by cross validation. Eventually the PC representation
associated with the lowest error estimate is retained.
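For a least-squares fit, leave-one-out cross validation can be computed analytically from the leverages rather than by refitting $N$ times; the sketch below (names, data, and the nested candidate bases are illustrative) scores candidate sparsity levels and retains the best one:

```python
import numpy as np

def loo_error(Psi, y):
    """Analytic leave-one-out mean squared error of a least-squares fit."""
    Q, _ = np.linalg.qr(Psi)
    h = np.sum(Q ** 2, axis=1)                 # leverages h_i
    a, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    r = y - Psi @ a                            # in-sample residuals
    return float(np.mean((r / (1.0 - h)) ** 2))

rng = np.random.default_rng(1)
Psi = rng.standard_normal((60, 6))
y = Psi @ np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(60)

# Nested candidate bases, as produced by a LARS-type path (illustrative order).
candidates = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
errors = [loo_error(Psi[:, idx], y) for idx in candidates]
best = candidates[int(np.argmin(errors))]
print(best)
```

The identity $e_{(-i)} = r_i / (1 - h_i)$ underlying `loo_error` is exact for ordinary least squares, so the whole collection can be scored at the cost of one fit per candidate basis.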