Sparse least squares polynomial metamodel
Context
Consider the mathematical model $h$ of a physical system depending on $n_X$ input parameters $\boldsymbol{x} = (x_1, \dots, x_{n_X})^{\top}$. Note that these input variables are assumed to be deterministic in this section. The model response may be approximated by a polynomial as follows:

$$h(\boldsymbol{x}) \;\approx\; \widehat{h}(\boldsymbol{x}) \;=\; \sum_{j=0}^{P-1} a_j \, \psi_j(\boldsymbol{x}) \tag{1}$$

Given an experimental design $\boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(N)}$ and the corresponding model responses $y^{(i)} = h(\boldsymbol{x}^{(i)})$, the LASSO (least absolute shrinkage and selection operator) method estimates the coefficients $\boldsymbol{a} = (a_0, \dots, a_{P-1})^{\top}$ by $\ell_1$-penalized least squares:

$$\min_{\boldsymbol{a}} \; \sum_{i=1}^{N} \left( y^{(i)} - \widehat{h}(\boldsymbol{x}^{(i)}) \right)^2 \;+\; C \, \|\boldsymbol{a}\|_1 \tag{2}$$

where $\|\boldsymbol{a}\|_1 = \sum_{j=0}^{P-1} |a_j|$ and $C$ is a non-negative constant. A nice feature of LASSO is that it provides a sparse metamodel, i.e. it discards insignificant variables from the set of predictors. The higher the value of the tuning parameter $C$, the sparser the resulting response surface.
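As a small illustration, the objective in (2) can be evaluated directly. The following NumPy sketch is not part of any library API; the basis matrix `Psi`, whose columns hold the basis terms evaluated on the design, and the constant `C` are illustrative assumptions.

```python
import numpy as np

def lasso_objective(a, Psi, y, C):
    """l1-penalized least-squares objective of equation (2)."""
    residual = y - Psi @ a               # y^(i) - h_hat(x^(i)), i = 1..N
    return residual @ residual + C * np.sum(np.abs(a))

rng = np.random.default_rng(0)
Psi = rng.standard_normal((50, 10))      # N = 50 observations, P = 10 basis terms
y = Psi[:, 2] - 0.5 * Psi[:, 7]          # response driven by two active terms only
print(lasso_objective(np.zeros(10), Psi, y, C=1.0))
```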
Another procedure, known as forward stagewise regression, appears to be different from LASSO, but turns out to provide similar results. The procedure first picks the predictor that is most correlated with the vector of observations. However, the value of the corresponding coefficient is only increased by a small amount. Then the predictor with largest correlation with the current residual (possibly the same term as in the previous step) is picked, and so on. Let us introduce the vector notation:

$$\boldsymbol{\psi}_j \;=\; \left( \psi_j(\boldsymbol{x}^{(1)}), \dots, \psi_j(\boldsymbol{x}^{(N)}) \right)^{\top}$$
The forward stagewise algorithm is outlined below (a Python sketch follows the list):
1. Start with the coefficients $a_0 = \dots = a_{P-1} = 0$ and the residual $\boldsymbol{R} = \boldsymbol{y}$, where $\boldsymbol{y} = (y^{(1)}, \dots, y^{(N)})^{\top}$ is the vector of observations.
2. Find the predictor $\boldsymbol{\psi}_j$ that is most correlated with $\boldsymbol{R}$.
3. Update $\hat{a}_j = \hat{a}_j + \delta_j$, where $\delta_j = \varepsilon \cdot \operatorname{sign}(\boldsymbol{\psi}_j^{\top} \boldsymbol{R})$ and $\varepsilon$ is some small step size.
4. Update $\boldsymbol{R} = \boldsymbol{R} - \delta_j \boldsymbol{\psi}_j$ and repeat Steps 2 and 3 until no predictor has any correlation with $\boldsymbol{R}$.
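A minimal NumPy sketch of these four steps, assuming the columns of `Psi` hold the basis vectors $\boldsymbol{\psi}_j$ defined above and `eps` plays the role of $\varepsilon$ (the names and stopping tolerance are illustrative):

```python
import numpy as np

def forward_stagewise(Psi, y, eps=0.01, max_steps=10_000, tol=1e-6):
    a = np.zeros(Psi.shape[1])       # Step 1: a_0 = ... = a_{P-1} = 0
    R = y.astype(float)              # Step 1: residual starts at the observations
    for _ in range(max_steps):
        corr = Psi.T @ R             # correlations psi_j^T R for every predictor
        j = np.argmax(np.abs(corr))  # Step 2: most correlated predictor
        if abs(corr[j]) < tol:       # no predictor correlates with R any more
            break
        delta = eps * np.sign(corr[j])
        a[j] += delta                # Step 3: small update of the coefficient
        R -= delta * Psi[:, j]       # Step 4: update the residual and repeat
    return a
```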
Least Angle Regression (LAR) may be viewed as a version of forward stagewise that uses mathematical derivations to speed up the computations. Indeed, instead of taking many small steps with the basis term most correlated with the current residual $\boldsymbol{R}$, the related coefficient is directly increased up to the point where some other basis predictor has as much correlation with $\boldsymbol{R}$. Then the new predictor is entered, and the procedure is continued. The LARS algorithm is detailed below (a path-tracing sketch follows the list):
1. Initialize the coefficients to $a_0 = \dots = a_{P-1} = 0$. Set the initial residual equal to the vector of observations $\boldsymbol{y}$.
2. Find the vector $\boldsymbol{\psi}_j$ which is most correlated with the current residual.
3. Move $a_j$ from 0 toward the least-square coefficient of the current residual on $\boldsymbol{\psi}_j$, until some other predictor $\boldsymbol{\psi}_k$ has as much correlation with the current residual as does $\boldsymbol{\psi}_j$.
4. Move jointly $(a_j, a_k)^{\top}$ in the direction defined by their joint least-square coefficient of the current residual on $(\boldsymbol{\psi}_j, \boldsymbol{\psi}_k)$, until some other predictor $\boldsymbol{\psi}_l$ has as much correlation with the current residual.
5. Continue this way until $m = \min(P, N-1)$ predictors have been entered.
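A full equiangular implementation is beyond a short example, but the path can be traced with scikit-learn's `lars_path` (an external library, not the API documented in this section); the toy data below are illustrative:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
Psi = rng.standard_normal((50, 10))        # basis evaluations on the design
y = 2.0 * Psi[:, 0] - Psi[:, 4] + 0.05 * rng.standard_normal(50)

# method="lar" follows Steps 1-5 above; `active` lists the predictors in
# their order of entry, with one column of `coefs` per step of the path.
alphas, active, coefs = lars_path(Psi, y, method="lar")
print("entry order:", active)
print("final coefficients:", np.round(coefs[:, -1], 3))
```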
It has been shown that with only one modification, the LARS procedure provides in one shot the entire path of LASSO solution coefficients as the tuning parameter $C$ in (2) is increased from 0 up to a maximum value. The modified algorithm is as follows (a sketch follows the list):
1. Run the LARS procedure from Steps 1 to 4.
2. If a non-zero coefficient hits zero, discard it from the current metamodel and recompute the current joint least-square direction.
3. Continue this way until $m = \min(P, N-1)$ predictors have been entered.
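With the same scikit-learn helper, this modification amounts to switching `method` to `"lasso"`, which drops a coefficient from the active set whenever it crosses zero and returns the whole LASSO path in one run (again an external-library sketch on illustrative data):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
Psi = rng.standard_normal((50, 10))
y = 2.0 * Psi[:, 0] - Psi[:, 4] + 0.05 * rng.standard_normal(50)

# One column of `coefs` per breakpoint of the path: sparsity decreases
# as the penalty decreases from its maximal value down to 0.
alphas, active, coefs = lars_path(Psi, y, method="lasso")
print("number of breakpoints:", coefs.shape[1])
print("non-zero terms at each breakpoint:", (np.abs(coefs) > 0).sum(axis=0))
```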
The LAR-based approaches (i.e. original LAR, LASSO and forward stagewise) provide a collection of less and less sparse polynomial chaos (PC) approximations. The accuracy of each PC metamodel may be assessed by cross validation. Eventually the PC representation associated with the lowest error estimate is retained, as sketched below.
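A minimal OpenTURNS sketch of this selection loop, assuming a one-dimensional toy model; the LARS class is the one referenced in the API section below, while the symbolic formula, sample sizes and the surrounding functional chaos classes are illustrative assumptions of this sketch:

```python
import openturns as ot

# Toy model and input distribution (illustrative choices).
model = ot.SymbolicFunction(["x"], ["x*sin(x)"])
distribution = ot.Uniform(-1.0, 1.0)
X = distribution.getSample(100)
Y = model(X)

# Candidate orthonormal polynomial basis, truncated to 10 terms.
basis = ot.OrthogonalProductPolynomialFactory([ot.LegendreFactory()])
adaptive = ot.FixedStrategy(basis, 10)

# LARS generates the collection of sparse candidates; corrected
# leave-one-out cross validation retains the most accurate one.
selection = ot.LeastSquaresMetaModelSelectionFactory(ot.LARS(), ot.CorrectedLeaveOneOut())
projection = ot.LeastSquaresStrategy(selection)

algo = ot.FunctionalChaosAlgorithm(X, Y, distribution, adaptive, projection)
algo.run()
metamodel = algo.getResult().getMetaModel()
```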
API:
See LARS
References:
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, 2004, "Least angle regression", Annals of Statistics 32(2), 407-499.
T. Hastie, J. Taylor, R. Tibshirani, and G. Walther, 2007, "Forward stagewise regression and the monotone Lasso", Electronic Journal of Statistics 1, 1-29.