Sparse least squares polynomial metamodel
Context
Consider the mathematical model \(h\) of a physical system depending on \(n_X\) input parameters \(\underline{x} = (x_1, \dots, x_{n_X})^T\). Note that these input variables are assumed to be deterministic in this section. The model response may be approximated by a polynomial as follows:
\[
h(\underline{x}) \; \approx \; \hat{h}(\underline{x}) \; = \; \sum_{j=0}^{P-1} a_j \, \psi_j(\underline{x}) \tag{1}
\]
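To make the expansion concrete, here is a minimal numpy sketch that evaluates a metamodel of the form (1). It assumes a one-dimensional input and a plain monomial basis \(\psi_j(x) = x^j\) as a stand-in for the actual polynomial family; the function names are illustrative only.

```python
import numpy as np

def design_matrix(x, P):
    """Evaluate the stand-in basis psi_j(x) = x**j at each of the N
    samples in x, giving the N x P matrix of basis evaluations."""
    return np.column_stack([x ** j for j in range(P)])

def metamodel(x, a):
    """Evaluate the surrogate h_hat(x) = sum_j a_j psi_j(x) of (1)."""
    return design_matrix(x, a.size) @ a
```

For instance, with coefficients `a = [1, 2, 3]` this evaluates the polynomial \(1 + 2x + 3x^2\) at each sample.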
The Least Absolute Shrinkage and Selection Operator (LASSO) computes the coefficients from a set of \(N\) observations \(\{(\underline{x}^{(i)}, h(\underline{x}^{(i)})), i = 1, \dots, N\}\) of the model by solving an \(\ell_1\)-penalized least squares problem:

\[
\operatorname{Minimize}_{\underline{a}} \;\; \sum_{i=1}^{N} \Big( h(\underline{x}^{(i)}) \, - \, \sum_{j=0}^{P-1} a_j \, \psi_j(\underline{x}^{(i)}) \Big)^2 \; + \; C \, \|\underline{a}\|_1 \tag{2}
\]

where \(\|\underline{a}\|_1 = \sum_{j=0}^{P-1} |a_j|\) and \(C\) is a non-negative constant. A nice feature of LASSO is that it provides a sparse metamodel, i.e. it discards insignificant variables from the set of predictors. The higher the value of the tuning parameter \(C\), the sparser the obtained response surface.
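As an illustration, the penalized problem (2) can be solved by cyclic coordinate descent with soft-thresholding. The sketch below is a minimal numpy version; the function names, the \(1/(2N)\) rescaling of the data-fit term (which only rescales \(C\) relative to (2)), and the fixed iteration count are our choices, not part of the original text.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the building block of l1 penalisation."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(Psi, y, C, n_iter=500):
    """Minimise ||y - Psi a||^2 / (2N) + C * ||a||_1 by cyclic
    coordinate descent (C is rescaled w.r.t. the unnormalised sum in (2)).

    Psi : (N, P) matrix of basis evaluations psi_j(x^(i))
    C   : non-negative tuning parameter; larger C gives a sparser a
    """
    N, P = Psi.shape
    a = np.zeros(P)
    col_sq = (Psi ** 2).sum(axis=0) / N
    for _ in range(n_iter):
        for j in range(P):
            # partial residual that excludes the j-th predictor
            r_j = y - Psi @ a + Psi[:, j] * a[j]
            z = Psi[:, j] @ r_j / N
            # exact minimiser of the one-dimensional subproblem in a_j
            a[j] = soft_threshold(z, C) / col_sq[j]
    return a
```

On data generated from a sparse coefficient vector, the insignificant predictors come out exactly zero, which is the sparsity property discussed above.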
Another procedure, known as forward stagewise regression, appears to be different from LASSO, but turns out to provide similar results. The procedure first picks the predictor that is most correlated with the vector of observations. However, the value of the corresponding coefficient is only increased by a small amount. Then the predictor with the largest correlation with the current residual (possibly the same one as in the previous step) is picked, and so on. Let us introduce the vector notation:

\[
\underline{\Psi}_j \; = \; \left( \psi_j(\underline{x}^{(1)}), \dots, \psi_j(\underline{x}^{(N)}) \right)^T, \quad j = 0, \dots, P-1, \qquad \underline{y} \; = \; \left( h(\underline{x}^{(1)}), \dots, h(\underline{x}^{(N)}) \right)^T
\]
The forward stagewise algorithm is outlined below:

1. Start with \(\underline{R} = \underline{y}\) and \(a_0 = \dots = a_{P-1} = 0\).
2. Find the predictor \(\underline{\Psi}_j\) that is most correlated with \(\underline{R}\).
3. Update \(a_j \leftarrow a_j + \delta_j\), where \(\delta_j = \varepsilon \cdot \operatorname{sign}(\underline{\Psi}_j^T \underline{R})\) and \(\varepsilon\) is a small constant.
4. Update \(\underline{R} \leftarrow \underline{R} - \delta_j \, \underline{\Psi}_j\) and repeat Steps 2 and 3 until no predictor has any correlation with \(\underline{R}\).
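The steps above translate almost line for line into numpy. This is an illustrative sketch; the step size `eps`, the iteration cap `n_steps`, and the numerical stopping tolerance are our choices.

```python
import numpy as np

def forward_stagewise(Psi, y, eps=0.01, n_steps=3000):
    """Incremental forward stagewise regression (Steps 1-4 above).

    Psi : (N, P) matrix whose columns are the predictors Psi_j
    eps : small fixed increment applied to the winning coefficient
    """
    N, P = Psi.shape
    a = np.zeros(P)              # Step 1: all coefficients start at zero
    R = y.astype(float).copy()   #         and the residual is y
    for _ in range(n_steps):
        corr = Psi.T @ R         # Step 2: correlations with current residual
        j = np.argmax(np.abs(corr))
        if np.abs(corr[j]) < 1e-10:
            break                # no predictor correlates with R any more
        delta = eps * np.sign(corr[j])
        a[j] += delta            # Step 3: tiny update of winning coefficient
        R -= delta * Psi[:, j]   # Step 4: update residual and repeat
    return a
```

With a small `eps` and enough steps, the coefficients creep toward the least-squares solution while tracing a path similar to LASSO's, which is the "similar results" observation made above.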
Least Angle Regression (LAR) may be viewed as a version of forward stagewise that uses mathematical derivations to speed up the computations. Indeed, instead of taking many small steps with the basis term most correlated with the current residual \(\underline{R}\), the related coefficient is directly increased up to the point where some other basis predictor has as much correlation with \(\underline{R}\). Then the new predictor is entered, and the procedure is continued. The LARS algorithm is detailed below:
1. Initialize the coefficients to \(a_0 = \dots = a_{P-1} = 0\). Set the initial residual equal to the vector of observations \(\underline{y}\).
2. Find the vector \(\underline{\Psi}_j\) which is most correlated with the current residual.
3. Move \(a_j\) from 0 toward the least-squares coefficient of the current residual on \(\underline{\Psi}_j\), until some other predictor \(\underline{\Psi}_k\) has as much correlation with the current residual as does \(\underline{\Psi}_j\).
4. Move jointly \((a_j, a_k)^T\) in the direction defined by their joint least-squares coefficient of the current residual on \((\underline{\Psi}_j, \underline{\Psi}_k)\), until some other predictor \(\underline{\Psi}_l\) has as much correlation with the current residual.
5. Continue this way until \(m = \min(P, N-1)\) predictors have been entered.
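The LARS steps above can be exercised with scikit-learn's `lars_path` routine; the use of scikit-learn and the synthetic data are our assumptions, since the original text prescribes no particular implementation.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
Psi = rng.standard_normal((100, 6))     # columns play the role of Psi_j
a_true = np.array([1.5, 0.0, 0.0, -1.0, 0.0, 0.0])
y = Psi @ a_true + 0.05 * rng.standard_normal(100)

# method='lar' runs the plain Least Angle Regression steps above;
# 'active' lists the predictors in the order they entered the model,
# 'coefs' holds the coefficients at each breakpoint of the path.
alphas, active, coefs = lars_path(Psi, y, method="lar")
print(list(active))      # entry order of the predictors
print(coefs[:, -1])      # coefficients once all predictors have entered
```

The two truly influential predictors (columns 0 and 3 here) enter the model first, exactly as Steps 2-4 predict.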
It has been shown that with only one modification, the LARS procedure provides in one shot the entire path of LASSO solution coefficients as the tuning parameter \(C\) in (2) is increased from 0 up to a maximum value. The modified algorithm is as follows:

1. Run the LARS procedure from Steps 1 to 4.
2. If a nonzero coefficient hits zero, discard the corresponding predictor from the current metamodel and recompute the joint least-squares direction.
3. Continue this way until \(m = \min(P, N-1)\) predictors have been entered.
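The modified algorithm corresponds to `method='lasso'` in scikit-learn's `lars_path` (again an assumption on tooling, not something the original text names). The whole regularization path comes out of a single run:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
Psi = rng.standard_normal((80, 5))
y = Psi @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(80)

# method='lasso' applies the modification above: whenever a coefficient
# crosses zero, its predictor is dropped and the joint direction is
# recomputed, so the path traces the LASSO solutions of (2) for every
# value of the tuning parameter.
alphas, active, coefs = lars_path(Psi, y, method="lasso")

# alphas decreases along the path: strong penalty -> sparse model
for k in range(coefs.shape[1]):
    n_active = np.count_nonzero(np.abs(coefs[:, k]) > 1e-12)
    print(f"alpha={alphas[k]:.4f}  active predictors={n_active}")
```

The printout shows the "collection of less and less sparse" approximations discussed next: the path starts from the empty model at the largest penalty and ends at the full least-squares fit.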
The LAR-based approaches (i.e. original LAR, LASSO and forward stagewise) provide a collection of less and less sparse PC (polynomial chaos) approximations. The accuracy of each PC metamodel may be assessed by cross validation. Eventually the PC representation associated with the lowest error estimate is retained.
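One way to carry out this cross-validation-based selection is scikit-learn's `LassoLarsCV`, which fits the whole LASSO path by modified LARS and keeps the point on the path with the lowest K-fold error. The choice of library, the 5-fold split, and the synthetic data are our assumptions; the original text only requires that the lowest-error metamodel be retained.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

rng = np.random.default_rng(3)
Psi = rng.standard_normal((150, 10))
a_true = np.zeros(10)
a_true[[0, 4]] = [2.0, -1.0]          # only two influential predictors
y = Psi @ a_true + 0.1 * rng.standard_normal(150)

# Fit the entire LASSO path, score each point by 5-fold cross validation,
# and retain the sparse model with the lowest error estimate.
model = LassoLarsCV(cv=5).fit(Psi, y)
selected = np.flatnonzero(np.abs(model.coef_) > 1e-8)
print(selected)        # indices of the predictors kept in the metamodel
```

The retained metamodel contains the truly influential predictors while most of the insignificant ones are discarded, which is the sparse selection behaviour described in this section.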