Least squares polynomial response surface

Instead of replacing the model response \model(\vect{x}) with a local approximation around a given point \vect{x}_0 of the input parameters, as in a Taylor expansion, one may seek a global approximation of \model(\vect{x}) over its whole domain of definition. A common choice to this end is a global polynomial approximation. For the sake of simplicity, a scalar model response y=\model(\vect{x}) will be considered from now on. Nonetheless, the following derivations hold for a vector-valued response as well.

In the following, one considers global approximations of the model response using:

  • a linear function, i.e. a polynomial of degree one;

    y  \approx  \widehat{\model}(\vect{x})  =  a_0 \, + \,  \sum_{i=1}^{n_{X}} a_{i} x_i

    where (a_j  \, , \, j=0,\dots,n_X) is a set of unknown coefficients.

  • a quadratic function, i.e. a polynomial of degree two.

    y  \approx  \widehat{\model}(\vect{x})  =  a_0 \, + \,  \sum_{i=1}^{n_{X}} a_{i} x_i \, + \, \sum_{i=1}^{n_{X}} \sum_{j=i}^{n_{X}} a_{i,j} x_i x_j

The two previous equations may be recast as:

y  \approx  \widehat{\model}(\vect{x})  =  \sum_{j=0}^{P-1} a_j \psi_j(\vect{x})

where P denotes the number of terms. It is equal to n_X + 1 for the linear approximation and to 1 + 2n_X + n_X (n_X - 1)/2 for the quadratic one. The family (\psi_j,j=0,\dots,P-1) gathers the constant monomial 1, the monomials of degree one x_i and, in the quadratic case, the monomials of degree two x_i^2 as well as the cross-terms x_i x_j (i < j). For instance, with n_X = 2 inputs, the quadratic approximation involves P = 6 terms: 1, x_1, x_2, x_1^2, x_1 x_2 and x_2^2. Using the vector notation \vect{a}  =  (a_{0} , \dots , a_{P-1} )^{\textsf{T}} and \vect{\psi}(\vect{x})  =  (\psi_{0}(\vect{x}) , \dots , \psi_{P-1}(\vect{x}) )^{\textsf{T}}, this can be rewritten:

y  \approx  \widehat{\model}(\vect{x})  =  \Tr{\vect{a}}\vect{\psi}(\vect{x})
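
To make the notation concrete, here is a minimal sketch (a toy illustration, not taken from the source) that builds the monomial family \vect{\psi}(\vect{x}) for the quadratic case, using the i \leq j convention above, and evaluates \Tr{\vect{a}}\vect{\psi}(\vect{x}). The helper name quadratic_basis and the numerical values are assumptions made for the example.

```python
# Build psi(x) = (1, x_1, ..., x_{n_X}, x_i * x_j for i <= j) and evaluate
# the polynomial surrogate a^T psi(x). Toy sketch, not a library API.
import numpy as np


def quadratic_basis(x):
    """Return the quadratic monomial family evaluated at x."""
    x = np.asarray(x, dtype=float)
    cross = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate(([1.0], x, cross))


# Example with n_X = 2: P = 1 + 2 * 2 + 2 * 1 / 2 = 6 terms.
x = np.array([0.5, -1.0])
psi = quadratic_basis(x)                         # [1, x1, x2, x1^2, x1*x2, x2^2]
a = np.array([1.0, 2.0, 0.5, -1.0, 0.3, 0.7])    # some coefficients a_j
y_hat = a @ psi                                  # a^T psi(x)
print(psi, y_hat)
```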

A global approximation of the model response over its whole definition domain is sought. To this end, the coefficients a_j may be computed using a least squares regression approach. In this context, an experimental design, that is, a set of observations of input parameters, is required:

\cX = \left\{ \vect{x}^{(1)}, \dots, \vect{x}^{(N)} \right\},

as well as the corresponding model evaluations:

\cY = \left\{ y^{(1)},\dots,y^{(N)} \right\}.

The least squares problem is to solve:

\widehat{\vect{a}} = \argmin_{\vect{a} \in \Rset^P} \cJ(\vect{a})

where \cJ(\vect{a}) is the cost function, defined as:

\cJ(\vect{a}) = \sum_{i=1}^N \left( y^{(i)} - \Tr{\vect{a}} \vect{\psi}\left(\vect{x}^{(i)}\right) \right)^2.
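
A minimal sketch of this cost function, reusing the hypothetical quadratic_basis helper from the previous sketch (restated so the snippet runs on its own); X denotes the experimental design and y the corresponding model evaluations.

```python
# J(a) = sum over the design of the squared residuals y^(i) - a^T psi(x^(i)).
import numpy as np


def quadratic_basis(x):
    x = np.asarray(x, dtype=float)
    cross = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate(([1.0], x, cross))


def cost(a, X, y):
    """Least squares cost J(a) for the design X and observations y."""
    residuals = np.array([y_i - a @ quadratic_basis(x_i) for x_i, y_i in zip(X, y)])
    return float(residuals @ residuals)


# Tiny usage example with N = 3 arbitrary points in dimension n_X = 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 0.5])
a = np.zeros(6)                     # P = 6 coefficients for n_X = 2
print(cost(a, X, y))                # here simply the sum of squared observations
```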

Let \vect{y} = \Tr{(y^{(1)},\dots,y^{(N)})} \in \Rset^{N} be the vector of output observations. If the design matrix \mat{\Psi} defined below has full rank, then the normal equations \Tr{\mat{\Psi}} \mat{\Psi} \, \widehat{\vect{a}} = \Tr{\mat{\Psi}} \vect{y} have a unique solution, given by:

\widehat{\vect{a}}  =  \left( \Tr{\mat{\Psi}} \mat{\Psi}  \right)^{-1} \Tr{\mat{\Psi}}  \vect{y}

where:

\mat{\Psi}_{ij}  =  \psi_{j}\left(\vect{x}^{(i)}\right)

for i = 1, \dots, N and j = 0, \dots, P - 1. A necessary condition for the existence of a solution is that the size N of the experimental design is not less than the number P of coefficients to estimate. The Gram matrix \Tr{\mat{\Psi}} \mat{\Psi} can be ill-conditioned, so explicitly inverting it is not necessarily the best approach: the resulting solution may be particularly sensitive to rounding errors. The least squares problem is therefore usually solved with more robust numerical methods such as the singular value decomposition (SVD) or the QR decomposition.
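
As an illustration only (a sketch under assumed toy data, not the library's implementation), the following snippet assembles the design matrix \mat{\Psi} for the quadratic basis sketched earlier, then estimates the coefficients once through the normal equations and once through numpy.linalg.lstsq, which relies on an SVD and corresponds to the more robust route mentioned above. The model, the helper names and the design are hypothetical.

```python
# Fit a quadratic response surface by least squares on a toy model.
import numpy as np


def quadratic_basis(x):
    x = np.asarray(x, dtype=float)
    cross = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate(([1.0], x, cross))


def model(x):
    # Hypothetical scalar model y = h(x), used only to produce observations.
    return 1.0 + 2.0 * x[0] - x[1] + 0.5 * x[0] * x[1] + x[1] ** 2


rng = np.random.default_rng(0)
N, n_X = 50, 2
X = rng.uniform(-1.0, 1.0, size=(N, n_X))         # experimental design
y = np.array([model(x) for x in X])                # model evaluations

# Design matrix Psi_{ij} = psi_j(x^(i)), shape N x P with P = 6 here.
Psi = np.array([quadratic_basis(x) for x in X])

# Normal equations: solve (Psi^T Psi) a = Psi^T y (may be ill-conditioned).
a_normal = np.linalg.solve(Psi.T @ Psi, Psi.T @ y)

# More robust: SVD-based least squares solver.
a_svd, *_ = np.linalg.lstsq(Psi, y, rcond=None)

print(np.allclose(a_normal, a_svd))   # True on this well-conditioned example
```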