Functional Chaos Expansion

Accounting for the joint probability density function (PDF) \mu_{\inputRV}(\vect{x}) of the input random vector \inputRV, one seeks the joint PDF of output random vector \outputRV = \model(\inputRV). This may be achieved using Monte Carlo (MC) simulation (see Monte Carlo simulation). However, the MC method may require a large number of model evaluations, i.e. a great computational cost, in order to obtain accurate results.

A possible solution to overcome this problem is to project the model \model in a suitable functional space, such as the Hilbert space L^2\left(\mu_{\inputRV}\right) of square-integrable functions with respect to \mu_{\inputRV}. This projection is called a functional chaos expansion of the model \model.

The principles of the building of a functional chaos expansion are described in the sequel.

Methodology: principles

We consider the output random vector:

\outputRV = \model(\inputRV)

where \model: \Rset^{\inputDim} \rightarrow \Rset^{\outputDim} is the model, \inputRV is the input random vector which distribution is \mu_{\inputRV}.

We assume that \outputRV has finite variance i.e. \model \in L^2\left(\mu_{\inputRV}\right).

When \outputDim > 1, the functional chaos algorithm is used on each marginal of \outputRV, using the same multivariate orthonormal basis for all the marginals. Thus, the method is detailed here for a scalar output Y and \model: \Rset^{\inputDim} \rightarrow \Rset.

The functional space L^2\left(\mu_{\inputRV}\right) is a Hilbert space equipped with an inner product defined by:

\scalarproduct{h_1}{h_2}_{L^2\left(\mu_{\inputRV}\right)}
= \Expect{h_1(\inputRV) h_2(\inputRV)}

for any (h_1,h_2) \in L^2\left(\mu_{\inputRV}\right). For a continuous random variable, the inner product is defined by:

\scalarproduct{h_1}{h_2}_{L^2\left(\mu_{\inputRV}\right)}
=  \int h_1(\vect{x}) h_2(\vect{x})\, \mu_{\inputRV}(\vect{x}) d\vect{x}.

For a discrete random variable, the scalar product is:

\scalarproduct{h_1}{h_2}_{L^2\left(\mu_{\inputRV}\right)}
= \sum_\vect{x} h_1(\vect{x}) h_2(\vect{x})\, \Prob{\inputRV = \vect{x}}.

The associated norm is defined by:

\|h\|^2_{L^2\left(\mu_{\inputRV}\right)}
= \Expect{\left(h(\inputRV)\right)^2}

for any h \in L^2\left(\mu_{\inputRV}\right).

The Hilbert space L^2\left(\mu_{\inputRV}\right) admits an orthonormal basis denoted by (\psi_k)_{k \geq 0} with \psi_k : \Rset^{\inputDim} \rightarrow \Rset. Therefore, any h \in L^2\left(\mu_{\inputRV}\right) can be written as a functional chaos expansion (see [lemaitre2010] page 39):

h = \sum_{k \geq 0} a_k \psi_k

where \left(a_k \in \Rset\right)_{k\geq 0} is a sequence of coefficients. The meta model is the truncation of the expansion to a finite subset I of coefficients:

\metaModel =  \sum_{k \in I} a_k \psi_k

Methodology: step by step

The construction of a meta model as a truncated functional expansion consists in choosing:

  • Choice 1: an approximation space \cP such that:

(1)\overline{\cP} = L^2\left(\mu_{\inputRV}\right).

  • Choice 2: a sequence of nested approximation spaces (\cP_n)_{n \in \Nset} such that:

(2)\cP = \bigcup_{n \in \Nset} \cP_n

which implies that any function in L^2\left(\mu_{\inputRV}\right) is the limit with respect to L^2\left(\mu_{\inputRV}\right) of a sequence (h_n)_{n \in \Nset} such that h_n \in \cP_n:

(3)\lim_{n \rightarrow \infty} \left\|\model - h_n\right\|_{L^2\left(\mu_{\inputRV}\right)} = 0

  • Choice 3: a basis \left(\psi_k\right)_{k \in I_n} of \cP_n:

(4)\cP_n = \mathrm{span}(\psi_k)_{k \in I_n}

where I_n is a finite index set.

  • Choice 4: a specific space \cP_n in which \model is approximated by \metaModel \in \cP_n as follows:

(5)\metaModel =  \sum_{k \in I_n} a_k \psi_k

which amounts to solving:

(6)\metaModel = \arg\min_{h_n \in \cP_n} | \model - h_n |^2_{L^2\left(\mu_{\inputRV}\right)}

This problem is a linear least-squares problem.

Which approximation space and sequence of nested approximation spaces?

Many choices are possible for \cP and the associated sequence (\cP_n)_{n \in \Nset}.

For instance, one may choose \cP to be the set of multivariate polynomials, provided that \cP verifies the condition (1) (see also [sullivan2015], page 139, [dahlquist2008], theorem 4.5.16 page 456 and [rudin1987], section 4.24 page 85).

We then consider a sequence of nested spaces (\cP_n)_{n \in \Nset} defined as follows: we construct a complete family using graded polynomials by introducing a bijection \tau from \Nset into \Nset^\inputDim. The mapping \tau(k) specifies the multi-index of marginal degrees (this bijection ensures that all polynomials are covered and that any finite family is linearly independent). Then, \cP_n is the space spanned by polynomials with marginal degrees \tau(0), \dots, \tau(n). Depending on the choice of \tau, \cP_n may correspond to the set of polynomials of total degree less than n. See Tensorized multivariate basis enumeration functions and Multivariate indices enumeration functions for more details on this topic.

Note that, depending on the distribution \mu_{\inputRV}, the space \cP of multivariate polynomials does not always satisfy \cP \subset L^2\left(\mu_{\inputRV}\right), and even when this inclusion holds, the density condition (1) must also be satisfied:

  • First, for all polynomial spaces \cP_n to be subspaces of L^2\left(\mu_{\inputRV}\right), it is required that \langle X^n, 1 \rangle_{L^2\left(\mu_{\inputRV}\right)} = \Expect{X^n} for n \geq 0 be finite in the scalar case, and \left\langle \prod_{i \in A} X_i^{n_i}, 1 \right\rangle_{L^2\left(\mu_{\inputRV}\right)} =
\Expect{\prod_{i \in A} X_i^{n_i}} with A \subset \llbracket 1, d \rrbracket in dimension \inputDim. This implies that the measure \mu_{\inputRV} must have finite moments of all orders. This is the case, for instance, when the support of \mu_{\inputRV} is bounded. However,this is not the case, for instance, when \mu_{\inputRV} is the Cauchy distribution: for this distribution, the only polynomial space contained in L^2\left(\mu_{\inputRV}\right) is the space of constant polynomials, since the Cauchy distribution has an infinite expectation.

  • Moreover, among probability measures \mu_{\inputRV} that admit moments of all orders, the closure property (1) is not always guaranteed. For instance, the log-normal distribution (see [ernst2012]). In this case, the expansion may not converge to the function. Nevertheless, even without any guarantee, it is possible that the meta model built using the basis \left(\psi_k\right)_{k \in I_n\}} may be a good approximation of \model.

One may also consider non-polynomial approximation spaces \cP. For instance, the space spanned by trigonometric polynomials (the so-called Fourier space) or spanned by the Haar wavelets.

If the approximation space \cP does not satisfy condition (1), i.e., if \overline{\cP} \nsubseteq L^2\left(\mu_{\inputRV}\right), then one may use an isoprobabilistic transformation T: \Rset^{\inputDim} \rightarrow \Rset^{\inputDim} such that \vect{U} = T(\inputRV) is a random vector distributed according to a density \mu_{\vect{U}} satisfying \overline{\cP} = L^2\left(\mu_{\vect{U}}\right).

The model \model is then transformed into h = \model \circ T^{-1} \in L^2\left(\mu_{\vect{U}}\right). One seeks \widetilde{h}, the orthogonal projection of h with respect to \mu_{\vect{U}} onto the polynomial space \cP_n. The metamodel of \model is then given by:

\metaModel = \widetilde{h} \circ T

Projecting h onto the basis (\varphi_k)_{k \in I_n} of \cP_n orthogonally with respect to \mu_{\vect{U}} is equivalent to projecting \model onto the space spanned by (\varphi_k \circ T)_{k \in I_n} orthogonally with respect to \mu_{\inputRV}.

Which basis of the approximation space?

The sub-spaces (\cP_n)_{n \in \Nset} are nested which means that the basis of \cP_{n+1} is constructed from that of \cP_{n} by enrichment. Several choices of basis are possible.

We can choose a basis which is orthonormal with respect to \boldsymbol{\mu_{\inputRV}} (see Multivariate Orthonormal basis for more details):

  • If \mu_{\inputRV} has independent components, this multivariate orthonormal basis can be built as the tensor product of the univariate basis orthonormal to each component of \mu_{\inputRV}.

  • If \mu_{\inputRV} has dependent components, we can use an isoprobabilistic transformation T: \Rset^{\inputDim} \rightarrow \Rset^{\inputDim} that maps the measure \mu_{\inputRV} into a measure \mu_{\vect{U}} with independent marginals. We can also use the Soize-Ghanem basis but this usage is not recommended.

We can also choose a basis which is not orthonormal with respect to \boldsymbol{\mu_{\inputRV}} but which is orthonormal with respect to an instrumental measure \tilde{p} \neq \mu_{\inputRV}. This instrumental measure is such that:

  • the support of \tilde{p} is larger than the support of \mu_{\inputRV},

  • \tilde{p} has independent components.

This choice is done in the domination strategy.

Once the basis has been chosen, the enumeration function helps to enumerate the elements of the basis in a specific order (see Tensorized multivariate basis enumeration functions and Multivariate indices enumeration functions for more details on this topic).

Which sub-space of approximation?

If the dimension of \cP_n (that is to say if the number of coefficients to be computed) is too large, this can lead to overfitting. This may happen for instance if the total polynomial order we choose is too large.

In order to limit this effect, one method is to define a strategy for exploring the basis (see Sparse least squares meta model) as well as a strategy to select the coefficients which best predict the output (see FixedStrategy and CleaningStrategy).

Estimate the coefficients

In this section, we give some elements to estimate the coefficients of the expansion.

The vector of coefficients is the solution of the linear least-squares problem defined in Functional Chaos Expansion by equation (6). Refer to Least squares meta models for more details on the resolution of the least-squares problem.

The choice of basis of \cP_n has a major impact on the conditioning of the least-squares problem. Indeed, if the basis \left(\psi_k\right)_{k \in I_n} is orthonormal with respect to \mu_{\inputRV}, then the design matrix of the least squares problem is well-conditioned. In the other case, the design matrix might be ill-conditioned, leading to numerical instabilities.

If the chosen basis \left(\psi_k\right)_{k \in I_n} is orthonormal with respect to \mu_{\inputRV}, then the coefficients (a_k)_{k \in I_n} can be computed using the inner product (see [dahlquist2008] theorem 4.5.13 page 454) as follows:

(7)a_k = \scalarproduct{h}{\psi_k}_{L^2\left(\mu_{\inputRV}\right)}

for k\in I_n. This integral can computed using a quadrature rule (see the documentation of the IntegrationStrategy class). This is the case for instance when an Isoprobabilistic transformations T is used and the model h = \model \circ T^{-1} is expanded onto the basis \left(\psi_k\right)_{k \in I_n} which is orthonormal with respect to \mu_{\vect{U}}. Thus, we get:

(8)a_k = \scalarproduct{h}{\psi_k}_{L^2\left(\mu_{\vect{U}}\right)} = \scalarproduct{\model}
{\psi_k \circ T}_{L^2\left(\mu_{\inputRV}\right)}

Generally speaking, choosing a basis that is orthonormal with respect to \mu_{\inputRV} simplifies the computation of the coefficients (a_k)_{k \in I_n} by turning the least-squares problem into the computation of inner products: in this case, solving the least-squares problem amounts to evaluating inner products, resulting in a significantly lower computational cost.

However, the optimal estimator of the coefficient vector is the one obtained by solving the least-squares problem. Therefore, for estimating the coefficients, it is preferable to solve the least-squares problem even when the basis of the approximation space is orthonormal with respect to the distribution \mu_{\inputRV}.

In a probabilistic setting of independent and identically distributed samples, the two estimates (from least-squares problem and from the inner product) are consistent but have different asymptotic variance, resulting in two qualities of approximation (see [lemaitre2010] eq. 3.48 and eq. 3.49 page 66).

Several algorithms are available to compute the coefficients (a_k)_{k \in I_n}:

Cross Validation of the functional chaos expansion

The cross-validation of a polynomial chaos expansion uses the theory presented in Validation and cross validation of metamodels. In [blatman2009] page 84, the author applies the LOO equation to polynomial chaos expansion (see appendix D page 203 for a proof). If the coefficients are estimated from integration, the same derivation cannot, in theory, be applied.

The fast methods presented in Validation and cross validation of metamodels can be applied:

  • the fast leave-one-out cross-validation,

  • the fast K-Fold cross-validation.

Refer to FunctionalChaosValidation.

Usual exploitation of the functional chaos expansion

There are many ways to use the functional chaos expansion. We present two usual exploitations:

  • using the expansion as a random vector generator,

  • performing the sensitivity analysis of the expansion.

The first usage is to create a random vector \widetilde{Y} defined by:

\widetilde{Y} = \metaModel(\inputRV).

This equation can be used to simulate independent random observations from the functional chaos expansion: see the FunctionalChaosRandomVector class for more details on this topic.

The second usage assumes that the input distribution \mu_{\inputRV} has independent marginals and that the basis \left(\psi_k\right)_{k \in I_n} is orthonormal with respect to \mu_{\inputRV} and that the first element be:

(9)\psi_0 = 1

The orthogonality of the functions implies that:

\mathbf{E}_{\mu_{\inputRV}}\left[\psi_{i}(\vect{X})\right] = \scalarproduct{\psi_{i}}{\psi_{0}}_{L^2\left(\mu_{\inputRV}\right)} = 0

for any non-zero i\neq 0.

In that case, the Sobol’ indices can easily be deduced from the coefficients (a_k)_{k \in I_n, k\neq 0}: see FunctionalChaosSobolIndices for more details on this topic.