Kriging (also known as Gaussian process regression) is a Bayesian technique that aim at approximating functions (most often in order to surrogate it because it is expensive to evaluate). In the following it is assumed we aim at creating a surrogate model of a scalar-valued model \cM: \vect{x} \mapsto y. Note the implementation of Kriging deals with vector-valued functions (\cM: \vect{x} \mapsto \vect{y}), without simply looping over each output. It is also assumed the model is obtained over a design of experiments in order to produce a set of observations gathered in the following dataset: \left(\left(\vect{x}^{(i)}, y^{(i)}\right), i = 1, \ldots, m\right). Ultimately Kriging aims at producing a predictor (also known as a response surface or metamodel) denoted as \tilde{\cM}.

We put the following Gaussian process prior on the model \cM:

Y(\vect{x}) = \Tr{\vect{f}(\vect{x})} \vect{\beta} + Z(\vect{x})


  • \Tr{\vect{f}(\vect{x})} \vect{\beta} is a general linear model based upon a functional basis \vect{f} = \left(f_j, j = 1, \ldots, p\right) and a vector of coefficients \vect{\beta} = \left(\beta_j, j = 1, \ldots, p\right),

  • Z is a zero-mean stationary Gaussian process whose covariance function reads:

    \mathbb{E}[Z(\vect{x})\,Z(\vect{x'})] = \sigma^2 R(\vect{x} - \vect{x'}, \vect{\theta})

    where \sigma^2 > 0 is the variance and R is the correlation function that solely depends on the Manhattan distance between input points \vect{x} - \vect{x'} and a vector of parameters \vect{\theta} \in \Rset^{n_\theta}.

Under the Gaussian process prior assumption, the observations \vect{Y} = \left(Y_i, i = 1, \ldots, m\right) and a prediction Y(\vect{x}) at some unobserved input \vect{x} are jointly normally distributed:

      \vect{Y} \\
    \cN_{m + 1}\left(
        \mat{F} \vect{\beta} \\
        \Tr{\vect{f}(\vect{x})} \vect{\beta}
        \mat{R} & \vect{r}(\vect{x}) \\
        \vect{r}(\vect{x})^t & 1


\mat{F} = \left[f_j\left(\vect{x}^{(i)}\right), i = 1, \ldots, m, j = 1, \ldots, p\right]

is the regression matrix,

\mat{R} = \left[R\left(\vect{x}^{(i)} - \vect{x}^{(j)}, \vect{\theta}\right), i,\,j = 1, \ldots, m\right]

is the observations’ correlation matrix, and:

\vect{r}(\vect{x}) = \Tr{\left(R\left(\vect{x} - \vect{x}^{(i)}, \vect{\theta}\right), i = 1, \ldots, m\right)}

is the vector of cross-correlations between the prediction and the observations.

As such, the Kriging predictor is defined as the following conditional distribution:

\tilde{Y}(\vect{x}) = \left[Y(\vect{x}) \mid \vect{Y} = \vect{y}, \vect{\theta}=\vect{\theta}^*, \sigma^2=(\sigma^2)^*\right]

where \vect{\theta}^* and (\sigma^2)^* are the maximum likelihood estimates of the correlation parameters \vect{\theta} and variance \sigma^2 (see references).

It can be shown (see references) that the predictor \tilde{Y}(\vect{x}) is also Gaussian:

\tilde{Y}(\vect{x}) = \cN_1\left(\mu_{\tilde{Y}}(\vect{x}), \sigma^2_{\tilde{Y}}(\vect{x})\right)

  • with mean:

    \mu_{\tilde{Y}}(\vect{x}) = \Tr{\vect{f}(\vect{x})} \tilde{\vect{\beta}} + \Tr{\vect{r}(\vect{x})} \vect{\gamma}

    where \underline{\tilde{\beta}} is the generalized least squares solution of the underlying linear regression problem:

    \tilde{\vect{\beta}} = \left(\Tr{\mat{F}} \mat{R}^{-1} \mat{F}\right)^{-1} \Tr{\mat{F}} \mat{R}^{-1} \vect{y}


    \vect{\gamma} = \mat{R}^{-1} \left(\vect{y} - \mat{F} \tilde{\vect{\beta}}\right)

  • and variance:

    \sigma^2_{\tilde{Y}}(\vect{x}) =
        \sigma^2 \left[1 -
            \Tr{\vect{r}(\vect{x})} \mat{R}^{-1} \vect{r}(\vect{x})
            + \Tr{\vect{u}(\vect{x})} \left(\Tr{\mat{F}} \mat{R}^{-1} \mat{F}\right)^{-1} \vect{u}(\vect{x})


    \vect{u}(\vect{x}) = \Tr{\mat{F}} \mat{R}^{-1} \vect{r}(\vect{x}) - \vect{f}(\vect{x})

Kriging may also be referred to as Gaussian process regression.