Kriging (also known as Gaussian process regression) is a Bayesian technique that aim at approximating functions (most often in order to surrogate it because it is expensive to evaluate). In the following it is assumed we aim at creating a surrogate model of a scalar-valued model \cM: \vect{x} \mapsto y. Note the implementation of Kriging deals with vector-valued functions (\cM: \vect{x} \mapsto \vect{y}), without simply looping over each output. It is also assumed the model is obtained over a design of experiments in order to produce a set of observations gathered in the following dataset: \left(\left(\vect{x}^{(i)}, y^{(i)}\right), i = 1, \ldots, m\right). Ultimately Kriging aims at producing a predictor (also known as a response surface or metamodel) denoted as \tilde{\cM}.

We put the following Gaussian process prior on the model \cM:

Y(\vect{x}) = \Tr{\vect{f}(\vect{x})} \vect{\beta} + Z(\vect{x})


  • \Tr{\vect{f}(\vect{x})} \vect{\beta} is a generalized linear model based upon a functional basis \vect{f} = \left(f_j, j = 1, \ldots, p\right) and a vector of coefficients \vect{\beta} = \left(\beta_j, j = 1, \ldots, p\right),

  • Z is a zero-mean stationary Gaussian process whose covariance function reads:

    \mathbb{E}[Z(\vect{x})\,Z(\vect{x'})] = \sigma^2 R(\vect{x} - \vect{x'}, \vect{\theta})

    where \sigma^2 > 0 is the variance and R is the correlation function that solely depends on the Manhattan distance between input points \vect{x} - \vect{x'} and a vector of parameters \vect{\theta} \in \Rset^{n_\theta}.

Under the Gaussian process prior assumption, the observations \vect{Y} = \left(Y_i, i = 1, \ldots, m\right) and a prediction Y(\vect{x}) at some unobserved input \vect{x} are jointly normally distributed:

      \vect{Y} \\
    \cN_{m + 1}\left(
        \mat{F} \vect{\beta} \\
        \Tr{\vect{f}(\vect{x})} \vect{\beta}
        \mat{R} & \vect{r}(\vect{x}) \\
        \vect{r}(\vect{x})^t & 1


\mat{F} = \left[f_j\left(\vect{x}^{(i)}\right), i = 1, \ldots, m, j = 1, \ldots, p\right]

is the regression matrix,

\mat{R} = \left[R\left(\vect{x}^{(i)} - \vect{x}^{(j)}, \vect{\theta}\right), i,\,j = 1, \ldots, m\right]

is the observations’ correlation matrix, and:

\vect{r}(\vect{x}) = \Tr{\left(R\left(\vect{x} - \vect{x}^{(i)}, \vect{\theta}\right), i = 1, \ldots, m\right)}

is the vector of cross-correlations between the prediction and the observations.

As such, the Kriging predictor is defined as the following conditional distribution:

\tilde{Y}(\vect{x}) = \left[Y(\vect{x}) \mid \vect{Y} = \vect{y}, \vect{\theta}=\vect{\theta}^*, \sigma^2=(\sigma^2)^*\right]

where \vect{\theta}^* and (\sigma^2)^* are the maximum likelihood estimates of the correlation parameters \vect{\theta} and variance \sigma^2 (see references).

It can be shown (see references) that the predictor \tilde{Y}(\vect{x}) is also Gaussian:

\tilde{Y}(\vect{x}) = \cN_1\left(\mu_{\tilde{Y}}(\vect{x}), \sigma^2_{\tilde{Y}}(\vect{x})\right)

  • with mean:

    \mu_{\tilde{Y}}(\vect{x}) = \Tr{\vect{f}(\vect{x})} \tilde{\vect{\beta}} + \Tr{\vect{r}(\vect{x})} \vect{\gamma}

    where \underline{\tilde{\beta}} is the generalized least squares solution of the underlying linear regression problem:

    \tilde{\vect{\beta}} = \left(\Tr{\mat{F}} \mat{R}^{-1} \mat{F}\right)^{-1} \Tr{\mat{F}} \mat{R}^{-1} \vect{y}


    \vect{\gamma} = \mat{R}^{-1} \left(\vect{y} - \mat{F} \tilde{\vect{\beta}}\right)

  • and variance:

    \sigma^2_{\tilde{Y}}(\vect{x}) =
        \sigma^2 \left[1 -
            \Tr{\vect{r}(\vect{x})} \mat{R}^{-1} \vect{r}(\vect{x})
            + \Tr{\vect{u}(\vect{x})} \left(\Tr{\mat{F}} \mat{R}^{-1} \mat{F}\right)^{-1} \vect{u}(\vect{x})


    \vect{u}(\vect{x}) = \Tr{\mat{F}} \mat{R}^{-1} \vect{r}(\vect{x}) - \vect{f}(\vect{x})

Kriging may also be referred to as Gaussian process regression.



  • [dubourg2011]

    1. Lophaven, H. Nielsen and J. Sondergaard, 2002, “DACE, A Matlab kriging toolbox”, Technichal University of Denmark.

    1. Santner, B. Williams and W. Notz, 2003. “The design and analysis of computer experiments”, Springer, New York.

    1. Rasmussen and C. Williams, 2006, T. Dietterich (Ed.), “Gaussian processes for machine learning”, MIT Press.