Box Cox transformation

We consider X: \Omega \times \cD \rightarrow \Rset^d a multivariate stochastic process of dimension d, where \cD \subset \Rset^n and \omega \in \Omega is an event. We suppose that the process is \cL^2(\Omega).
We note X_{\vect{t}}: \Omega \rightarrow \Rset^d the random variable at the vertex \vect{t} \in \cD defined by X_{\vect{t}}(\omega)=X(\omega, \vect{t}).
If the variance of X_{\vect{t}} depends on the vertex \vect{t}, the Box Cox transformation maps the process X into the process Y such that the variance of Y_{\vect{t}} is constant (at the first order at least) with respect to \vect{t}.
We present here:
  • the estimation of the Box Cox transformation from a given field of the process X,

  • the action of the Box Cox transformation on a field generated from X.

We note h: \Rset^d \rightarrow \Rset^d the Box Cox transformation which maps the process X into the process Y: \Omega \times \cD \rightarrow \Rset^d, where Y=h(X), such that \Var{Y_{\vect{t}}} is independent of \vect{t} at the first order.
We suppose that X_{\vect{t}} is a positive random variable for any \vect{t}. If this constraint is not verified, it may be necessary to consider the shifted process X+\vect{\alpha} instead.
We illustrate some usual Box Cox transformations h in the scalar case (d=1), using the Taylor development of h: \Rset \rightarrow \Rset at the mean point of X_{\vect{t}}.
In the multivariate case, we estimate the Box Cox transformation component by component and we define the multivariate Box Cox transformation as the aggregation of the marginal Box Cox transformations.
Marginal Box Cox transformation:
The first order Taylor development of h around \Expect{X_{\vect{t}}} writes:

\forall \vect{t} \in \cD, h(X_{\vect{t}}) = h(\Expect{X_{\vect{t}}}) + (X_{\vect{t}} - \Expect{X_{\vect{t}}})h'(\Expect{X_{\vect{t}}})

which leads to:

\Expect{h(X_{\vect{t}})} = h(\Expect{X_{\vect{t}}})

and then:

\Var{h(X_{\vect{t}})} = h'(\Expect{X_{\vect{t}}})^2  \Var{X_{\vect{t}}}

To have \Var{h(X_{\vect{t}})} constant with respect to \vect{t} at the first order, we need:

(1)h'(\Expect{X_{\vect{t}}}) = k \left(  \Var{X_{\vect{t}}} \right)^{-1/2}

Now, we make some additional hypotheses on the relation between \Expect{X_{\vect{t}}} and \Var{X_{\vect{t}}}:

  • If we suppose that \Var{X_{\vect{t}}} \propto \Expect{X_{\vect{t}}}, then (1) leads to the function h(y) \propto \sqrt{y} and we take h(y) = \sqrt{y}, y > 0;

  • If we suppose that \Var{X_{\vect{t}}} \propto (\Expect{X_{\vect{t}}})^2 , then (1) leads to the function h(y) \propto \log{y} and we take h(y) = \log{y}, y>0;

  • More generally, if we suppose that \Var{X_{\vect{t}}} \propto (\Expect{X_{\vect{t}}})^{\beta}, then (1) leads to the function h_\lambda parametrized by the scalar \lambda:

    (2)h_\lambda(y) =
      \begin{cases}
        \frac{y^\lambda-1}{\lambda} & \lambda \neq 0 \\
        \log(y)                     & \lambda = 0
      \end{cases}

where \lambda = 1-\frac{\beta}{2}.
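As a quick numerical illustration of the second case above (a self-contained sketch in plain Python; the relative-noise process model and all names below are ours, chosen only for the demonstration), when the standard deviation of X_{\vect{t}} grows proportionally to its mean, applying \log makes the variance roughly constant:

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical process: at each "vertex" t, X_t has a standard deviation
# proportional to its mean, i.e. Var{X_t} proportional to (E{X_t})^2.
def sample_X(mean, n=20000):
    # X_t = mean * (1 + eps) with small relative noise eps
    return [mean * (1.0 + random.gauss(0.0, 0.1)) for _ in range(n)]

# The raw variance grows like mean^2, while the variance of log(X_t)
# stays roughly constant across the three mean levels.
for mean in (1.0, 10.0, 100.0):
    xs = sample_X(mean)
    ys = [math.log(x) for x in xs]
    print(mean, statistics.variance(xs), statistics.variance(ys))
```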

The inverse Box Cox transformation is defined by:

(3)h^{-1}_\lambda(y) =
     \begin{cases}
       \displaystyle (\lambda y + 1)^{\frac{1}{\lambda}} & \lambda \neq 0 \\
       \displaystyle \exp(y)                             & \lambda = 0
     \end{cases}
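The pair (2)-(3) is straightforward to implement; a minimal sketch in plain Python (function names are ours):

```python
import math

def box_cox(y, lam):
    """Box Cox transformation h_lambda of (2); requires y > 0."""
    if lam == 0.0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def box_cox_inverse(y, lam):
    """Inverse Box Cox transformation of (3)."""
    if lam == 0.0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

print(box_cox(4.0, 0.5))  # (sqrt(4) - 1) / 0.5 = 2.0
```

Note that h_\lambda is continuous in \lambda: (y^\lambda - 1)/\lambda \to \log(y) as \lambda \to 0, which is why the \lambda = 0 branch is the logarithm.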

Estimation of the Box Cox transformation:
The parameter \lambda is estimated from a given field of the process X as follows.
The estimation of \lambda given below is optimal when h_\lambda(X_{\vect{t}}) \sim \cN(\beta , \sigma^2 ) at each vertex \vect{t}. When this hypothesis does not hold, the estimate should be regarded as a heuristic proposal, with no optimality guarantee.
The parameters (\beta,\sigma,\lambda) are then estimated by maximum likelihood. We note \Phi_{\beta, \sigma} and \phi_{\beta, \sigma} respectively the cumulative distribution function and the probability density function of the \cN(\beta , \sigma^2) distribution.
For all vertices \vect{t}, we have:

(4)\forall v > 0, \quad \Prob{ X_{\vect{t}} \leq v } = \Prob{ h_\lambda(X_{\vect{t}}) \leq h_\lambda(v) } = \Phi_{\beta, \sigma} \left(h_\lambda(v)\right)

from which we derive the probability density function p of X_{\vect{t}} for all vertices \vect{t}:

(5)p(v) = h_\lambda'(v)\phi_{\beta, \sigma}\left(h_\lambda(v)\right) = v^{\lambda - 1}\phi_{\beta, \sigma}\left(h_\lambda(v)\right)
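As a numerical sanity check of (5) (a self-contained sketch; the parameter values \lambda = 0.5, \beta = 5, \sigma = 1 are arbitrary choices of ours), the density p can be integrated numerically. For \lambda > 0, h_\lambda maps (0, \infty) onto (-1/\lambda, \infty), so the integral equals 1 - \Phi_{\beta,\sigma}(-1/\lambda), which is close to 1 whenever almost no normal mass lies below -1/\lambda:

```python
import math

lam, beta, sigma = 0.5, 5.0, 1.0  # arbitrary test values

def box_cox(y, lam):
    # h_lambda of (2), repeated so this snippet is self-contained
    return math.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def normal_pdf(y, beta, sigma):
    return math.exp(-0.5 * ((y - beta) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def density_X(v):
    # Equation (5): p(v) = h'_lambda(v) * phi(h_lambda(v)), with h'_lambda(v) = v^(lambda - 1)
    return v ** (lam - 1.0) * normal_pdf(box_cox(v, lam), beta, sigma)

# Midpoint-rule integration of p over (0, 100): the total should be close to 1,
# since almost no N(5, 1) mass lies below h_lambda(0+) = -1/lambda = -2 here.
n, hi = 100000, 100.0
h = hi / n
total = sum(density_X((k + 0.5) * h) for k in range(n)) * h
print(total)
```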

Using (5), the likelihood of the values (x_0, \dots, x_{N-1}) with respect to the model (4) writes:

(6)L(\beta,\sigma,\lambda) =
   \underbrace{ \frac{1}{(2\pi \sigma^2)^{N/2}} \exp\left( -\frac{\sum_{k=0}^{N-1} \left( h_\lambda(x_k) - \beta \right)^2}{2\sigma^2} \right) }_{\Psi(\beta, \sigma)}
   \prod_{k=0}^{N-1} x_k^{\lambda - 1}

We notice that, for each fixed \lambda, maximizing (6) with respect to (\beta, \sigma) amounts to fitting the normal sample (h_\lambda(x_0), \dots, h_\lambda(x_{N-1})) by maximum likelihood. Thus, the maximum likelihood estimators of (\beta(\lambda), \sigma^2(\lambda)) for a given \lambda are:

(7)\hat{\beta}(\lambda) = \frac{1}{N} \sum_{k=0}^{N-1} h_{\lambda}(x_k) \\
  \hat{\sigma}^2(\lambda) = \frac{1}{N} \sum_{k=0}^{N-1} \left( h_{\lambda}(x_k) - \hat{\beta}(\lambda) \right)^2

Substituting (7) into (6) and taking the \log-likelihood, we obtain:

(8)\ell(\lambda) = \log L( \hat{\beta}(\lambda), \hat{\sigma}(\lambda),\lambda ) = C -
\frac{N}{2} \log\left( \hat{\sigma}^2(\lambda) \right) + \left(\lambda - 1 \right) \sum_{k=0}^{N-1} \log(x_k)\,,

where C is a constant.

The estimate \hat{\lambda} is the value of \lambda maximizing \ell(\lambda) defined in (8).
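This maximization can be sketched with a simple grid search over \lambda (plain Python; an illustrative sketch of ours, not the implementation of any particular library). The objective is the profile log-likelihood -\frac{N}{2}\log\hat{\sigma}^2(\lambda) + (\lambda - 1)\sum_{k} \log(x_k), up to a constant, with \hat{\beta}(\lambda) and \hat{\sigma}^2(\lambda) taken from (7):

```python
import math
import random

def box_cox(y, lam):
    # h_lambda of (2), repeated so the snippet is self-contained
    return math.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def profile_log_likelihood(lam, xs):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(xs)
    hs = [box_cox(x, lam) for x in xs]
    beta_hat = sum(hs) / n                                  # from equation (7)
    sigma2_hat = sum((h - beta_hat) ** 2 for h in hs) / n   # from equation (7)
    return -0.5 * n * math.log(sigma2_hat) + (lam - 1.0) * sum(math.log(x) for x in xs)

def estimate_lambda(xs, grid=None):
    """Return the grid value maximizing the profile log-likelihood."""
    if grid is None:
        grid = [k / 20.0 for k in range(-40, 41)]  # lambda in [-2, 2], step 0.05
    return max(grid, key=lambda lam: profile_log_likelihood(lam, xs))

# Usage on synthetic lognormal-like data (an assumed test model):
# the estimated lambda should then be near 0, the log case.
random.seed(42)
xs = [math.exp(random.gauss(0.0, 0.5)) for _ in range(1000)]
lam_hat = estimate_lambda(xs)
print(lam_hat)
```

A grid search keeps the sketch dependency-free; any one-dimensional optimizer could replace it, since \ell is a smooth function of \lambda.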