ARMA process estimation

From the order (p,q) or a range of orders (p,q) \in Ind_p \times Ind_q, where Ind_p = [p_1, p_2] and Ind_q = [q_1, q_2], the methods aims to find the best model ARMA(p,q) that fits the data and estimate the corresponding coefficients. The best model is considered with respect to the AIC_c criteria (corrected Akaike Information Criterion), defined by:

AICc = -2\log L_w + 2(p + q + 1)\frac{m}{m - p - q - 2}

where m is half the number of points of the time grid of the process sample (if the data are a process sample) or in a block of the time series (if the data are a time series).

Two other criteria are computed for each order (p,q):

  • the AIC criterion:

    AIC = -2\log L_w + 2(p + q + 1)

  • and the BIC criterion:

    BIC = -2\log L_w + 2(p + q + 1)\log(p + q + 1)

The BIC criterion leads to a model that gives a better prediction; the AIC criterion selects the best model that fits the given data; the AIC_c criterion improves the previous one by penalizing a too high order that would artificially fit to the data.
For each order (p,q), the estimation of the coefficients (a_k)_{1\leq k\leq p}, (b_k)_{1\leq k\leq q} and the variance \sigma^2 is done using the Whittle estimator which is based on the maximization of the likelihood function in the frequency domain.
The principle is detailed hereafter for the case of a time series: in the case of a process sample, the estimator is similar except for the periodogram which is computed differently.
We consider a time series associated to the time grid (t_0, \hdots, t_{n-1}) and a particular order (p,q). Using the notation ([dim1]), the spectral density function of the ARMA(p,q) process writes:

(1)f(\lambda, \vect{\theta}, \sigma^2) = \frac{\sigma^2}{2 \pi} \frac{|1 + b_1 \exp(-i \lambda) + \ldots + b_q \exp(-i q \lambda)|^2}{|1 + a_1 \exp(-i \lambda) + \ldots + a_p \exp(-i p \lambda)|^2} = \frac{\sigma^2}{2 \pi} g(\lambda, \vect{\theta})

where \vect{\theta} = (a_1, a_2,...,a_p,b_1,...,b_q) and \lambda is the frequency value.

The Whittle log-likelihood writes:

(2)\log L_w(\vect{\theta}, \sigma^2) = - \sum_{j=1}^{m} \log f(\lambda_j,  \vect{\theta}, \sigma^2) - \frac{1}{2 \pi} \sum_{j=1}^{m} \frac{I(\lambda_j)}{f(\lambda_j,  \vect{\theta}, \sigma^2)}

where:

  • I is the non parametric estimate of the spectral density, expressed in the Fourier space (frequencies in [0,2\pi] instead of [-f_{max}, f_{max}]). By default the Welch estimator is used.

  • \lambda_j is the j-th Fourier frequency, \lambda_j = \frac{2 \pi j}{n}, j=1 \ldots m with m the largest integer \leq \frac{n-1}{2}.

We estimate the (p+q+1) scalar coefficients by maximizing the log-likelihood function. The corresponding equations lead to the following relation:

(3)\sigma^{2*} = \frac{1}{m} \sum_{j=1}^{m} \frac{I(\lambda_j)}{g(\lambda_j, \vect{\theta}^{*})}

where \vect{\theta}^{*} maximizes:

(4)\log L_w(\vect{\theta}) = m (\log(2 \pi) - 1) - m \log\left( \frac{1}{m} \sum_{j=1}^{m} \frac{I(\lambda_j)}{g(\lambda_j, \vect{\theta})}\right) - \sum_{j=1}^{m} g(\lambda_j, \vect{\theta})

The Whitle estimation requires that:

  • the determinant of the eigenvalues of the companion matrix associated to the polynomial 1 + a_1X + \dots + a_pX^p are outside the unit disc,, which guarantees the stationarity of the process;

  • the determinant of the eigenvalues of the companion matrix associated to the polynomial 1 + ba_1X + \dots + b_qX^q are outside the unit disc, which guarantees the invertibility of the process.

Multivariate estimation

Let (t_i, \vect{X}_i)_{0\leq i \leq n-1} be a multivariate time series of dimension d generated by an ARMA process where (p,q) are supposed to be known. We assume that the white noise \varepsilon is distributed according to the normal distribution with zero mean and with covariance matrix \mat{\Sigma}_{\varepsilon} = \sigma^2\mat{Q} where |\mat{Q}| = 1 .
The normality of the white noise implies the normality of the process. If we note \vect{W} = (\vect{X}_0, \hdots, \vect{X}_{n-1}), then \vect{W} is normal with zero mean. Its covariance matrix writes \mathbb{E}(\vect{W}\vect{W}^t)= \sigma^2 \Sigma_{\vect{W}} which depends on the coefficients (\mat{A}_k, \mat{B}_l) for k = 1,\ldots,p and l = 1,\ldots, q and on the matrix \mat{Q}.

The likelihood of \vect{W} writes:

(5)L(\vect{\beta}, \sigma^2 | \vect{W}) = (2 \pi \sigma^2) ^{-\frac{d n}{2}} |\Sigma_{w}|^{-\frac{1}{2}} \exp\left(- (2\sigma^2)^{-1}  \vect{W}^t \Sigma_{\vect{W}}^{-1}  \vect{W} \right)

where \vect{\beta} = (\mat{A}_{k}, \mat{B}_{l}, \mat{Q}),\ k = 1,\ldots,p, l = 1,\ldots, q and where |.| denotes the determinant.

The difficulty arises from the great size (dn \times dn) of \Sigma_{\vect{W}} which is a dense matrix in the general case. [mauricio1995] proposes an efficient algorithm to evaluate the likelihood function. The main point is to use a change of variable that leads to a block-diagonal sparse covariance matrix.

The multivariate Whittle estimation requires that:

  • the determinant of the eigenvalues of the companion matrix associated to the polynomial \mat{I} + \mat{A}_1\mat{X} + \dots + \mat{A}_p\mat{X}^p are outside the unit disc, which guarantees the stationarity of the process;

  • the determinant of the eigenvalues of the companion matrix associated to the polynomial \mat{I} + \mat{B}_1\mat{X} + \dots + \mat{B}_q\mat{X}^q are outside the unit disc, which guarantees the invertibility of the process.

API: