.. _estimate_spectral_density_function:

Estimation of a spectral density function
==========================================

| Let :math:`X: \Omega \times \cD \rightarrow \Rset^d` be a multivariate
  stationary normal process of dimension :math:`d`. We only treat here
  the case where the domain is of dimension 1: :math:`\cD \in \Rset`
  (:math:`n=1`).
| If the process is continuous, then :math:`\cD=\Rset`. In the discrete
  case, :math:`\cD` is a lattice.
| :math:`X` is supposed to be a second order process with zero mean and
  we suppose that its spectral density function
  :math:`S : \Rset \rightarrow \mathcal{H}^+(d)` defined in
  :eq:`specdensFunc` exists.
  :math:`\mathcal{H}^+(d) \in \mathcal{M}_d(\Cset)` is the set of
  :math:`d`-dimensional positive definite hermitian matrices.
| This objective of this use case is to estimate the spectral density
  function :math:`S` from data, which can be a sample of time series or
  one time series.

Depending on the available data, we proceed differently:

-  if the data correspond to several independent realizations of the
   process, a *statistical
   estimate* is performed using statistical average of a
   realization-based estimator;

-  if the data correspond to one realization of the process at different
   time stamps (stored in a *TimeSeries* object), the process being
   observed during a long period of time, an *ergodic estimate* is
   performed using a time average of an ergodic-based estimator.

| The estimation of the spectral density function from data may use some
  parametric or non parametric methods.
| The *Welch* method is a *non parametric* estimation technique, known
  to be performant. We detail it in the case where the available data on
  the process is a time series which values are
  :math:`(\vect{x}_0, \dots,\vect{x}_{N-1})` associated to the time grid
  :math:`(t_0, \dots, t_{N-1})` which is a discretization of the domain
  :math:`[0,T]`.
| We assume that the process has a spectral density :math:`S` defined on
  :math:`| f | \leq \frac{T}{2}`.
| The method is based on the segmentation of the time series into
  :math:`K` segments of length :math:`L`, possibly overlapping (size of
  overlap :math:`R`).
| Let :math:`\vect{X}_{1}(j), \ j = 0, 1,...,L-1` be the first such
  segment. Then:

  .. math:: \vect{X}_{1}(j) = \vect{X}(j) , \ j = 0, 1,...,L-1

  Applying the same decomposition,

  .. math:: \vect{X}_{2}(j) = \vect{X}(j + (L - R)) , \ j = 0, 1,...,L-1

  and finally:

  .. math:: \vect{X}_{K}(j) = \vect{X}(j + (K-1)(L-R)) , \ j = 0, 1,...,L-1

The objective is to get a statistical estimator from these :math:`K`
segments. We define the *periodogram* associated with the segment
:math:`\vect{X}_k` by:

.. math::

   \begin{aligned}
     \vect{X}_{k}(f_p,T)&=&\Delta t\sum_{n=0}^{L-1}\vect{x}(n\Delta t)\exp\left[\frac{-j2\pi pn}{N}\right], \quad p=0,\dots, L-1\\
     \hat{G}_{\vect{x}}(f_p,T)&=&\frac{2}{T}\vect{X}_{k}(f_p,T)\,{\vect{X}_{k}(f_p,T)^*}^t,\quad p=0,\dots,L/2-1\end{aligned}

with :math:`\Delta t=\frac{T}{N}` and
:math:`f_p=\frac{p}{T}=\frac{p}{N}\frac{1}{\Delta t}`.

| It has been proven that the *periodogram* has bad statistical
  properties. Indeed, two quantities summarize the properties of an
  estimator: its *bias* and its *variance*. The bias is the expected
  error one makes on the average using only a finite number of time
  series of finite length, whereas the covariance is the expected
  fluctuations of the estimator around its mean value. For the
  periodogram, we have:

-  Bias\ :math:`=\mathbb{E}[\hat{G}_{\vect{x}}(f_p, T)-G_{\vect{X}}(f_p)]=(\frac{1}{T}W_B(f_p, T)-\delta_0)*G_{\vect{X}}(f_p)`
   where :math:`W_B(f_p, T) = \left(\frac{\sin\pi fT}{\pi fT}\right)^2`
   is the squared module of the Fourier transform of the function
   :math:`w_B(t, T)` (*Barlett window*) defined by:

   .. math:: w_B(t, T) = \mathbf{1}_{[0,T]}(t)

   This estimator is *biased* but this bias vanishes when
   :math:`T\rightarrow\infty` as
   :math:`\lim_{T\rightarrow\infty} \frac{1}{T}W_B(f_p, T)=\delta_0`.

-  Covariance\ :math:`=\frac{1}{T}W_B(f_p, T)*G_{\vect{X}}(f_p)\rightarrow G_{\vect{X}}(f_p)`
   as :math:`T\rightarrow\infty`, which means that the fluctuations of
   the periodogram are of the same order of magnitude as the quantity to
   be estimated and thus the estimator is not convergent.

The periodogram’s lack of convergence may be easily fixed if we consider
the *averaged periodogram* over :math:`K` independent time series or
segments:

.. math:: \hat{G}_{\vect{x}}(f_p,T)=\frac{2}{KT}\sum_{k=0}^{K-1}\vect{X}^{(k)}(f_p,T)\vect{X}^{(k)}(f_p,T)^t

The averaging process has no effect on the significant bias of the
periodogram.

The use of a *tapering window* :math:`w(t, T)` may significantly reduce
it. The time series :math:`\vect{x}(t)` is replaced by a tapered time
series :math:`w(t, T)\vect{x}(t)` in the computation of
:math:`\vect{X}(f_p,T)`. One gets:

.. math:: \mathbb{E}[\hat{G}_{\vect{x}}(f_p, T)-G_{\vect{X}}(f_p)=(\frac{1}{T}W(f_p, T)-\delta_0)*G_{\vect{X}}(f_p)

where :math:`W(f_p, T)` is the square module of the Fourier transform
of :math:`w(t, T)` at the frequency :math:`f_p`. A judicious choice of
tapering function such as the *Hanning window* :math:`w_H` can
dramatically reduce the bias of the estimate:

.. math::
  :label: HamEff

    w_H(t, T) = \sqrt{\frac{8}{3}}\left(1-\cos^2\left(\frac{\pi t}{T}\right)\right)\mathbf{1}_{[0,T]}(t)

The library builds an estimation of the spectrum on a *TimeSeries* by
fixing the number of segments, the *overlap* size parameter and a
*FilteringWindows*. The available ones are:

-  The *Hamming* window

   .. math:: w(t, T) = \sqrt{\frac{1}{K}}\left(0.54-0.46\cos^2\left(\frac{2 \pi t}{T}\right)\right)\mathbf{1}_{[0,T]}(t)

   with :math:`K` = :math:`\sqrt{0.54^2 + \frac{1}{2} 0.46^2}`

-  The *Hanning* window described in :eq:`HamEff` which is supposed to be
   the most useful.

.. topic:: API:

    - See :class:`~openturns.Hanning`
    - See :class:`~openturns.Hamming`
    - See :class:`~openturns.WelchFactory`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/estimate_stochastic_processes/plot_estimate_spectral_density_function`