Empirical cumulative distribution function

The empirical cumulative distribution function provides a graphical representation of the probability distribution of a random vector without any prior assumption about the form of that distribution. It is a non-parametric approach, which makes it possible to describe complex behavior that parametric approaches may fail to detect.

In general notation, we are looking for an estimator $\widehat{F}_N$ of the cumulative distribution function $F_{X}$ of the random vector $\vect{X} = \left( X^1,\ldots,X^{n_X} \right)$:

$\begin{aligned} \widehat{F}_N \leftrightarrow F_{X} \end{aligned}$

Let us first consider the one-dimensional case and denote $\vect{X} = X^1 = X$. The empirical probability distribution is the distribution built from a sample of observed values $\left\{x_1, x_2, \ldots, x_N\right\}$. It corresponds to a discrete uniform distribution on $\left\{x_1, x_2, \ldots, x_N\right\}$: if $X'$ follows this distribution, then

$\begin{aligned} \forall \; i \in \left\{1,\ldots, N\right\} ,\ \textrm{Pr}\left(X'=x_i\right) = \frac{1}{N} \end{aligned}$

The empirical cumulative distribution function $\widehat{F}_N$ associated with this distribution is constructed as follows:

$\begin{aligned} \widehat{F}_N(x) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{ \left\{ x_i \leq x \right\} } \end{aligned}$

The empirical cumulative distribution function $\widehat{F}_N(x)$ is the proportion of observations that are less than or equal to $x$. It is thus an approximation of the cumulative distribution function $F_X(x)$, which is the probability that an observation is less than or equal to $x$:

$\begin{aligned} F_X(x) = \textrm{Pr} \left( X \leq x \right) \end{aligned}$

The diagram below illustrates the empirical cumulative distribution function of the ordered sample $\left\{5,6,10,22,27\right\}$.

[Figure: empirical cumulative distribution function of the sample $\left\{5,6,10,22,27\right\}$]
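The one-dimensional definition can be checked numerically on the sample $\left\{5,6,10,22,27\right\}$. The following is a minimal sketch in plain Python, not the OpenTURNS implementation; the names `empirical_cdf` and `F_hat` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function x -> proportion of sample values <= x.

    Hypothetical helper written for this illustration only.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x):
        # bisect_right gives the number of sorted values that are <= x,
        # i.e. the sum of the indicators 1_{x_i <= x}.
        return bisect_right(ordered, x) / n

    return cdf


sample = [5, 6, 10, 22, 27]
F_hat = empirical_cdf(sample)

for x in (4, 5, 8, 22, 30):
    print(x, F_hat(x))
# 4 0.0
# 5 0.2
# 8 0.4
# 22 0.8
# 30 1.0
```

Each observation contributes a jump of $1/N = 0.2$, so the function is a staircase that equals 0 below the smallest observation and 1 from the largest observation onwards. Sorting once and using a binary search keeps each evaluation at logarithmic cost, which is a design choice of this sketch rather than a requirement of the definition.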

The method is similar in the case $n_X>1$. The empirical probability distribution is the distribution built from a sample $\left\{\vect{x}_1, \vect{x}_2, \ldots, \vect{x}_N\right\}$. It corresponds to a discrete uniform distribution on $\left\{\vect{x}_1, \vect{x}_2, \ldots, \vect{x}_N\right\}$: if $\vect{X}'$ follows this distribution, then

$\begin{aligned} \forall \; i \in \left\{1,\ldots, N\right\} ,\ \textrm{Pr}\left(\vect{X}'=\vect{x}_i\right) = \frac{1}{N} \end{aligned}$

Thus we have:

$\begin{aligned} \widehat{F}_N(\vect{x}) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{ \left\{ x^1_i \leq x^1,\ldots,x^{n_X}_i \leq x^{n_X} \right\} } \end{aligned}$

in comparison with the theoretical cumulative distribution function $F_X$:

$\begin{aligned} F_X(\vect{x}) = \textrm{Pr} \left( X^1 \leq x^1,\ldots,X^{n_X} \leq x^{n_X} \right) \end{aligned}$
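The multivariate definition counts the sample points that are dominated by $\vect{x}$ in every component at once. The sketch below illustrates this in plain Python under stated assumptions: the helper `empirical_cdf_nd` is hypothetical, and the two-dimensional sample is made up for the example and does not come from this page.

```python
def empirical_cdf_nd(sample):
    """Return a function x -> proportion of points p with p[j] <= x[j] for all j.

    Hypothetical helper written for this illustration only.
    """
    points = [tuple(p) for p in sample]
    n = len(points)
    if n == 0:
        raise ValueError("sample must not be empty")
    dim = len(points[0])
    if any(len(p) != dim for p in points):
        raise ValueError("all points must have the same dimension")

    def cdf(x):
        if len(x) != dim:
            raise ValueError("x must have the same dimension as the sample")
        # A point is counted only if every one of its components is <= the
        # corresponding component of x (joint indicator, not a product of
        # marginal proportions).
        count = sum(
            1 for p in points if all(pj <= xj for pj, xj in zip(p, x))
        )
        return count / n

    return cdf


# Made-up two-dimensional sample (n_X = 2, N = 4), for illustration only.
points = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0), (4.0, 2.0)]
F_hat_2d = empirical_cdf_nd(points)

print(F_hat_2d((3.0, 3.0)))  # 0.5: only (2, 1) and (3, 3) satisfy both inequalities
print(F_hat_2d((4.0, 4.0)))  # 1.0: every point is dominated
```

The point $(1, 4)$ is excluded at $\vect{x} = (3, 3)$ even though its first component is small, because its second component exceeds 3. This is why the indicator in the formula applies to all $n_X$ inequalities of the same observation $\vect{x}_i$ simultaneously.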

This method is also referred to in the literature as the empirical distribution function.
