Chi-squared goodness of fit test

This method deals with the modelling of a probability distribution of a random vector $\vect{X} = \left( X^1,\ldots,X^{n_X} \right)$ . It seeks to verify the compatibility between a sample of data $\left\{ \vect{x}_1,\vect{x}_2,\ldots,\vect{x}_N \right\}$ and a previously chosen candidate probability distribution. The $\chi^2$ Goodness-of-Fit test makes it possible to answer this question in the one-dimensional case $n_X = 1$ , for a discrete distribution.

Let us restrict ourselves to the case $n_X = 1$ . Thus we denote $\vect{X} = X^1 = X$ . Since we are considering discrete distributions, i.e. those for which the possible values of $X$ belong to a discrete set $\cE$ , the candidate distribution is characterized by the probabilities $\left\{ p(x;\vect{\theta}) \right\}_{x \in \cE}$ .

The $\chi^2$ test is based on the fact that if the candidate distribution is appropriate, the number of values in the sample $\left\{ x_1,x_2,\ldots,x_N \right\}$ that are equal to $x$ should be on average equal to $N p(x;\vect{\theta})$ . The idea is therefore to compare these “theoretical values” with the counts actually observed. This comparison is performed with the aid of the following “distance”:

$\begin{aligned} \widehat{D}_N^2 = \sum_{x \in \cE_N} \frac{\left(Np(x)-n(x)\right)^2}{n(x)} \end{aligned}$

where $\cE_N$ denotes the elements of $\cE$ which have been observed at least once in the data sample and where $n(x)$ denotes the number of data values in the sample that are equal to $x$ .
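The distance above can be sketched in a few lines of plain Python. This is only an illustration of the formula as written here, not the library implementation: the helper `chi2_distance`, the candidate `fair_die` and the sample `rolls` are hypothetical names and data introduced for the example. Note that the formula divides by the observed count $n(x)$ , whereas the textbook Pearson statistic divides by the expected count $N p(x)$ .

```python
from collections import Counter


def chi2_distance(sample, pmf):
    """Sum over the observed values x of (N p(x) - n(x))^2 / n(x)."""
    n_total = len(sample)     # N, the sample size
    counts = Counter(sample)  # n(x) for every x observed at least once
    return sum(
        (n_total * pmf(x) - n_x) ** 2 / n_x
        for x, n_x in counts.items()
    )


def fair_die(x):
    """Hypothetical candidate distribution: a fair six-sided die."""
    return 1.0 / 6.0 if x in range(1, 7) else 0.0


# Hypothetical sample of N = 12 die rolls.
rolls = [1, 3, 3, 6, 2, 5, 5, 5, 4, 1, 6, 2]
distance = chi2_distance(rolls, fair_die)
print(distance)
```

Only the values 5 (observed three times) and 4 (observed once) deviate from the expected count of 2, so they are the only terms contributing to the sum.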

The probability distribution of the distance $\widehat{D}_N^2$ is asymptotically known (i.e. as the size of the sample tends to infinity), and this asymptotic distribution does not depend on the candidate distribution being tested. If $N$ is sufficiently large, this means that for a probability $\alpha$ , one can calculate the threshold (or critical value) $d_\alpha$ such that:

if $\widehat{D}_N^2 > d_{\alpha}$ , we reject the candidate distribution with a risk of error $\alpha$ ,
if $\widehat{D}_N^2 \leq d_{\alpha}$ , the candidate distribution is considered acceptable.

An important notion is the so-called “ $p$ -value” of the test. This quantity is the limiting error probability $\alpha_\textrm{lim}$ at which the decision changes: the candidate distribution is rejected for any risk $\alpha$ larger than $\alpha_\textrm{lim}$ . Thus, the candidate distribution will be accepted if and only if $\alpha_\textrm{lim}$ is greater than the value $\alpha$ desired by the user. Note that the larger the difference $\alpha_\textrm{lim} - \alpha$ , the more robust the decision.
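The threshold-based decision and the $p$ -value can be sketched together as follows. The sketch assumes that the asymptotic law is a $\chi^2$ distribution whose number of degrees of freedom is supplied by the caller (commonly the number of classes minus one, minus the number of estimated parameters); this choice is an assumption of the example and is not stated in the text above. The helpers `chi2_sf` and `chi2_threshold`, as well as the numerical values, are hypothetical and only use the Python standard library.

```python
import math


def chi2_sf(x, dof):
    """P(chi2 with `dof` degrees of freedom > x), via the series expansion
    of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    # P(a, z) = exp(-z) z^a / Gamma(a + 1) * sum_k z^k / ((a+1)...(a+k))
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    cdf = math.exp(-z + a * math.log(z) - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - cdf)


def chi2_threshold(alpha, dof):
    """Critical value d_alpha such that chi2_sf(d_alpha, dof) = alpha,
    found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > alpha:  # enlarge the bracket until it holds d_alpha
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


distance = 7.2  # hypothetical observed value of the distance
dof = 5         # assumed degrees of freedom (e.g. 6 classes, no fitted parameter)
alpha = 0.05    # risk chosen by the user

d_alpha = chi2_threshold(alpha, dof)
p_value = chi2_sf(distance, dof)
reject = distance > d_alpha

print(f"threshold={d_alpha:.4f}, p-value={p_value:.4f}, reject={reject}")
```

Both formulations lead to the same decision: the distance exceeds the threshold exactly when the $p$ -value falls below $\alpha$ .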

API:

See FittingTest_ChiSquared()

Examples:

See Distribution fitting test using Chi2
