Chi-squared test for independence
This method deals with the parametric modelling of a probability distribution for a random vector $X = (X^1, \ldots, X^{n_X})$. We seek here to detect possible dependencies that may exist between two components $X^i$ and $X^j$. The $\chi^2$ test for independence for discrete probability distributions can be used.
As we are considering discrete distributions, the possible values for $X^i$ and $X^j$ respectively belong to the discrete sets $\mathcal{E}_i$ and $\mathcal{E}_j$. The $\chi^2$ test of independence can be applied when we have a sample consisting of $N$ pairs $\{(x^i_1, x^j_1), (x^i_2, x^j_2), \ldots, (x^i_N, x^j_N)\}$. We denote:
$n_{u,v}$ the number of pairs in the sample such that $x^i_k = u$ and $x^j_k = v$,
$n^i_u$ the number of pairs in the sample such that $x^i_k = u$,
$n^j_v$ the number of pairs in the sample such that $x^j_k = v$.
The test thus uses the quantity denoted $\widehat{D}_N^2$:

$$
\widehat{D}_N^2 = N \sum_{u \in \mathcal{E}_i} \sum_{v \in \mathcal{E}_j} \frac{\left( p_{u,v} - p^i_u \, p^j_v \right)^2}{p^i_u \, p^j_v}
$$

where:

$$
p_{u,v} = \frac{n_{u,v}}{N}, \qquad p^i_u = \frac{n^i_u}{N}, \qquad p^j_v = \frac{n^j_v}{N}
$$
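To make these definitions concrete, here is a minimal sketch (plain NumPy, with a hypothetical paired sample) that builds the counts $n_{u,v}$, $n^i_u$, $n^j_v$ and evaluates $\widehat{D}_N^2$:

```python
import numpy as np

# Hypothetical paired sample (x^i_k, x^j_k), k = 1, ..., N
rng = np.random.default_rng(0)
xi = rng.integers(0, 3, size=500)  # values of X^i in E_i = {0, 1, 2}
xj = rng.integers(0, 2, size=500)  # values of X^j in E_j = {0, 1}
N = len(xi)

Ei = np.unique(xi)
Ej = np.unique(xj)

# n_{u,v}: joint counts; row/column sums give n^i_u and n^j_v
n_uv = np.array([[np.sum((xi == u) & (xj == v)) for v in Ej] for u in Ei])
n_u = n_uv.sum(axis=1)  # n^i_u
n_v = n_uv.sum(axis=0)  # n^j_v

# Empirical frequencies p_{u,v}, p^i_u, p^j_v
p_uv = n_uv / N
p_u = n_u / N
p_v = n_v / N

# D_N^2 = N * sum_{u,v} (p_{u,v} - p^i_u p^j_v)^2 / (p^i_u p^j_v)
D2 = N * np.sum((p_uv - np.outer(p_u, p_v)) ** 2 / np.outer(p_u, p_v))
print(D2)
```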
The probability distribution of the distance $\widehat{D}_N^2$ is asymptotically known (i.e. as the size of the sample tends to infinity): it is the $\chi^2$ distribution with $(|\mathcal{E}_i| - 1)(|\mathcal{E}_j| - 1)$ degrees of freedom. If $N$ is sufficiently large, this means that for a probability $\alpha$, one can calculate the threshold (critical value) $d_\alpha$ such that:
if $\widehat{D}_N^2 > d_\alpha$, we conclude, with the risk of error $\alpha$, that a dependency exists between $X^i$ and $X^j$,
if $\widehat{D}_N^2 \leq d_\alpha$, the independence hypothesis is considered acceptable (a numerical sketch of this decision rule follows).
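Assuming the asymptotic $\chi^2$ approximation above, the critical value $d_\alpha$ can be obtained from the $\chi^2$ quantile function; the following sketch (SciPy, with hypothetical values for the statistic and set sizes) applies the decision rule:

```python
import scipy.stats as st

# Hypothetical inputs: statistic and cardinalities of E_i, E_j
D2 = 7.3        # observed value of D_N^2
card_Ei, card_Ej = 3, 2
alpha = 0.05

dof = (card_Ei - 1) * (card_Ej - 1)
d_alpha = st.chi2.ppf(1.0 - alpha, dof)  # critical value d_alpha

if D2 > d_alpha:
    print("dependency detected at risk", alpha)
else:
    print("independence hypothesis acceptable")
```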
An important notion is the so-called “p-value” of the test. This quantity is equal to the limit error probability $\alpha_{\lim}$ under which the independence hypothesis is rejected. Thus, independence is assumed if and only if $\alpha_{\lim}$ is greater than the value $\alpha$ desired by the user. Note that the higher $\alpha_{\lim} - \alpha$, the more robust the decision.
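Under the same asymptotic approximation, the p-value $\alpha_{\lim}$ is the tail probability of the $\chi^2$ distribution beyond the observed statistic; a minimal sketch (SciPy, hypothetical values again):

```python
import scipy.stats as st

D2 = 7.3                 # observed value of D_N^2 (hypothetical)
dof = (3 - 1) * (2 - 1)  # (|E_i| - 1) * (|E_j| - 1)
alpha = 0.05

p_value = st.chi2.sf(D2, dof)  # alpha_lim = P(chi2_dof > D2)
print(p_value, "-> independence accepted" if p_value > alpha else "-> rejected")
```

Equivalently, `scipy.stats.chi2_contingency` computes the statistic, the p-value, and the degrees of freedom directly from the table of counts $n_{u,v}$.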
This method is also referred to in the literature as the $\chi^2$ test of contingency.
API:
See HypothesisTest_ChiSquared
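As a rough usage sketch of this entry point (assuming the OpenTURNS Python bindings, where the test is exposed as `ot.HypothesisTest.ChiSquared`; the sample data here is invented, and the exact meaning of the level argument should be checked against the reference page above):

```python
import openturns as ot

# Hypothetical paired discrete sample, one component per 1-d Sample
data = [(0, 1), (1, 0), (2, 1), (0, 0), (1, 1), (2, 0)] * 50
sample_i = ot.Sample([[float(u)] for u, _ in data])
sample_j = ot.Sample([[float(v)] for _, v in data])

# 0.05 is the test level; depending on the OpenTURNS version it may be
# interpreted as the risk alpha or as 1 - alpha -- check the API reference
result = ot.HypothesisTest.ChiSquared(sample_i, sample_j, 0.05)
print(result.getPValue())                # alpha_lim
print(result.getBinaryQualityMeasure())  # True if independence is accepted
```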
Examples: