Chi-squared test for independence

This method deals with the parametric modelling of a probability distribution for a random vector $X = (X^1, \ldots, X^{n_X})$. We seek here to detect possible dependencies that may exist between two components $X^i$ and $X^j$. The $\chi^2$ test for independence for discrete probability distributions can be used.

As we are considering discrete distributions, the possible values for $X^i$ and $X^j$ respectively belong to the discrete sets $\{x^i_1, \ldots, x^i_U\}$ and $\{x^j_1, \ldots, x^j_V\}$. The $\chi^2$ test of independence can be applied when we have a sample consisting of $N$ pairs $\{(x^i_1, x^j_1), \ldots, (x^i_N, x^j_N)\}$. We denote:

• $n_{u,v}$ the number of pairs in the sample such that $x^i_k = x^i_u$ and $x^j_k = x^j_v$,

• $n^i_u$ the number of pairs in the sample such that $x^i_k = x^i_u$,

• $n^j_v$ the number of pairs in the sample such that $x^j_k = x^j_v$.
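As a minimal sketch of these tallies (plain Python with hypothetical sample data; none of the variable names come from a library), the counts $n_{u,v}$, $n^i_u$ and $n^j_v$ can be obtained with `collections.Counter`:

```python
from collections import Counter

# Hypothetical sample of N pairs (x^i_k, x^j_k) taking discrete values.
pairs = [(0, 1), (1, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (0, 1)]

n_uv = Counter(pairs)               # n_{u,v}: joint counts of (x^i_u, x^j_v)
n_u = Counter(x for x, _ in pairs)  # n^i_u: marginal counts of X^i
n_v = Counter(y for _, y in pairs)  # n^j_v: marginal counts of X^j
N = len(pairs)

print(n_uv[(0, 1)], n_u[0], n_v[1], N)  # prints: 3 5 5 8
```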

The test thus uses the quantity denoted $\widehat{D}_N^2$:

$$\widehat{D}_N^2 = N \sum_{u=1}^{U} \sum_{v=1}^{V} \frac{\left( p_{u,v} - p_u\, q_v \right)^2}{p_u\, q_v}$$

where:

$$p_{u,v} = \frac{n_{u,v}}{N}, \qquad p_u = \frac{n^i_u}{N}, \qquad q_v = \frac{n^j_v}{N}$$

The probability distribution of the distance $\widehat{D}_N^2$ is asymptotically known (i.e. as the size $N$ of the sample tends to infinity): it is a $\chi^2$ distribution with $(U-1)(V-1)$ degrees of freedom. If $N$ is sufficiently large, this means that for a probability $\alpha$, one can calculate the threshold (critical value) $d_\alpha$ such that $P(\widehat{D}_N^2 > d_\alpha) = \alpha$:

• if $\widehat{D}_N^2 > d_\alpha$, we conclude, with a risk of error $\alpha$, that a dependency exists between $X^i$ and $X^j$,

• if $\widehat{D}_N^2 \leq d_\alpha$, the independence hypothesis is considered acceptable.
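The statistic and the decision rule can be sketched as follows (plain Python, hypothetical data; the constant 3.841 is the standard $\chi^2$ critical value at $\alpha = 0.05$ for one degree of freedom, which is the case of a $2 \times 2$ table since $(U-1)(V-1) = 1$):

```python
from collections import Counter

# Hypothetical sample of N pairs with discrete values of X^i and X^j.
pairs = [(0, 0)] * 20 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 20

N = len(pairs)
n_uv = Counter(pairs)
n_u = Counter(x for x, _ in pairs)
n_v = Counter(y for _, y in pairs)

# D^2 = N * sum_{u,v} (p_uv - p_u q_v)^2 / (p_u q_v)
d2 = N * sum(
    (n_uv.get((u, v), 0) / N - (n_u[u] / N) * (n_v[v] / N)) ** 2
    / ((n_u[u] / N) * (n_v[v] / N))
    for u in n_u
    for v in n_v
)

# Critical value at alpha = 0.05 for (U-1)(V-1) = 1 degree of freedom.
d_alpha = 3.841
print(round(d2, 3), d2 > d_alpha)  # prints: 6.667 True -> dependency detected
```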

An important notion is the so-called "$p$-value" of the test. This quantity is equal to the limit error probability $p_{\text{val}}$ under which the independence hypothesis is rejected. Thus, independence is assumed if and only if $p_{\text{val}}$ is greater than the value $\alpha$ desired by the user. Note that the higher $p_{\text{val}}$, the more robust the decision.
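For illustration, with one degree of freedom the asymptotic $p$-value has a closed form, $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, so it can be computed with the standard library alone (the statistic value below is a hypothetical input, not a library result):

```python
import math

def chi2_sf_1dof(x):
    """Survival function P(chi2_1 > x) for one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical statistic from a 2x2 contingency analysis (1 dof).
d2 = 6.667
alpha = 0.05
p_val = chi2_sf_1dof(d2)  # p-value, roughly 0.0098 here
print(p_val < alpha)       # prints: True -> independence rejected
```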

This method is also referred to in the literature as the $\chi^2$ test of contingency.

Examples: