Chi-squared test for independence
The $\chi^2$ test can be used to detect dependencies between two discrete random variables.
Let $X = (X^1, X^2)$ be a random variable of dimension 2 with values in $\{a_1, \dots, a_\ell\} \times \{b_1, \dots, b_r\}$.
We want to test whether $X$ has independent components.
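Let $X_1, \dots, X_n$ be i.i.d. random variables following the distribution of $X$. Two test statistics can be defined by:
$$D_n^{(1)} = n \sum_{i=1}^{\ell} \sum_{j=1}^{r} \frac{\left(p_{ij} - p_{i\cdot}\, p_{\cdot j}\right)^2}{p_{ij}}, \qquad D_n^{(2)} = n \sum_{i=1}^{\ell} \sum_{j=1}^{r} \frac{\left(p_{ij} - p_{i\cdot}\, p_{\cdot j}\right)^2}{p_{i\cdot}\, p_{\cdot j}}$$
where $p_{ij} = n_{ij}/n$, $p_{i\cdot} = n_{i\cdot}/n$, $p_{\cdot j} = n_{\cdot j}/n$, and: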
$n_{ij}$ is the number of pairs equal to $(a_i, b_j)$,
$n_{i\cdot}$ is the number of pairs such that the first component is equal to $a_i$,
$n_{\cdot j}$ is the number of pairs such that the second component is equal to $b_j$.
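As a minimal sketch of these definitions, the following pure-Python function computes two such statistics from a contingency table of counts $n_{ij}$. It assumes the two statistics differ only in their denominator: $p_{ij}$ for the first, and $p_{i\cdot}\, p_{\cdot j}$ for the second (Pearson's classical form). The function name `independence_statistics` and the example table are hypothetical and only serve as an illustration.

```python
import math


def independence_statistics(counts):
    """Return (d1, d2) for a contingency table given as a list of rows of counts n_ij.

    Assumed forms (see lead-in):
      d1 = n * sum (p_ij - p_i. * p_.j)^2 / p_ij
      d2 = n * sum (p_ij - p_i. * p_.j)^2 / (p_i. * p_.j)
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]        # n_i.
    col_totals = [sum(col) for col in zip(*counts)]  # n_.j
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column must contain at least one observation")

    d1 = 0.0
    d2 = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            p_ij = n_ij / n
            p_prod = (row_totals[i] / n) * (col_totals[j] / n)
            diff2 = (p_ij - p_prod) ** 2
            # Second statistic: always defined because p_prod > 0 here.
            d2 += diff2 / p_prod
            # First statistic: an empty cell makes the term undefined,
            # so this sketch reports infinity in that case.
            if p_ij > 0.0:
                d1 += diff2 / p_ij
            else:
                d1 = math.inf
    return n * d1, n * d2


# Hypothetical 2 x 2 table with n = 100 observations.
d1, d2 = independence_statistics([[20, 30], [30, 20]])
print(d1, d2)
```

For this table the sketch gives approximately 4.17 for the first statistic and 4.0 for the second. A table whose cells are exactly proportional to the product of its margins, such as `[[10, 20], [20, 40]]`, gives values that are zero up to rounding.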
Let $d_n^{(i)}$ be the realization of the test statistic $D_n^{(i)}$ on the sample $(x_1, \dots, x_n)$, with $i = 1, 2$.
Under the null hypothesis $\mathcal{H}_0$ that $X$ has independent components, both test statistics have a known asymptotic distribution: as $n \to \infty$, they follow the $\chi^2\left((\ell - 1)(r - 1)\right)$ distribution. If $n$ is sufficiently large, we can use the asymptotic distribution to apply the test as follows.
We fix a risk $\alpha$ (type I error) and evaluate the associated critical value $d_\alpha$, which is the quantile of order $1 - \alpha$ of the $\chi^2\left((\ell - 1)(r - 1)\right)$ distribution.
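Then a decision is made by comparing the test statistic to the theoretical threshold $d_\alpha$ (or equivalently by evaluating the p-value of the sample, defined as $\mathbb{P}\left(D_n^{(i)} > d_n^{(i)}\right)$, and comparing it to $\alpha$):
if $d_n^{(i)} > d_\alpha$ (or equivalently $\mathbb{P}\left(D_n^{(i)} > d_n^{(i)}\right) < \alpha$), then we reject the independence between the components,
if $d_n^{(i)} \leq d_\alpha$ (or equivalently $\mathbb{P}\left(D_n^{(i)} > d_n^{(i)}\right) \geq \alpha$), then the independence between the components is considered acceptable.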
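The decision rule can be sketched in pure Python as follows. The helper names `chi2_sf`, `chi2_quantile` and `independence_decision` are hypothetical. The survival function uses the standard recurrence for the regularized incomplete gamma function, which is valid here only because the number of degrees of freedom is a positive integer. The quantile is found by simple bisection rather than by a dedicated inversion routine.

```python
import math


def chi2_sf(x, dof):
    """Survival function P(chi2(dof) > x) for a positive integer dof."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_quantile(prob, dof):
    """Quantile of order prob of chi2(dof), by bisection on the survival function."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > 1.0 - prob:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > 1.0 - prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def independence_decision(d_n, n_rows, n_cols, alpha=0.05):
    """Return (reject, p_value, d_alpha) for an observed statistic d_n."""
    dof = (n_rows - 1) * (n_cols - 1)
    d_alpha = chi2_quantile(1.0 - alpha, dof)  # critical value
    p_value = chi2_sf(d_n, dof)                # P(D_n > d_n) under the null hypothesis
    return d_n > d_alpha, p_value, d_alpha


# Hypothetical example: observed statistic 4.0 on a 2 x 2 table, risk 5%.
reject, p_value, d_alpha = independence_decision(4.0, 2, 2, alpha=0.05)
print(reject, p_value, d_alpha)
```

In this example the critical value is about 3.84 and the p-value about 0.0455, so both forms of the rule lead to rejecting independence at the 5% level.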