Chi-squared test for independence
The $\chi^2$ test can be used to detect dependencies between two discrete random variables.
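Let $X = (X^1, X^2)$ be a random variable of dimension 2 with values in $\{a_1, \dots, a_l\} \times \{b_1, \dots, b_r\}$.
We want to test whether $X$ has independent components.
Let $X_1, \dots, X_n$ be i.i.d. random variables following the distribution of $X$. Two test statistics, $D_n^{(1)}$ and $D_n^{(2)}$, can be defined from the following counts:
- $n_{i,j}$, the number of pairs equal to $(a_i, b_j)$,
- $n_{i,\cdot}$, the number of pairs such that the first component is equal to $a_i$,
- $n_{\cdot,j}$, the number of pairs such that the second component is equal to $b_j$.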
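As an illustration of how the three counts combine, the sketch below computes the classical Pearson statistic, $\sum_{i}\sum_{j} (n_{i,j} - n_{i,\cdot} n_{\cdot,j}/n)^2 / (n_{i,\cdot} n_{\cdot,j}/n)$, where $n_{i,j}$, $n_{i,\cdot}$ and $n_{\cdot,j}$ denote the counts listed above and $n$ is the sample size. Treating this textbook form as one of the two statistics is an assumption, and the function name `pearson_statistic` and the sample data are hypothetical. The degrees of freedom are computed from the observed values only, which assumes every possible value appears at least once in the sample.

```python
from collections import Counter

def pearson_statistic(pairs):
    """Pearson chi-squared statistic for a list of (first, second) pairs."""
    n = len(pairs)
    n_ij = Counter(pairs)                 # number of pairs equal to (a_i, b_j)
    n_i = Counter(a for a, _ in pairs)    # pairs whose first component is a_i
    n_j = Counter(b for _, b in pairs)    # pairs whose second component is b_j
    stat = 0.0
    for a, row_count in n_i.items():
        for b, col_count in n_j.items():
            expected = row_count * col_count / n  # count expected under independence
            stat += (n_ij[(a, b)] - expected) ** 2 / expected
    dof = (len(n_i) - 1) * (len(n_j) - 1)
    return stat, dof

# Hypothetical sample of 20 pairs, for illustration only.
sample = ([("a1", "b1")] * 8 + [("a1", "b2")] * 2
          + [("a2", "b1")] * 3 + [("a2", "b2")] * 7)
stat, dof = pearson_statistic(sample)
print(stat, dof)
```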
Let $d_n^{(i)}$ be the realization of the test statistic $D_n^{(i)}$ on the sample $\{x_1, \dots, x_n\}$, with $i = 1, 2$.
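Under the null hypothesis $H_0$ = {$X$ has independent components}, the distribution of both test statistics $D_n^{(i)}$ is asymptotically known, i.e. when $n \to +\infty$: this is the $\chi^2((l-1)(r-1))$ distribution, where $l$ and $r$ are the numbers of values taken by the first and second components.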
If $n$ is sufficiently large, we can use the asymptotic distribution to apply the test as follows.
We fix a risk $\alpha$ (type I error) and we evaluate the associated critical value $d_\alpha$, which is the quantile of order $1 - \alpha$ of the asymptotic $\chi^2((l-1)(r-1))$ distribution.
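The critical value can be computed without a statistics library, as in the following sketch. It evaluates the survival function of a $\chi^2$ distribution with the recurrence $Q(a+1, y) = Q(a, y) + y^a e^{-y} / \Gamma(a+1)$ on the regularized upper incomplete gamma function, then inverts it by bisection. The helper names `chi2_sf` and `chi2_quantile` are hypothetical, and the approach assumes a moderate number of degrees of freedom; a dedicated library routine would normally be preferred.

```python
import math

def chi2_sf(x, dof):
    """P(chi2(dof) > x), computed as Q(dof / 2, x / 2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_quantile(prob, dof, tol=1e-10):
    """Quantile of order prob of chi2(dof), by bisection on the survival function."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > 1.0 - prob:      # grow the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > 1.0 - prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = 0.05
d_alpha = chi2_quantile(1.0 - alpha, dof=1)   # critical value, about 3.84
print(d_alpha)
```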
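Then a decision is made, either by comparing the realized test statistic $d_n^{(i)}$ to the theoretical threshold $d_\alpha$ (or equivalently by evaluating the p-value of the sample, defined as $P(D_n^{(i)} > d_n^{(i)})$ under the null hypothesis, and by comparing it to $\alpha$):
- if $d_n^{(i)} > d_\alpha$ (or equivalently $P(D_n^{(i)} > d_n^{(i)}) < \alpha$), then we reject the independence between the components,
- if $d_n^{(i)} \leq d_\alpha$ (or equivalently $P(D_n^{(i)} > d_n^{(i)}) \geq \alpha$), then the independence between the components is considered acceptable.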
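The equivalence of the two decision rules can be checked on a small case. The sketch below is restricted to components with two values each, so the asymptotic distribution is $\chi^2(1)$ and the p-value reduces to $\mathrm{erfc}(\sqrt{d/2})$ for a realized statistic $d$. The function name `decide` and the realized values are hypothetical; 3.841459 is the quantile of order 0.95 of $\chi^2(1)$.

```python
import math

def decide(d_n, d_alpha, alpha):
    """Apply both decision rules, assuming a chi2(1) asymptotic distribution."""
    p_value = math.erfc(math.sqrt(d_n / 2.0))   # P(chi2(1) > d_n)
    reject_by_threshold = d_n > d_alpha         # compare statistic to critical value
    reject_by_p_value = p_value < alpha         # compare p-value to the risk
    return p_value, reject_by_threshold, reject_by_p_value

# Hypothetical realized statistic above the critical value: both rules reject.
p_value, by_threshold, by_p_value = decide(d_n=5.05, d_alpha=3.841459, alpha=0.05)
print(p_value, by_threshold, by_p_value)

# Hypothetical realized statistic below the critical value: independence is acceptable.
print(decide(d_n=2.0, d_alpha=3.841459, alpha=0.05))
```

Both rules return the same decision because the survival function is decreasing: the statistic exceeds the quantile of order $1 - \alpha$ exactly when its p-value falls below $\alpha$.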