Chi-squared test for independence
This method deals with the parametric modelling of a probability
distribution for a random vector $X = (X^1, \ldots, X^{n_X})$. We seek here to
detect possible dependencies that may exist between two components
$X^i$ and $X^j$. The $\chi^2$ test for independence
for discrete probability distributions can be used.
As we are considering discrete distributions, the possible values for
$X^i$ and $X^j$ respectively belong to the discrete sets
$\mathcal{E}_i$ and $\mathcal{E}_j$. The $\chi^2$ test of independence
can be applied when we have a sample consisting of $N$ pairs
$\{ (x^i_1, x^j_1), (x^i_2, x^j_2), \ldots, (x^i_N, x^j_N) \}$. We
denote:

- $n_{u,v}$ the number of pairs in the sample such that $x^i_k = u$ and $x^j_k = v$,
- $n^i_u$ the number of pairs in the sample such that $x^i_k = u$,
- $n^j_v$ the number of pairs in the sample such that $x^j_k = v$.
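These counts are plain frequency tallies, so they can be sketched directly in Python; the sample values below are purely illustrative:

```python
from collections import Counter

# Hypothetical sample of N pairs (x^i_k, x^j_k); the values are illustrative.
sample = [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'a'), (0, 'a'), (1, 'b')]
N = len(sample)

n_uv = Counter(sample)               # n_{u,v}: pairs with x^i_k = u and x^j_k = v
n_u = Counter(u for u, _ in sample)  # n^i_u: pairs with x^i_k = u
n_v = Counter(v for _, v in sample)  # n^j_v: pairs with x^j_k = v
```

The joint counts sum to $N$, and the two marginal tallies are recovered by summing $n_{u,v}$ over one index.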
The test thus uses the quantity denoted $\widehat{D}_N^2$:

$$
\widehat{D}_N^2 = \sum_{u \in \mathcal{E}_i} \sum_{v \in \mathcal{E}_j} \frac{\left( p_{u,v} - p^i_u\, p^j_v \right)^2}{p^i_u\, p^j_v}
$$

where:

$$
p_{u,v} = \frac{n_{u,v}}{N}, \qquad p^i_u = \frac{n^i_u}{N}, \qquad p^j_v = \frac{n^j_v}{N}
$$
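A minimal sketch of the statistic, computed directly from this definition (`d2_statistic` is a hypothetical helper, not part of any library):

```python
from collections import Counter

def d2_statistic(sample):
    """Compute D_N^2 = sum over (u, v) of (p_uv - p_u * p_v)^2 / (p_u * p_v),
    where the p's are the empirical frequencies n / N."""
    N = len(sample)
    n_uv = Counter(sample)
    n_u = Counter(u for u, _ in sample)
    n_v = Counter(v for _, v in sample)
    d2 = 0.0
    for u in n_u:        # u ranges over the observed values of X^i
        for v in n_v:    # v ranges over the observed values of X^j
            p_uv = n_uv[(u, v)] / N
            p_u, p_v = n_u[u] / N, n_v[v] / N
            d2 += (p_uv - p_u * p_v) ** 2 / (p_u * p_v)
    return d2
```

The sums here run over the observed values only: a value of $\mathcal{E}_i$ that never appears in the sample has $p^i_u = 0$, and its cells contribute nothing to the sum.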
The probability distribution of the distance $\widehat{D}_N^2$ is
asymptotically known (i.e. as the size of the sample tends to infinity).
If $N$ is sufficiently large, this means that for a probability
$\alpha$, one can calculate the threshold (critical value)
$d_\alpha$ such that:

- if $\widehat{D}_N > d_\alpha$, we conclude, with the risk of error
  $\alpha$, that a dependency exists between $X^i$ and $X^j$,
- if $\widehat{D}_N \leq d_\alpha$, the independence hypothesis is considered acceptable.
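As a sketch of this decision rule, assume two binary components, so the scaled statistic $N \widehat{D}_N^2$ is asymptotically chi-square distributed with $(2-1)(2-1) = 1$ degree of freedom, whose 95% quantile is the tabulated value $3.841$ (the function name and the restriction to the binary case are choices made for this illustration):

```python
from collections import Counter

def independent_at_5pct(sample):
    """Decision rule sketch for two binary components: accept independence
    when N * D_N^2 <= 3.841, the 95% quantile of the chi-square
    distribution with 1 degree of freedom (alpha = 0.05)."""
    N = len(sample)
    n_uv = Counter(sample)
    n_u = Counter(u for u, _ in sample)
    n_v = Counter(v for _, v in sample)
    d2 = sum((n_uv[(u, v)] / N - n_u[u] * n_v[v] / N ** 2) ** 2
             / (n_u[u] * n_v[v] / N ** 2)
             for u in n_u for v in n_v)
    return N * d2 <= 3.841
```

A perfectly balanced sample is accepted, while a sample where the second component always equals the first is rejected.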
An important notion is the so-called "p-value" of the test. This
quantity is equal to the limit error probability
$\alpha_\text{lim}$ under which the independence hypothesis is
rejected. Thus, independence is assumed if and only if
$\alpha_\text{lim}$ is greater than the value $\alpha$
desired by the user. Note that the higher
$\alpha_\text{lim} - \alpha$, the more robust the decision.
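The p-value can be sketched end to end, assuming the standard asymptotic result that $N \widehat{D}_N^2$ is chi-square distributed with $(|\mathcal{E}_i|-1)(|\mathcal{E}_j|-1)$ degrees of freedom; the survival function below is a hand-rolled series expansion of the regularized incomplete gamma function, used only to keep the sketch self-contained (a statistics library would normally supply it):

```python
import math
from collections import Counter

def chi2_sf(x, k):
    """Survival function P(Chi2_k > x), computed from the textbook
    series for the regularized lower incomplete gamma function."""
    s, z = k / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = total = 1.0 / s
    for n in range(1, 10000):
        term *= z / (s + n)
        total += term
        if term < total * 1e-14:
            break
    lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))
    return max(0.0, 1.0 - lower)

def chi2_independence_pvalue(sample):
    """p-value (alpha_lim) of the chi-squared independence test for a
    sample of (x^i_k, x^j_k) pairs."""
    N = len(sample)
    n_uv = Counter(sample)
    n_u = Counter(u for u, _ in sample)
    n_v = Counter(v for _, v in sample)
    d2 = sum((n_uv[(u, v)] / N - n_u[u] * n_v[v] / N ** 2) ** 2
             / (n_u[u] * n_v[v] / N ** 2)
             for u in n_u for v in n_v)
    dof = (len(n_u) - 1) * (len(n_v) - 1)
    # N * D_N^2 is asymptotically chi-square with dof degrees of freedom.
    return chi2_sf(N * d2, dof)
```

A strongly dependent sample yields a p-value far below any usual $\alpha$, so independence is rejected, while a balanced sample yields a p-value close to 1.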
This method is also referred to in the literature as the
$\chi^2$ test of contingency.