Chi-squared goodness of fit test
This method deals with the modelling of the probability distribution of a random vector $\mathbf{X} = \left(X^1, \ldots, X^{n_X}\right)$. It seeks to verify the compatibility between a sample of data $\left\{\mathbf{x}_1, \ldots, \mathbf{x}_N\right\}$ and a previously chosen candidate probability distribution. The $\chi^2$ goodness-of-fit test answers this question in the one-dimensional case $n_X = 1$, for a discrete distribution.
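Let us restrict ourselves to the case $n_X = 1$ and write $\mathbf{X} = X^1 = X$. Since we consider discrete distributions, i.e. those for which the possible values of $X$ belong to a discrete set $\mathcal{E}$, the candidate distribution is characterized by the probabilities $\left\{p(x)\right\}_{x \in \mathcal{E}}$, where $p(x)$ is the probability that $X = x$ under the candidate distribution.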
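In practice, a candidate discrete distribution can be stored as a mapping from each value $x \in \mathcal{E}$ to its probability $p(x)$. The sketch below assumes, purely for illustration, a binomial candidate with 4 trials and success probability 0.5. The helper `binomial_probs` is hypothetical and not part of any library.

```python
import math

def binomial_probs(n, p):
    """Hypothetical helper: candidate distribution as a dict x -> p(x) over E = {0, ..., n}.
    The binomial family is an illustrative assumption, not implied by the text."""
    return {x: math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

probs = binomial_probs(4, 0.5)
print(probs)  # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
```

The probabilities over $\mathcal{E}$ sum to one, which is a quick sanity check for any hand-built candidate.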
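The test is based on the statistic

$$\widehat{D}_N = \sum_{x \in \mathcal{E}_N} \frac{\left(n_x - N\,p(x)\right)^2}{N\,p(x)}$$

where $N$ is the sample size, $\mathcal{E}_N$ denotes the elements of $\mathcal{E}$ which have been observed at least once in the data sample, and $n_x$ denotes the number of data values in the sample that are equal to $x$.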
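The statistic can be computed directly from the sample by counting occurrences. This minimal sketch assumes the sample is a list of hashable values and that the candidate probabilities are given as a dict; `chi2_statistic` is a hypothetical name. It sums only over observed values, as in the formula, and refuses observed values with zero candidate probability, for which the ratio is undefined.

```python
from collections import Counter

def chi2_statistic(sample, probs):
    """Hypothetical helper: D_N = sum over x in E_N of (n_x - N p(x))^2 / (N p(x))."""
    n = len(sample)              # sample size N
    counts = Counter(sample)     # keys: observed values E_N, values: counts n_x
    d_n = 0.0
    for x, n_x in counts.items():
        p = probs.get(x, 0.0)
        if p <= 0.0:
            raise ValueError(f"observed value {x!r} has zero candidate probability")
        expected = n * p         # expected count N p(x)
        d_n += (n_x - expected) ** 2 / expected
    return d_n

# Illustrative assumption: Binomial(4, 0.5) candidate probabilities.
probs = {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
sample = [0] * 2 + [1] * 3 + [2] * 6 + [3] * 4 + [4] * 1   # N = 16
print(chi2_statistic(sample, probs))  # 1.25
```

Here the expected counts are 1, 4, 6, 4, 1, so only the values 0 and 1 contribute: $1^2/1 + 1^2/4 = 1.25$.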
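For a sufficiently large sample size $N$, the distribution of $\widehat{D}_N$ under the candidate distribution is asymptotically known: it tends to a $\chi^2$ distribution. For a given risk $\alpha$, one can therefore compute a threshold $d_\alpha$, the $(1-\alpha)$-quantile of that distribution, such that:
if $\widehat{D}_N > d_\alpha$, we reject the candidate distribution with a risk of error $\alpha$,
if $\widehat{D}_N \leq d_\alpha$, the candidate distribution is considered acceptable.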
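The decision rule can be sketched in Python under a few stated assumptions. To stay dependency-free, the sketch evaluates the $\chi^2$ survival function with the standard recurrence $Q(x; k+2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$ for integer degrees of freedom, and finds $d_\alpha$ by bisection. With SciPy available, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same threshold. The number of degrees of freedom `df` is an input. The usual convention, assumed here rather than taken from the text above, is the number of possible values minus one, minus the number of parameters estimated from the sample. All function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(chi2_df > x) for an integer df >= 1, using the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = 0.0, 0                            # Q(x; 0) = 0
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1    # Q(x; 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def critical_value(alpha, df, tol=1e-10):
    """Threshold d_alpha with P(chi2_df > d_alpha) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # grow the bracket until it contains d_alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:  # survival still above alpha: root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def decide(d_n, alpha, df):
    """Reject if D_N > d_alpha, otherwise accept (the rule stated above)."""
    return "reject" if d_n > critical_value(alpha, df) else "accept"

print(decide(1.25, 0.05, 4))  # accept
```

For instance, with an observed statistic of 1.25 and `df = 4` (illustrative values), the threshold is $d_{0.05} \approx 9.49$, so the candidate distribution is accepted.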
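An important notion is the so-called “$p$-value” of the test, denoted $\alpha_{\textrm{lim}}$. It is the probability that a variable following the asymptotic $\chi^2$ distribution exceeds the observed value of $\widehat{D}_N$. It is also the limit error probability separating the two outcomes: the candidate distribution is rejected for any risk $\alpha > \alpha_{\textrm{lim}}$. Thus, the candidate distribution is accepted if and only if $\alpha_{\textrm{lim}} \geq \alpha$, where $\alpha$ is the risk chosen by the user. The larger the margin $\alpha_{\textrm{lim}} - \alpha$, the more robust the decision.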
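The $p$-value follows from the $\chi^2$ survival function: $\alpha_{\textrm{lim}} = P(\chi^2_{df} > \widehat{D}_N)$. The sketch below uses a dependency-free recurrence for integer degrees of freedom. A SciPy user would call `scipy.stats.chi2.sf` instead. It uses the illustrative values $\widehat{D}_N = 1.25$ and `df = 4`, both assumptions for the example. `p_value` is a hypothetical helper.

```python
import math

def chi2_sf(x, df):
    """P(chi2_df > x) for an integer df >= 1, using the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = 0.0, 0                            # Q(x; 0) = 0
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1    # Q(x; 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def p_value(d_n, df):
    """Hypothetical helper: alpha_lim = P(chi2_df > d_n) for the observed statistic d_n."""
    return chi2_sf(d_n, df)

alpha = 0.05                   # risk chosen by the user
alpha_lim = p_value(1.25, 4)   # illustrative observed D_N = 1.25 with df = 4
accepted = alpha_lim >= alpha  # same decision as comparing D_N with d_alpha
print(f"alpha_lim = {alpha_lim:.4f}, accepted = {accepted}, margin = {alpha_lim - alpha:.4f}")
# alpha_lim = 0.8698, accepted = True, margin = 0.8198
```

Reporting $\alpha_{\textrm{lim}}$ instead of a single accept/reject verdict lets each user apply their own risk $\alpha$ without recomputing the statistic.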