Chi-squared test
The $\chi^2$ test is a statistical test of whether a given sample of data is drawn from a given discrete distribution. The library only provides the $\chi^2$ test for distributions of dimension 1.
We denote by $(x_1, \dots, x_\ell)$ a sample of dimension 1 and size $\ell$. Let $F$ be the (unknown) cumulative distribution function of the discrete distribution underlying the sample. We want to test whether the sample is drawn from the discrete distribution characterized by the probabilities $\{p_k(\theta)\}_{1 \leq k \leq K}$, where $\theta$ is the set of parameters of the distribution and $E = \{e_1, \dots, e_K\}$ is its support. Let $G$ be the cumulative distribution function of this candidate distribution.
This test involves the calculation of the test statistic, which is the distance between the empirical number of values equal to $e_k$ in the sample and the theoretical mean number evaluated from the discrete distribution.
Let $X_1, \dots, X_\ell$ be i.i.d. random variables following the distribution with CDF $F$. According to the tested distribution $G$, the theoretical mean number of values equal to $e_k$ is $n_k = \ell \, p_k(\theta)$, whereas the number evaluated from $X_1, \dots, X_\ell$ is $N_k = \sum_{i=1}^{\ell} 1_{\{X_i = e_k\}}$. Then the test statistic is defined by:
$$D_\ell = \sum_{k=1}^{K} \frac{(N_k - n_k)^2}{n_k}$$
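As an illustration, the statistic can be computed in a few lines of plain Python. This is a minimal sketch, not the library's implementation: the function name `chi_squared_statistic` and the fair-die data are hypothetical, and it assumes the Pearson form of the statistic, in which each squared deviation between the observed and theoretical counts is divided by the theoretical count.

```python
from collections import Counter


def chi_squared_statistic(sample, support, probabilities):
    """Realized chi-squared statistic of `sample` against a candidate
    discrete distribution given by its support and probabilities.

    Hypothetical helper: assumes the Pearson form of the statistic.
    """
    size = len(sample)
    observed = Counter(sample)  # empirical count of each value
    d = 0.0
    for value, probability in zip(support, probabilities):
        theoretical = size * probability      # theoretical mean count
        empirical = observed.get(value, 0)    # observed count
        d += (empirical - theoretical) ** 2 / theoretical
    return d


# Illustrative data (made up): 60 rolls of a die tested against a fair die.
sample = [1] * 8 + [2] * 12 + [3] * 9 + [4] * 11 + [5] * 10 + [6] * 10
support = [1, 2, 3, 4, 5, 6]
probabilities = [1.0 / 6.0] * 6

d = chi_squared_statistic(sample, support, probabilities)
print(d)
```

Here every theoretical count is 10, so only the first four faces contribute to the sum.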
If some values $e_k$ of the support have not been observed in the sample, we have to gather values into classes so that each class contains at least 5 data points (empirical rule). The theoretical probabilities of all the values in a class are then added to get the theoretical probability of the class.
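The gathering step can be sketched as follows. The text does not prescribe a particular grouping strategy, so this example assumes a simple greedy one: adjacent values of the support are merged from left to right until a class holds at least 5 observations, and any leftover tail is folded into the last class. The function name `gather_classes` and the counts are hypothetical.

```python
def gather_classes(counts, probabilities, min_count=5):
    """Greedily merge adjacent support values into classes that each
    contain at least `min_count` observations.

    Returns a list of (observed_count, class_probability) pairs.
    Hypothetical helper: the greedy left-to-right strategy is an assumption.
    """
    classes = []
    pending_count, pending_probability = 0, 0.0
    for count, probability in zip(counts, probabilities):
        pending_count += count
        pending_probability += probability  # probabilities add up within a class
        if pending_count >= min_count:
            classes.append((pending_count, pending_probability))
            pending_count, pending_probability = 0, 0.0
    # Fold an incomplete trailing class into the previous one, if any.
    if pending_count > 0 or pending_probability > 0.0:
        if classes:
            last_count, last_probability = classes.pop()
            pending_count += last_count
            pending_probability += last_probability
        classes.append((pending_count, pending_probability))
    return classes


# Illustrative counts (made up) for a support of 7 values, 30 observations.
counts = [0, 3, 7, 12, 6, 2, 0]
probabilities = [0.02, 0.08, 0.22, 0.36, 0.22, 0.08, 0.02]

classes = gather_classes(counts, probabilities)
for observed, probability in classes:
    print(observed, round(probability, 2))
```

The number of classes returned, rather than the original number of support values, is then the count that enters the statistic.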
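Let $d_\ell$ be the realization of the test statistic $D_\ell$ on the sample $(x_1, \dots, x_\ell)$. Under the null hypothesis $H_0 = \{G = F\}$, the asymptotic distribution of the test statistic $D_\ell$ is known: this is the $\chi^2(K-1)$ distribution, where $K$ is the number of distinct values in the support of $G$ (or the number of classes, if values have been gathered). We apply the test as follows.
We fix a risk $\alpha$ (type I error) and we evaluate the associated critical value $d_\alpha$, which is the quantile of order $1-\alpha$ of $D_\ell$. Then a decision is made, either by comparing the test statistic to the theoretical threshold $d_\alpha$, or equivalently by evaluating the p-value of the sample, defined as $\mathbb{P}(D_\ell > d_\ell)$, and comparing it to $\alpha$:
if $d_\ell > d_\alpha$ (or equivalently $\mathbb{P}(D_\ell > d_\ell) < \alpha$), then we reject $G$,
if $d_\ell \leq d_\alpha$ (or equivalently $\mathbb{P}(D_\ell > d_\ell) \geq \alpha$), then $G$ is considered acceptable.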
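The decision rule can be illustrated with the p-value form of the comparison. The sketch below uses only the Python standard library and is not the library's own routine: `chi2_sf` and `chi_squared_decision` are hypothetical names, the survival function of the chi-squared distribution is evaluated through the standard recurrence on the regularized upper incomplete gamma function, and the degrees of freedom are assumed to be the number of classes minus one.

```python
import math


def chi2_sf(x, dof):
    """Survival function P(chi2(dof) > x) for an integer dof >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_squared_decision(d, n_classes, alpha=0.05):
    """Return (p_value, reject) for a realized statistic `d`.

    Assumes n_classes - 1 degrees of freedom (hypothetical helper).
    """
    p_value = chi2_sf(d, n_classes - 1)
    return p_value, p_value < alpha


# Illustrative values (made up): realized statistic 1.0 with 6 classes.
p_value, reject = chi_squared_decision(1.0, 6, alpha=0.05)
print(p_value, reject)
```

Comparing the p-value to the risk gives the same decision as comparing the realized statistic to the critical value, without having to invert the distribution function.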