Graphical goodness-of-fit tests
This method deals with the modelling of a probability distribution of a random vector $\mathbf{X} = (X^1, \ldots, X^{n_X})$. It seeks to verify the compatibility between a sample of data $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ and a previously chosen candidate probability distribution. Graphical tools can answer this question in the one-dimensional case $n_X = 1$, and with a continuous distribution. The QQ-plot and Henry's line tests are defined in the case $n_X = 1$, so we write $\mathbf{X} = X^1 = X$. The first graphical tool provided is a QQ-plot (where “QQ” stands for “quantile-quantile”). In the specific case of a Normal distribution, Henry’s line may also be used.
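A QQ-plot is based on the notion of quantile. The $\alpha$-quantile $q_X(\alpha)$ of $X$, where $\alpha \in (0, 1)$, is defined as follows:
$$\mathbb{P}\left(X \leq q_X(\alpha)\right) = \alpha$$
If a sample $\{x_1, \ldots, x_N\}$ of $X$ is available, the quantile can be estimated empirically:
the sample $\{x_1, \ldots, x_N\}$ is first placed in ascending order, which gives the sample $\{x_{(1)}, \ldots, x_{(N)}\}$;
then, an estimate of the $\alpha$-quantile is:
$$\widehat{q}_X(\alpha) = x_{([N\alpha]+1)}$$
where $[N\alpha]$ denotes the integer part of $N\alpha$.
Thus, the $j^{\text{th}}$ smallest value of the sample, $x_{(j)}$, is an estimate $\widehat{q}_X(\alpha)$ of the $\alpha$-quantile where $\alpha = (j-1)/N$ ($1 < j \leq N$).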
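The empirical estimate above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `empirical_quantile` and the sample values are assumptions made for the example, not part of any library API.

```python
import math

def empirical_quantile(sample, alpha):
    """Estimate the alpha-quantile as x_([N*alpha] + 1) (1-based order statistics)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)                 # x_(1) <= ... <= x_(N)
    n = len(ordered)
    rank = math.floor(n * alpha) + 1         # 1-based rank [N*alpha] + 1
    rank = min(rank, n)                      # guard against floating-point round-up
    return ordered[rank - 1]                 # convert to a 0-based index

# Hypothetical sample of size N = 5
sample = [2.3, 0.7, 1.9, 3.1, 1.2]
median_estimate = empirical_quantile(sample, 0.5)   # [5 * 0.5] + 1 = 3rd smallest value
second_smallest = empirical_quantile(sample, 0.2)   # alpha = (j - 1) / N with j = 2
```

With this convention, asking for $\alpha = (j-1)/N$ returns the $j$-th smallest value, which is the property the QQ-plot relies on.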
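Let us then consider the candidate probability distribution being tested, and let us denote by $F$ its cumulative distribution function. An estimate of the $\alpha$-quantile, with $\alpha = (j-1)/N$, can also be computed from $F$:
$$\widehat{q}'_X(\alpha) = F^{-1}\left(\frac{j-1}{N}\right)$$
If $F$ is really the cumulative distribution function of $X$, then the empirical estimate $\widehat{q}_X(\alpha)$ and the model-based estimate $\widehat{q}'_X(\alpha)$ should be close. Thus, graphically, the points $\left\{\left(\widehat{q}_X(\alpha), \widehat{q}'_X(\alpha)\right),\ \alpha = (j-1)/N,\ 1 < j \leq N\right\}$ should be close to the diagonal.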
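As a minimal sketch of how the QQ-plot points could be computed, the following uses `statistics.NormalDist` from the Python standard library as the candidate distribution. The helper name `qq_points` and the synthetic sample are assumptions made for the illustration; any continuous candidate with an inverse CDF could be passed instead.

```python
from statistics import NormalDist

def qq_points(sample, inverse_cdf):
    """Return the pairs (x_(j), F^{-1}((j - 1) / N)) for 1 < j <= N."""
    ordered = sorted(sample)
    n = len(ordered)
    # j starts at 2 so that alpha = (j - 1) / N stays strictly inside (0, 1)
    return [(ordered[j - 1], inverse_cdf((j - 1) / n)) for j in range(2, n + 1)]

# Candidate distribution F: standard Normal (an assumption for this example)
candidate = NormalDist(mu=0.0, sigma=1.0)

# Synthetic sample built from the candidate's own quantiles, so a good fit is expected
sample = [candidate.inv_cdf((k + 0.5) / 20) for k in range(20)]

points = qq_points(sample, candidate.inv_cdf)

# Largest distance between the two quantile estimates; small values mean
# the points stay close to the diagonal
max_gap = max(abs(x - q) for x, q in points)
```

Plotting `points` against the diagonal gives the QQ-plot; `max_gap` is only a crude numerical summary and does not replace a quantitative test.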
The following figure illustrates the principle of a QQ-plot with a sample of size $N$. Note that the unit of both axes is that of the variable $X$ studied; the quantiles determined via $F$ are called here “value of $X$”. In this example, the points remain close to the diagonal and the hypothesis “$F$ is the cumulative distribution function of $X$” does not seem irrelevant, even if a more quantitative analysis (for instance a statistical goodness-of-fit test) should be carried out to confirm this.
In this second example, the candidate distribution function is clearly irrelevant.
This second graphical tool, Henry's line, is only relevant if the candidate distribution being tested is Gaussian. It also uses the ordered sample $\{x_{(1)}, \ldots, x_{(N)}\}$ introduced for the QQ-plot, and the empirical cumulative distribution function $\widehat{F}_N$ of the sample.
Then, let us denote by $\Phi$ the cumulative distribution function of a Normal distribution with mean 0 and standard deviation 1. The quantity $y_{(j)}$ is defined as follows:
$$y_{(j)} = \Phi^{-1}\left[\widehat{F}_N\left(x_{(j)}\right)\right]$$
If $X$ is distributed according to a normal probability distribution with mean $\mu$ and standard deviation $\sigma$, then the points $\left\{\left(x_{(j)}, y_{(j)}\right),\ 1 \leq j \leq N\right\}$ should be close to the line defined by $y = (x - \mu)/\sigma$. This comes from a property of the normal distribution: if the distribution of $X$ is really $\mathcal{N}(\mu, \sigma)$, then the distribution of $(X - \mu)/\sigma$ is $\mathcal{N}(0, 1)$.
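The construction of Henry's line can be sketched as follows, again with the standard library only. The helpers `henry_points` and `fit_line` are hypothetical names. The sketch assumes a sample without ties, so that the empirical CDF at the $j$-th smallest value equals $j/N$, and it skips the largest value because $\Phi^{-1}(1)$ is infinite. The least-squares fit is added only to read off the slope $1/\sigma$ and intercept $-\mu/\sigma$ of the line.

```python
from statistics import NormalDist

def henry_points(sample):
    """Return (x_(j), Phi^{-1}(j / N)) for j = 1..N-1 (j = N would give infinity)."""
    std_normal = NormalDist()                # Phi: mean 0, standard deviation 1
    ordered = sorted(sample)
    n = len(ordered)
    # Assumes no ties, so the empirical CDF evaluated at x_(j) is j / N
    return [(ordered[j - 1], std_normal.inv_cdf(j / n)) for j in range(1, n)]

def fit_line(points):
    """Ordinary least-squares slope and intercept of y as a function of x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic Gaussian-like sample with mu = 10 and sigma = 2 (illustrative values)
reference = NormalDist(10.0, 2.0)
sample = [reference.inv_cdf((k + 0.5) / 200) for k in range(200)]

slope, intercept = fit_line(henry_points(sample))
sigma_hat = 1.0 / slope            # the line is y = (x - mu) / sigma
mu_hat = -intercept / slope
```

For a Gaussian sample, `mu_hat` and `sigma_hat` should land near the true mean and standard deviation; a strongly curved point cloud would indicate that the Gaussian hypothesis is doubtful.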
The following figure illustrates the principle of Henry’s graphical test with a sample of size $N$. Note that only the unit of the horizontal axis is that of the variable $X$ studied. In this example, the points remain close to a line and the hypothesis “the distribution function of $X$ is a Gaussian one” does not seem irrelevant, even if a more quantitative analysis (for instance a statistical goodness-of-fit test) should be carried out to confirm this.
In this example the test validates the hypothesis of a Gaussian distribution.
In this second example, the hypothesis of a Gaussian distribution seems far less relevant because of the behavior for small values of $X$.
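In the bivariate case, the Kendall plot test makes it possible to validate the choice of a specific copula model or to verify that two samples share the same copula model.
Let $\mathbf{X} = (X^1, X^2)$ be a bivariate random vector whose copula is denoted $C$. Let $(\mathbf{x}_i)_{1 \leq i \leq N}$, with $\mathbf{x}_i = (x_i^1, x_i^2)$, be a sample of $\mathbf{X}$. We define:
$$H_i = \frac{1}{N-1}\,\mathrm{Card}\left\{ j \in [1, N],\ j \neq i \ \middle|\ x_j^1 \leq x_i^1 \text{ and } x_j^2 \leq x_i^2 \right\}$$
and $(H_{(1)}, \ldots, H_{(N)})$ the ordered statistics of $(H_1, \ldots, H_N)$.
The statistic $W_i$ is defined by:
$$W_i = N \binom{N-1}{i-1} \int_0^1 t\, K_0(t)^{i-1} \left(1 - K_0(t)\right)^{N-i} \,\mathrm{d}K_0(t)$$
where $K_0$ is the cumulative distribution function of $H_i$. We can show that this is the cumulative distribution function of the random variate $C(U, V)$ when $U$ and $V$ follow $\mathcal{U}(0, 1)$ distributions and are linked by the copula $C$ (for the independent copula, $U$ and $V$ are independent and $K_0(t) = t - t \ln t$).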
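The two ingredients of the Kendall plot can be sketched as follows. The function names are hypothetical. The $W_i$ computation is shown only for the independent copula, for which $K_0(t) = t - t \ln t$ and $\mathrm{d}K_0(t) = -\ln(t)\,\mathrm{d}t$ are known in closed form; the integral is approximated with a simple midpoint rule, which is an implementation choice made for this example.

```python
import math

def kendall_h(sample):
    """Ordered H statistics: H_i = #{j != i : x1_j <= x1_i and x2_j <= x2_i} / (N - 1)."""
    n = len(sample)
    h = []
    for i, (a, b) in enumerate(sample):
        count = sum(1 for j, (c, d) in enumerate(sample)
                    if j != i and c <= a and d <= b)
        h.append(count / (n - 1))
    return sorted(h)

def kendall_w_independent(n, steps=5000):
    """W_1..W_N for the independent copula, via a midpoint rule on (0, 1)."""
    w = []
    for i in range(1, n + 1):
        coeff = n * math.comb(n - 1, i - 1)
        total = 0.0
        for k in range(steps):
            t = (k + 0.5) / steps
            k0 = t - t * math.log(t)                 # K_0(t) for independence
            density = -math.log(t)                   # dK_0(t) / dt
            total += t * k0 ** (i - 1) * (1.0 - k0) ** (n - i) * density
        w.append(coeff * total / steps)
    return w

# Tiny illustrative sample: perfectly ordered pairs
h_ordered = kendall_h([(1, 1), (2, 2), (3, 3)])

# Expected ordered statistics under independence for N = 10
w = kendall_w_independent(10)
```

The Kendall plot then displays the points $(W_i, H_{(i)})$; points close to the diagonal support the tested copula. For another copula, $K_0$ would have to be replaced by the corresponding function, which this sketch does not cover.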
The Kendall Plot test validates the use of the Frank copula for a sample.
The Kendall Plot test invalidates the use of the Frank copula for a sample.
Remark: To test a sample against a specific copula when the sample size is greater than 500, we recommend using the second form of the Kendall plot test: generate a sample of the proper size from the copula and then test both samples against each other. This approach is more efficient.
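A hedged sketch of this two-sample form is given below. It compares the ordered $H$ statistics of a data sample with those of a sample generated from the candidate copula. The independent copula is used as the candidate only because it is trivial to simulate with the standard library; the data sample, the seed, the helper name `kendall_h` and the small size `n = 200` are all assumptions chosen to keep the example short.

```python
import random

def kendall_h(sample):
    """Ordered H statistics of a bivariate sample (see the definition of H_i)."""
    n = len(sample)
    h = []
    for i, (a, b) in enumerate(sample):
        count = sum(1 for j, (c, d) in enumerate(sample)
                    if j != i and c <= a and d <= b)
        h.append(count / (n - 1))
    return sorted(h)

rng = random.Random(0)      # fixed seed so the example is reproducible
n = 200                     # kept small here; the remark targets larger samples

# Hypothetical data with strong positive dependence between the two components
data = []
for _ in range(n):
    x1 = rng.gauss(0.0, 1.0)
    data.append((x1, x1 + 0.1 * rng.gauss(0.0, 1.0)))

# Sample of the same size drawn from the candidate (independent) copula
reference = [(rng.random(), rng.random()) for _ in range(n)]

# Points of the two-sample Kendall plot: ordered H of one sample against the other
points = list(zip(kendall_h(reference), kendall_h(data)))

# Average distance to the diagonal; a large value suggests different copulas
mean_gap = sum(abs(h_data - h_ref) for h_ref, h_data in points) / n
```

Here the data are strongly dependent while the reference sample is independent, so the points are expected to lie clearly away from the diagonal, which would invalidate the independent copula for these data.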