# Graphical goodness-of-fit tests

This method deals with the modelling of the probability distribution of a random vector $\boldsymbol{X} = (X^1, \ldots, X^{n_X})$. It seeks to verify the compatibility between a sample of data $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_N\}$ and a candidate probability distribution chosen beforehand. Graphical tools answer this question in the one-dimensional case $n_X = 1$, for a continuous distribution. The QQ-plot and Henry’s line are defined in the case $n_X = 1$, so we denote $\boldsymbol{X} = X^1 = X$. The first graphical tool provided is a QQ-plot (where “QQ” stands for “quantile-quantile”). In the specific case of a Normal distribution, Henry’s line may also be used.

**QQ-plot**

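A QQ-plot is based on the notion of quantile. The $\alpha$-quantile $q_X(\alpha)$ of $X$, where $\alpha \in (0, 1)$, is defined as follows:

$$\mathbb{P}\left(X \leq q_X(\alpha)\right) = \alpha$$

If a sample $\{x_1, \ldots, x_N\}$ of $X$ is available, the quantile can be estimated empirically:

the sample is first placed in ascending order, which gives the sample $\{x_{(1)}, \ldots, x_{(N)}\}$;

then, an estimate of the $\alpha$-quantile is:

$$\widehat{q}_X(\alpha) = x_{([N\alpha] + 1)}$$

where $[N\alpha]$ denotes the integral part of $N\alpha$.

Thus, the $j^{\text{th}}$ smallest value of the sample, $x_{(j)}$, is an estimate $\widehat{q}_X(\alpha)$ of the $\alpha$-quantile where $\alpha = (j - 1)/N$ ($1 < j \leq N$).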
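As an illustration of the empirical estimator, here is a minimal Python sketch that uses only the standard library. The function name `empirical_quantile` is hypothetical and chosen for this example, and the sketch assumes the estimator picks the order statistic of rank $[N\alpha] + 1$.

```python
import math


def empirical_quantile(sample, alpha):
    """Estimate the alpha-quantile as x_([N*alpha] + 1) (1-based order statistic)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)           # x_(1) <= ... <= x_(N)
    n = len(ordered)
    # [N*alpha] is the integral part; the 1-based rank [N*alpha] + 1
    # becomes the 0-based index [N*alpha]. min() guards against rounding.
    index = min(math.floor(n * alpha), n - 1)
    return ordered[index]


sample = [3.0, 1.0, 2.0, 5.0, 4.0]    # illustrative values only
median_estimate = empirical_quantile(sample, 0.5)   # rank [5 * 0.5] + 1 = 3
```

With five values, $\alpha = 0.5$ gives rank 3, so the estimate is the third smallest value of the sample.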

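Let us then consider the candidate probability distribution being tested, and let us denote by $F$ its cumulative distribution function. An estimate of the $\alpha$-quantile can also be computed from $F$:

$$\widehat{q}'_X(\alpha) = F^{-1}\left(\frac{j - 1}{N}\right)$$

If $F$ is really the cumulative distribution function of $X$, then $\widehat{q}_X(\alpha)$ and $\widehat{q}'_X(\alpha)$ should be close. Thus, graphically, the points $\left\{\left(\widehat{q}_X(\alpha), \widehat{q}'_X(\alpha)\right),\ \alpha = (j - 1)/N,\ 1 < j \leq N\right\}$ should be close to the diagonal.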
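The construction of the QQ-plot points can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `qq_points` is hypothetical, the candidate distribution is taken to be a standard Normal through `statistics.NormalDist`, and the sample values are made up for the example.

```python
from statistics import NormalDist


def qq_points(sample, inverse_cdf):
    """Return the pairs (x_(j), F^{-1}((j - 1) / N)) for 1 < j <= N."""
    ordered = sorted(sample)
    n = len(ordered)
    points = []
    for j in range(2, n + 1):          # j = 1 is skipped: alpha would be 0
        alpha = (j - 1) / n
        points.append((ordered[j - 1], inverse_cdf(alpha)))
    return points


candidate = NormalDist(mu=0.0, sigma=1.0)   # candidate distribution F (an assumption)
sample = [0.31, -1.20, 0.05, 1.48, -0.42, 0.77, -0.09, -0.65]  # made-up data
points = qq_points(sample, candidate.inv_cdf)
# Largest gap between empirical and candidate quantiles; small values
# mean the points lie close to the diagonal.
max_gap = max(abs(x - q) for x, q in points)
```

Plotting the first coordinate against the second gives the QQ-plot. The scalar `max_gap` is only a rough summary and does not replace a quantitative goodness-of-fit test.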

The following figure illustrates the principle of a QQ-plot with a sample of size $N$. Note that the unit of both axes is that of the variable $X$ studied; the quantiles determined via $F$ carry their own label in the figure. In this example, the points remain close to the diagonal and the hypothesis “$F$ is the cumulative distribution function of $X$” does not seem irrelevant, even if a more quantitative analysis should be carried out to confirm this.

(Figure: QQ-plot whose points remain close to the diagonal.)

In this second example, the candidate distribution function is clearly irrelevant.

(Figure: QQ-plot for a candidate distribution that does not fit the sample.)

**Henry’s line**

This second graphical tool is only relevant if the candidate distribution being tested is Gaussian. It also uses the ordered sample $\{x_{(1)}, \ldots, x_{(N)}\}$ introduced for the QQ-plot, and the empirical cumulative distribution function $\widehat{F}_N$.

By definition,

$$x_{(j)} = \widehat{F}_N^{-1}\left(\frac{j}{N}\right)$$

Then, let us denote by $\Phi$ the cumulative distribution function of a Normal distribution with mean 0 and standard deviation 1. The quantity $y_{(j)}$ is defined as follows:

$$y_{(j)} = \Phi^{-1}\left(\frac{j}{N}\right)$$

If $X$ is distributed according to a Normal probability distribution with mean $\mu$ and standard deviation $\sigma$, then the points $\left\{\left(x_{(j)}, y_{(j)}\right)\right\}$ should be close to the line defined by $y = (x - \mu)/\sigma$. This comes from a property of the Normal distribution: if the distribution of $X$ is really $\mathcal{N}(\mu, \sigma)$, then the distribution of $(X - \mu)/\sigma$ is $\mathcal{N}(0, 1)$.
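The points of Henry’s line can be computed with a short Python sketch. It assumes that each ordered value $x_{(j)}$ is paired with $\Phi^{-1}(j/N)$; because $\Phi^{-1}(1)$ is infinite, the largest value ($j = N$) is left out here, which is a choice made for this example. The helper names `henry_points` and `max_deviation_from_line` are hypothetical.

```python
from statistics import NormalDist


def henry_points(sample):
    """Return the pairs (x_(j), Phi^{-1}(j / N)) for 1 <= j < N."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    ordered = sorted(sample)
    n = len(ordered)
    # j = N is left out because Phi^{-1}(1) is infinite.
    return [(ordered[j - 1], standard_normal.inv_cdf(j / n)) for j in range(1, n)]


def max_deviation_from_line(points, mu, sigma):
    """Largest vertical distance between the points and y = (x - mu) / sigma."""
    return max(abs(y - (x - mu) / sigma) for x, y in points)


sample = [2.9, 1.1, 4.6, 3.4, 2.2, 0.3, 3.8, 2.5]   # made-up data
points = henry_points(sample)
deviation = max_deviation_from_line(points, mu=2.6, sigma=1.4)  # assumed parameters
```

A small `deviation` means the points stay close to the line $y = (x - \mu)/\sigma$. The values of `mu` and `sigma` above are illustrative and would normally come from the candidate distribution.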

The following figure illustrates the principle of Henry’s graphical test with a sample of size $N$. Note that only the unit of the horizontal axis is that of the variable $X$ studied. In this example, the points remain close to a line and the hypothesis “the distribution function of $X$ is a Gaussian one” does not seem irrelevant, even if a more quantitative analysis should be carried out to confirm this.

(Figure: Henry’s line for a sample compatible with a Gaussian distribution.)

In this example, the test supports the hypothesis of a Gaussian distribution.

(Figure: Henry’s line for a sample that departs from a Gaussian distribution.)

In this second example, the hypothesis of a Gaussian distribution seems far less relevant because of the behavior for small values of $X$.

**Kendall plot**

In the bivariate case, the Kendall plot makes it possible to validate the choice of a specific copula model or to verify that two samples share the same copula model.

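Let $\boldsymbol{X}$ be a bivariate random vector whose copula is denoted $C$. Let $(\boldsymbol{X}^i)_{1 \leq i \leq N}$ be a sample of $\boldsymbol{X}$.

We note:

$$\forall i \in \{1, \ldots, N\}, \quad H_i = \frac{1}{N - 1} \operatorname{Card}\left\{ j \in \{1, \ldots, N\},\ j \neq i \ \middle|\ x_1^j \leq x_1^i \text{ and } x_2^j \leq x_2^i \right\}$$

and $(H_{(1)}, \ldots, H_{(N)})$ the order statistics of $(H_1, \ldots, H_N)$.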
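A minimal Python sketch of this computation is given below. It assumes that $H_i$ is the fraction of the other $N - 1$ observations that are less than or equal to observation $i$ in both components; the function name `kendall_h` and the data are hypothetical.

```python
def kendall_h(points):
    """Compute H_i for each bivariate observation (x1, x2) in `points`."""
    n = len(points)
    h = []
    for i, (x1_i, x2_i) in enumerate(points):
        # Count the other observations dominated componentwise by observation i.
        count = sum(
            1
            for j, (x1_j, x2_j) in enumerate(points)
            if j != i and x1_j <= x1_i and x2_j <= x2_i
        )
        h.append(count / (n - 1))
    return h


increasing = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]   # made-up, perfectly concordant data
ordered_h = sorted(kendall_h(increasing))           # order statistics H_(1) <= ... <= H_(N)
```

The double loop costs on the order of $N^2$ comparisons, which is acceptable for a sketch but can become slow for large samples.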

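The statistic $W_i$ is defined by:

$$W_i = N \binom{N - 1}{i - 1} \int_0^1 t \, K_0(t)^{i - 1} \left(1 - K_0(t)\right)^{N - i} \, \mathrm{d}K_0(t) \tag{1}$$

where $K_0$ is the cumulative distribution function of $H_i$. We can show that this is the cumulative distribution function of the random variate $C(U, V)$, where $U$ and $V$ follow $\mathcal{U}(0, 1)$ distributions and the vector $(U, V)$ has copula $C$. In particular, when $U$ and $V$ are independent, $K_0(t) = t - t \ln t$.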
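The integral can be approximated numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: it uses a plain midpoint rule, the function names are hypothetical, and it treats only the independence case, for which $K_0(t) = t - t \ln t$ and therefore $\mathrm{d}K_0(t) = -\ln(t)\,\mathrm{d}t$.

```python
import math


def kendall_w(n, k0, k0_density, steps=20000):
    """Approximate W_i = n * C(n-1, i-1) * int_0^1 t K0^(i-1) (1-K0)^(n-i) dK0, i = 1..n."""
    h = 1.0 / steps
    w = []
    for i in range(1, n + 1):
        total = 0.0
        for s in range(steps):
            t = (s + 0.5) * h          # midpoint of the s-th sub-interval
            k = k0(t)
            total += t * k ** (i - 1) * (1.0 - k) ** (n - i) * k0_density(t)
        w.append(n * math.comb(n - 1, i - 1) * total * h)
    return w


# Independence case (an assumption made for this sketch):
# K_0(t) = t - t ln t, hence dK_0(t) = -ln(t) dt.
def k0_independent(t):
    return t - t * math.log(t)


def k0_density_independent(t):
    return -math.log(t)


w = kendall_w(5, k0_independent, k0_density_independent)
```

Each $W_i$ is the expected value of the $i$-th order statistic of $N$ independent draws from $K_0$, so the values increase with $i$ and, in the independence case, sum to $N/4$. This gives a quick sanity check on the numerical integration.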

(Figure: Kendall plot for a sample compatible with the Frank copula.)

The Kendall Plot test validates the use of the Frank copula for a sample.

(Figure: Kendall plot for a sample not compatible with the Frank copula.)

The Kendall Plot test invalidates the use of the Frank copula for a sample.

Remark: to test a sample against a specific copula when the sample size is greater than 500, we recommend using the second form of the Kendall plot test: generate a sample of the appropriate size from your copula, then test both samples against each other. This approach is more efficient.