Graphical goodness-of-fit tests
We gather some graphical tools to validate whether a given sample of data is drawn from a given continuous distribution of dimension 1.
We denote by $(x_1, \dots, x_n)$ the data of dimension 1, which have been independently generated by the random variable $X$.
Let $F$ be a continuous cumulative distribution function.
We want to validate whether $X$ follows the distribution characterized by $F$.
QQ-plot
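The Quantile-Quantile plot (QQ-plot) is based on the comparison of some quantiles of the tested distribution with the empirical ones. Let $q_X(\alpha)$ be the quantile of order $\alpha$ of the distribution $F$, with $\alpha \in (0, 1)$. It is defined by:
$$q_X(\alpha) = \inf \{ x \in \mathbb{R} \, : \, F(x) \geq \alpha \}$$
The empirical quantile of order $\alpha$ built on the sample is defined by:
$$\hat{q}_X(\alpha) = x_{([n \alpha] + 1)}$$
where $[n \alpha]$ denotes the integral part of $n \alpha$ and $(x_{(1)}, \dots, x_{(n)})$ is the sample sorted in ascending order:
$$x_{(1)} \leq \dots \leq x_{(n)}$$
Thus, the $j$-th smallest value of the sample, $x_{(j)}$, is an estimate $\hat{q}_X(\alpha_j)$ of the $\alpha_j$-quantile where $\alpha_j = (j - 1) / n$, for $1 < j \leq n$.
The QQ-plot draws the couples $(q_X(\alpha_j), \hat{q}_X(\alpha_j))_{1 < j \leq n}$.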
If $X$ follows the distribution $F$, then the points should be close to the diagonal.
The following figure illustrates a QQ-plot with a sample of size $n$. In this example, the points remain close to the diagonal and the hypothesis “$F$ is the cumulative distribution function of $X$” does not seem false, even if a more quantitative analysis should be carried out to confirm this.
[Figure: QQ-plot of the sample against the tested distribution]

In this second example, the hypothesis that the sample follows the tested continuous distribution is clearly false.
[Figure: QQ-plot of a sample that does not follow the tested distribution]
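The construction of the QQ-plot couples can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the library's own implementation: the helper name `qq_points`, the seed, the sample size and the choice of a standard normal tested distribution are all assumptions made for the example.

```python
import random
from statistics import NormalDist


def qq_points(sample, quantile):
    """Return the couples (q_X(alpha_j), x_(j)) for alpha_j = (j - 1) / n, 1 < j <= n.

    `quantile` is the quantile function of the tested distribution.
    """
    xs = sorted(sample)  # x_(1) <= ... <= x_(n)
    n = len(xs)
    # xs[j - 1] is the j-th smallest value, i.e. the empirical quantile of order (j - 1) / n.
    return [(quantile((j - 1) / n), xs[j - 1]) for j in range(2, n + 1)]


# Hypothetical example: a standard normal sample tested against N(0, 1).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
points = qq_points(sample, NormalDist(0.0, 1.0).inv_cdf)

# Largest vertical distance to the diagonal; small values support the tested distribution.
max_gap = max(abs(theoretical - empirical) for theoretical, empirical in points)
print(f"{len(points)} points, largest distance to the diagonal: {max_gap:.3f}")
```

The order $\alpha_1 = 0$ is skipped because the quantile of order 0 of an unbounded distribution is not finite, which is why the plot has $n - 1$ points.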

Normal probability plot (Henry’s line)
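This test is dedicated to the normal distribution.
The following result is used in the test: if $X$ follows the $\mathcal{N}(\mu, \sigma)$ distribution (mean $\mu$, standard deviation $\sigma$), then $Y = (X - \mu) / \sigma$ follows the $\mathcal{N}(0, 1)$ distribution. Furthermore, let $q_X(\alpha)$ be the quantile of order $\alpha$ of $\mathcal{N}(\mu, \sigma)$ and let $q_Y(\alpha)$ be the quantile of order $\alpha$ of $\mathcal{N}(0, 1)$. Then we have the relation:
$$q_X(\alpha) = \mu + \sigma \, q_Y(\alpha)$$
Henry’s line is the QQ-plot built from the empirical quantiles of order $\alpha_j = (j - 1) / n$ and the quantiles of the same order of the $\mathcal{N}(0, 1)$ distribution. If the sample comes from the $\mathcal{N}(\mu, \sigma)$ distribution, then the points (standard normal quantile on the horizontal axis, empirical quantile on the vertical axis) should be close to the line of equation $y = \mu + \sigma x$.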
The following figure illustrates Henry’s line with a sample of size $n$. In this example, the points remain close to a line and the hypothesis “$X$ follows a normal distribution” does not seem false, even if a more quantitative analysis should be carried out to confirm this.
[Figure: Henry’s line for a sample compatible with a normal distribution]

In this second example, the hypothesis of a normal distribution seems far less plausible because of the behavior for small values of $X$.
[Figure: Henry’s line for a sample that departs from a normal distribution]
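The relation between the quantiles of a normal distribution and those of the standard normal distribution can be illustrated as follows. This is a sketch under stated assumptions, not the library's implementation: the helper names `henry_points` and `fit_line` are hypothetical, and the least-squares fit is an addition made here to read approximate values of $\mu$ and $\sigma$ off the plot; it is not part of the graphical test itself.

```python
import random
from statistics import NormalDist


def henry_points(sample):
    """Return the couples (q_Y(alpha_j), x_(j)) with alpha_j = (j - 1) / n, 1 < j <= n."""
    xs = sorted(sample)
    n = len(xs)
    standard_normal = NormalDist(0.0, 1.0)
    return [(standard_normal.inv_cdf((j - 1) / n), xs[j - 1]) for j in range(2, n + 1)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: a sample drawn from a normal distribution with mean 5 and
# standard deviation 2 (values chosen only for illustration).
rng = random.Random(1)
sample = [rng.gauss(5.0, 2.0) for _ in range(500)]
slope, intercept = fit_line(henry_points(sample))
print(f"slope (rough estimate of sigma): {slope:.3f}")
print(f"intercept (rough estimate of mu): {intercept:.3f}")
```

For a sample that is not normal, the points bend away from any straight line, typically in the tails, and the fitted slope and intercept lose their interpretation.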

Kendall plot
In the bivariate case, the Kendall Plot test allows one to validate whether a sample is drawn from a given copula or to check whether two samples share the same copula.
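Let $\mathbf{X} = (X_1, X_2)$ be a bivariate random vector with the copula $C$ and the marginal cumulative distribution functions $(F_1, F_2)$.
Let $\mathbf{U} = (U_1, U_2)$ be the random vector with uniform marginal distributions on $[0, 1]$ and $C$ copula.
Let $(\mathbf{x}_1, \dots, \mathbf{x}_n)$ be a sample drawn from $\mathbf{X}$. We build the rank sample $(\mathbf{u}_1, \dots, \mathbf{u}_n)$ defined by
$$\mathbf{u}_i = \left( \hat{F}_1(x_{i,1}), \hat{F}_2(x_{i,2}) \right)$$
where $\hat{F}_k$ is the empirical cumulative distribution function of the $k$-th component of the sample.
We define:
$$W = C(U_1, U_2)$$
where $(U_1, U_2)$ is a bivariate random vector with uniform marginal distributions and $C$ copula.
We denote by $K_W$ the cumulative distribution function of $W$.
We can get a sample of $W$, denoted by $(w_1, \dots, w_n)$, from the sample $(\mathbf{u}_1, \dots, \mathbf{u}_n)$ as follows:
$$w_i = \hat{C}_n(\mathbf{u}_i)$$
where $\hat{C}_n$ is the empirical cumulative distribution function of the sample $(\mathbf{u}_1, \dots, \mathbf{u}_n)$.
Then, we have, for all $1 \leq i \leq n$:
$$w_i = \frac{1}{n} \, \mathrm{Card} \left\{ j \in \{1, \dots, n\} \, : \, u_{j,1} \leq u_{i,1} \text{ and } u_{j,2} \leq u_{i,2} \right\}$$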
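The values $w_i$ amount to counting, for each point, the proportion of sample points that lie below it in both components. A minimal sketch follows; the helper name `kendall_w` and the four data points are made up for illustration, and the count includes the point itself, which is the convention assumed here for the empirical cumulative distribution function. Since ranks preserve the ordering of each component, the count is the same whether it is computed on the raw sample or on the rank sample, so the sketch works directly on the raw sample.

```python
def kendall_w(sample):
    """Return w_i = (1/n) * #{j : x_j1 <= x_i1 and x_j2 <= x_i2} for each point i."""
    n = len(sample)
    return [
        sum(1 for a, b in sample if a <= x1 and b <= x2) / n
        for x1, x2 in sample
    ]


# Hypothetical bivariate sample of size 4.
sample = [(0.2, 1.5), (0.7, 0.3), (0.9, 2.0), (0.4, 1.0)]
w = kendall_w(sample)
ordered_w = sorted(w)
print(w)          # [0.25, 0.25, 1.0, 0.25]
print(ordered_w)  # [0.25, 0.25, 0.25, 1.0]
```

Only the third point dominates all the others in both components, so it is the only one with $w_i = 1$; each of the other three points dominates only itself.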
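From the sample $(w_1, \dots, w_n)$, we build the ordered sample $w_{(1)} \leq \dots \leq w_{(n)}$.
Let $(W_{(1)}, \dots, W_{(n)})$ be the order statistics of $n$ independent copies of $W$.
Then the cumulative distribution function of $W_{(i)}$ is the composition of the cumulative distribution function of the Beta distribution with shape parameters $i$ and $n - i + 1$ with the cumulative distribution function $K_W$ of $W$:
$$F_{W_{(i)}}(w) = F_{\mathrm{Beta}(i, \, n - i + 1)}(K_W(w))$$
Let $W_i$ be the statistic defined by:
$$W_i = \mathbb{E}\left[ W_{(i)} \right]$$
Thus we have:
$$W_i = n \binom{n - 1}{i - 1} \int_0^1 w \, K_W(w)^{i - 1} \, (1 - K_W(w))^{n - i} \, \mathrm{d}K_W(w) \tag{1}$$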
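As a sketch of where equation (1) comes from, assume that the order statistics are built from $n$ independent copies of $W$ with continuous cumulative distribution function $K_W$. The notation $f_{\mathrm{Beta}}$ for the Beta density is introduced here only for this derivation.

```latex
\begin{align*}
F_{W_{(i)}}(w)
  &= \sum_{k=i}^{n} \binom{n}{k} K_W(w)^{k} \, (1 - K_W(w))^{n-k}
   = F_{\mathrm{Beta}(i,\, n-i+1)}(K_W(w)), \\
f_{\mathrm{Beta}(i,\, n-i+1)}(t)
  &= \frac{n!}{(i-1)! \, (n-i)!} \, t^{i-1} (1-t)^{n-i}
   = n \binom{n-1}{i-1} \, t^{i-1} (1-t)^{n-i}, \\
\mathbb{E}\left[ W_{(i)} \right]
  &= \int_0^1 w \, \mathrm{d}F_{W_{(i)}}(w)
   = n \binom{n-1}{i-1} \int_0^1 w \, K_W(w)^{i-1} (1 - K_W(w))^{n-i} \, \mathrm{d}K_W(w).
\end{align*}
```

The first line states that $W_{(i)} \leq w$ exactly when at least $i$ of the $n$ copies are below $w$; the last line follows from the chain rule applied to the composition in the first line.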
For a given copula $C$, equation (1) is evaluated by Monte Carlo sampling: we generate $N$ samples of size $n$ from $C$, in order to get $N$ realizations of the statistics $W_{(i)}$, which are used to calculate $W_i$ as the empirical mean of $W_{(i)}$.
The Kendall plot draws the points $(W_i, w_{(i)})_{1 \leq i \leq n}$.
If the points are close to the first diagonal, the copula $C$ is validated.
In particular, we can use the Kendall plot to test the independence between $X_1$ and $X_2$ by using the independent copula to calculate the values $W_i$.
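The Monte Carlo evaluation and the independence test can be sketched as follows, using only the standard library. Several choices here are assumptions made for illustration and are not taken from the library: the helper names, the sample size, the number of replications, and the decision to obtain each realization of the ordered statistics by applying the same empirical count to samples simulated from the copula.

```python
import random


def kendall_w(sample):
    """Return w_i = (1/n) * #{j : x_j1 <= x_i1 and x_j2 <= x_i2} for each point i."""
    n = len(sample)
    return [sum(1 for a, b in sample if a <= x1 and b <= x2) / n for x1, x2 in sample]


def expected_w(n, n_rep, draw_copula, rng):
    """Monte Carlo estimate of the expected ordered statistics from n_rep simulated samples."""
    totals = [0.0] * n
    for _ in range(n_rep):
        simulated = [draw_copula(rng) for _ in range(n)]
        for i, value in enumerate(sorted(kendall_w(simulated))):
            totals[i] += value  # accumulate the i-th smallest value of this replication
    return [total / n_rep for total in totals]


def independent_copula(rng):
    """Draw one point from the independent copula: two independent uniforms on [0, 1)."""
    return (rng.random(), rng.random())


rng = random.Random(0)
n, n_rep = 50, 200  # illustrative values only
big_w = expected_w(n, n_rep, independent_copula, rng)

# Hypothetical data with independent components (normal and exponential marginals).
data = [(rng.gauss(0.0, 1.0), rng.expovariate(1.0)) for _ in range(n)]
points = list(zip(big_w, sorted(kendall_w(data))))

largest_gap = max(abs(x - y) for x, y in points)
print(f"largest distance to the first diagonal: {largest_gap:.3f}")
```

The marginal distributions of the data play no role: only the ordering of each component enters the count, which is why the reference values can be simulated from uniform marginals.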
To test whether two samples share the same copula, the Kendall plot test draws the points $(w^1_{(i)}, w^2_{(i)})_{1 \leq i \leq n}$, respectively associated to the first and second sample. Note that the two samples must have the same size.
In the first example, the Kendall Plot test validates the use of the Frank copula for the given sample.
[Figure: Kendall plot of the sample against the Frank copula, first example]

In the second example, the Kendall Plot test invalidates the use of the Frank copula for the given sample.
[Figure: Kendall plot of the sample against the Frank copula, second example]

Remark: to test a sample against a specific copula when the sample size is greater than 500, we recommend using the second form of the Kendall plot test: generate a sample of the same size from the copula, then test both samples. Testing this way is more efficient.