Graphical goodness-of-fit tests
This method deals with the modelling of the probability distribution of a
random vector $\boldsymbol{X} = (X^1, \ldots, X^{n_X})$. It
seeks to verify the compatibility between a sample of data
$\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\}$ and a
candidate probability distribution previously chosen.
Graphical tools allow one to answer this question in the one-dimensional
case $n_X = 1$, with a continuous distribution.
The QQ-plot and Henry's line tests are defined in the case
$n_X = 1$. Thus we denote $\boldsymbol{X} = X^1 = X$. The first
graphical tool provided is a QQ-plot (where “QQ” stands
for “quantile-quantile”). In the specific case of a Normal distribution,
Henry’s line may also be used.
QQ-plot
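A QQ-plot is based on the notion of quantile. The
$\alpha$-quantile $q_X(\alpha)$ of $X$, where $\alpha \in (0, 1)$, is defined as follows:

$$\mathbb{P}\left(X \leq q_X(\alpha)\right) = \alpha$$

If a sample $\{x_1, \ldots, x_N\}$ of $X$ is
available, the quantile can be estimated empirically:
the sample $\{x_1, \ldots, x_N\}$
is first placed in ascending order, which gives the sample
$\{x_{(1)}, \ldots, x_{(N)}\}$;
then, an estimate of the
$\alpha$-quantile is:

$$\widehat{q}_X(\alpha) = x_{([N\alpha]+1)}$$

where $[N\alpha]$ denotes the integer part of $N\alpha$.
Thus, the $j$-th smallest value of the sample, $x_{(j)}$, is an estimate
$\widehat{q}_X(\alpha)$ of the
$\alpha$-quantile where $\alpha = (j-1)/N$
($1 < j \leq N$).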
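The empirical estimate above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `empirical_quantile` and the sample values are hypothetical, not part of any library API.

```python
import math

def empirical_quantile(sample, alpha):
    """Estimate the alpha-quantile as x_([N*alpha]+1), for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)            # x_(1) <= ... <= x_(N)
    n = len(ordered)
    rank = math.floor(n * alpha) + 1    # 1-based rank [N*alpha] + 1
    return ordered[rank - 1]            # convert to a 0-based index

# Illustrative data (made up for this example)
sample = [2.3, 0.7, 1.9, 3.1, 1.2]
median_estimate = empirical_quantile(sample, 0.5)
print(median_estimate)
```

Note that `math.floor(n * alpha)` is evaluated in floating point, so values of `alpha` that make `n * alpha` land exactly on an integer may be rounded one rank down.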
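Let us then consider the candidate probability distribution being
tested, and let us denote by $F$ its cumulative distribution
function. An estimate of the
$\alpha$-quantile can also be
computed from $F$:

$$\widehat{q}'_X(\alpha) = F^{-1}\left((j-1)/N\right)$$

If $F$ is really the cumulative distribution function of
$X$, then $\widehat{q}_X(\alpha)$
and $\widehat{q}'_X(\alpha)$
should be close. Thus, graphically, the
points $\left\{\left(\widehat{q}_X(\alpha), \widehat{q}'_X(\alpha)\right),\ \alpha = (j-1)/N,\ 1 < j \leq N\right\}$
should be close to the diagonal.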
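As a rough sketch of how the QQ-plot points can be assembled, the snippet below pairs each ordered sample value with the candidate quantile at $\alpha = (j-1)/N$. It assumes a standard normal candidate via `statistics.NormalDist` from the Python standard library; the helper name `qq_points` and the data are hypothetical.

```python
from statistics import NormalDist

def qq_points(sample, inverse_cdf):
    """Return the pairs (x_(j), F^{-1}((j-1)/N)) for 1 < j <= N."""
    ordered = sorted(sample)
    n = len(ordered)
    points = []
    for j in range(2, n + 1):               # j = 1 is skipped: alpha would be 0
        alpha = (j - 1) / n
        points.append((ordered[j - 1], inverse_cdf(alpha)))
    return points

# Assumed candidate distribution: standard normal (illustration only)
candidate = NormalDist(mu=0.0, sigma=1.0)
sample = [-1.2, 0.4, -0.3, 1.5, 0.1, -0.8, 0.9, 2.0]   # made-up data
points = qq_points(sample, candidate.inv_cdf)
for empirical, theoretical in points:
    print(f"{empirical:6.2f}  {theoretical:6.3f}")
```

Plotting the first coordinate against the second and comparing with the diagonal gives the visual check described above.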
The following figure illustrates the principle of a QQ-plot with a
sample of size $N$. Note that the unit of both axes is that
of the variable $X$
studied; the quantiles determined via $F$
are called here “value of
”. In this example, the
points remain close to the diagonal and the hypothesis “$F$
is the
cumulative distribution function of
$X$” does not seem irrelevant,
even if a more quantitative analysis should be
carried out to confirm this.
[Figure: QQ-plot, first example]

In this second example, the candidate distribution function is clearly irrelevant.
[Figure: QQ-plot, second example]

Henry’s line
This second graphical tool is only relevant if the candidate
distribution function being tested is Gaussian. It also uses the ordered
sample $\{x_{(1)}, \ldots, x_{(N)}\}$ introduced for
the QQ-plot, and the empirical cumulative distribution function
$\widehat{F}_N$.
By definition,

$$x_{(j)} = \widehat{F}_N^{-1}\left(\frac{j}{N}\right)$$

Then, let us denote by $\Phi$ the cumulative distribution
function of a Normal distribution with mean 0 and standard deviation 1.
The quantity $t_{(j)}$
is defined as follows:

$$t_{(j)} = \Phi^{-1}\left(\frac{j}{N}\right)$$

If $X$ is distributed according to a normal probability
distribution with mean $\mu$
and standard deviation $\sigma$,
then the points $\left(x_{(j)}, t_{(j)}\right)$
should be close to the line defined by
$t = (x - \mu)/\sigma$.
This comes from a property of the normal distribution: if the distribution
of $X$
is really $\mathcal{N}(\mu, \sigma)$,
then the distribution of $(X - \mu)/\sigma$
is $\mathcal{N}(0, 1)$.
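The construction of Henry's line can be sketched as follows, using only the Python standard library. Two choices here are assumptions of this sketch rather than part of the method as stated: the index $j = N$ is skipped because $\Phi^{-1}(1)$ is infinite, and $\mu$ and $\sigma$ are replaced by the sample mean and sample standard deviation. The helper names and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def henry_points(sample):
    """Return the pairs (x_(j), Phi^{-1}(j/N)) for 1 <= j < N."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist(0.0, 1.0)
    # j = N is left out: Phi^{-1}(1) is infinite and cannot be plotted
    return [(ordered[j - 1], standard_normal.inv_cdf(j / n))
            for j in range(1, n)]

def reference_line(x, mu, sigma):
    """Ordinate of the line t = (x - mu) / sigma."""
    return (x - mu) / sigma

sample = [9.1, 10.4, 9.8, 11.2, 10.0, 8.7, 10.9, 9.5]   # made-up data
mu_hat, sigma_hat = mean(sample), stdev(sample)          # assumed estimates
pairs = henry_points(sample)
for x, t in pairs:
    line_t = reference_line(x, mu_hat, sigma_hat)
    print(f"x={x:6.2f}  t={t:7.3f}  line={line_t:7.3f}")
```

If the sample is close to Gaussian, the `t` and `line` columns should stay close to each other across the range of `x`.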
The following figure illustrates the principle of Henry’s graphical test
with a sample of size $N$. Note that only the unit of the
horizontal axis is that of the variable
$X$ studied. In this
example, the points remain close to a line and the hypothesis “the
distribution function of
$X$ is a Gaussian one” does not seem
irrelevant, even if a more quantitative analysis
should be carried out to confirm this.
[Figure: Henry's line, first example]

In this example, the test validates the hypothesis of a Gaussian distribution.
[Figure: Henry's line, second example]

In this second example, the hypothesis of a Gaussian distribution seems
far less relevant because of the behavior for small values of
$X$.
Kendall plot
In the bivariate case, the Kendall plot test makes it possible to validate the choice of a specific copula model or to verify that two samples share the same copula model.
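Let $\boldsymbol{X}$ be a bivariate random vector whose copula is
denoted $C$.
Let $(\boldsymbol{X}^i)_{1 \leq i \leq N}$
be a sample of $\boldsymbol{X}$.
We note:

$$H_i = \frac{1}{N-1} \, \mathrm{Card}\left\{ j \in \{1, \ldots, N\},\ j \neq i \ \middle|\ x_1^j \leq x_1^i \text{ and } x_2^j \leq x_2^i \right\}, \quad 1 \leq i \leq N$$

and $(H_{(1)}, \ldots, H_{(N)})$ the ordered statistics of
$(H_1, \ldots, H_N)$.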
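A minimal sketch of the counting step, assuming the usual Kendall plot definition in which $H_i$ is the proportion of the other sample points that are componentwise smaller than or equal to point $i$. The function name `kendall_h` and the data are hypothetical; the double loop is quadratic in the sample size.

```python
def kendall_h(points):
    """For each point i, return the fraction of the other points j with
    x_1^j <= x_1^i and x_2^j <= x_2^i."""
    n = len(points)
    h = []
    for i, (xi1, xi2) in enumerate(points):
        count = sum(
            1
            for j, (xj1, xj2) in enumerate(points)
            if j != i and xj1 <= xi1 and xj2 <= xi2
        )
        h.append(count / (n - 1))
    return h

points = [(0.1, 0.2), (0.4, 0.1), (0.5, 0.6), (0.9, 0.8)]   # made-up data
h = kendall_h(points)
ordered_h = sorted(h)        # H_(1) <= ... <= H_(N)
print(ordered_h)
```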
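The statistic $W_i$ is defined by:

$$W_i = N \binom{N-1}{i-1} \int_0^1 t \, K_0(t)^{i-1} \left(1 - K_0(t)\right)^{N-i} \, dK_0(t) \qquad (1)$$

where $K_0$ is the cumulative distribution function of
$H_i$. We can show that this is the cumulative distribution function
of the random variate $C(U, V)$
when $U$
and $V$
are
independent and follow
$\mathrm{Uniform}(0, 1)$
distributions.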
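To make the integral concrete, here is a numerical sketch for one particular case chosen for illustration: the independent copula $C(u, v) = uv$, for which the distribution function of $UV$ is $K_0(t) = t - t \ln t$ and hence $dK_0(t) = -\ln(t)\,dt$. The midpoint rule, the step count and the function names are choices of this sketch only.

```python
import math

def k0_independent(t):
    """CDF of U*V for independent uniform U and V: K_0(t) = t - t*ln(t)."""
    return t - t * math.log(t)

def kendall_w(n, steps=20000):
    """Approximate W_1..W_N with the midpoint rule on (0, 1)."""
    dt = 1.0 / steps
    coeffs = [n * math.comb(n - 1, i - 1) for i in range(1, n + 1)]
    w = [0.0] * n
    for s in range(steps):
        t = (s + 0.5) * dt                  # midpoint of the s-th cell
        k = k0_independent(t)
        weight = -math.log(t) * dt          # dK_0(t) = -ln(t) dt
        for i in range(1, n + 1):
            w[i - 1] += coeffs[i - 1] * t * k ** (i - 1) * (1.0 - k) ** (n - i) * weight
    return w

w = kendall_w(10)
print([round(value, 4) for value in w])
```

A useful sanity check in this case is that the $W_i$ sum to $N \int_0^1 t \, dK_0(t) = N/4$, since the binomial terms sum to $N$ for every $t$.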
[Figure: Kendall plot, first example]

The Kendall Plot test validates the use of the Frank copula for a sample.
[Figure: Kendall plot, second example]

The Kendall Plot test invalidates the use of the Frank copula for a sample.
Remark: To test a sample against a specific copula when the sample size is greater than 500, we recommend using the second form of the Kendall plot test: generate a sample of the appropriate size from the copula, then test the two samples against each other. This approach is more efficient.