Graphical goodness-of-fit tests
This method deals with the modelling of the probability distribution of a
random vector $\underline{X} = (X^1, \ldots, X^{n_X})$. It
seeks to verify the compatibility between a sample of data
$\{\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_N\}$ and a
candidate probability distribution chosen beforehand.
Graphical tools can answer this question in the one-dimensional case
$n_X = 1$, for a continuous distribution.
The QQ-plot and Henry’s line tests are defined in the case
$n_X = 1$. Thus we denote $\underline{X} = X^1 = X$. The first
graphical tool provided is a QQ-plot (where “QQ” stands
for “quantile-quantile”). In the specific case of a Normal distribution,
Henry’s line may also be used.
QQ-plot
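A QQ-plot is based on the notion of quantile. The
$\alpha$-quantile $q_X(\alpha)$ of $X$, where $\alpha \in (0, 1)$, is defined as follows:
$$\mathbb{P}\left(X \leq q_X(\alpha)\right) = \alpha$$
If a sample $\{x_1, \ldots, x_N\}$ of $X$ is
available, the quantile can be estimated empirically:
the sample $\{x_1, \ldots, x_N\}$ is first placed in ascending order, which gives the sample
$\{x_{(1)}, \ldots, x_{(N)}\}$;
then, an estimate of the $\alpha$-quantile is:
$$\widehat{q}_X(\alpha) = x_{([N\alpha]+1)}$$
where $[N\alpha]$ denotes the integral part of $N\alpha$.
Thus, the $j^{\textrm{th}}$ smallest value of the sample, $x_{(j)}$, is an estimate
$\widehat{q}_X(\alpha)$ of the $\alpha$-quantile where
$\alpha = (j-1)/N$ ($1 < j \leq N$).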
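The empirical estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `empirical_quantile` and the small sample are hypothetical and are not part of the library API.

```python
import math

def empirical_quantile(sample, alpha):
    """Estimate the alpha-quantile as x_([N*alpha] + 1), using 1-based order statistics."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)              # x_(1) <= ... <= x_(N)
    n = len(ordered)
    rank = math.floor(n * alpha) + 1      # 1-based rank [N*alpha] + 1
    return ordered[rank - 1]              # shift to Python's 0-based indexing

# Hypothetical sample of size N = 5
sample = [2.3, 0.7, 1.9, 3.1, 1.2]
print(empirical_quantile(sample, 0.5))    # [5 * 0.5] + 1 = 3, i.e. the third smallest value
```

With $N = 5$ and $\alpha = 0.5$, the integral part of $N\alpha$ is 2, so the estimate is the third smallest value of the sample.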
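Let us then consider the candidate probability distribution being
tested, and let us denote by $F$ its cumulative distribution
function. An estimate of the $\alpha$-quantile can also be
computed from $F$:
$$\widehat{q}'_X(\alpha) = F^{-1}\left((j-1)/N\right)$$
If $F$ is really the cumulative distribution function of
$X$, then $\widehat{q}_X(\alpha)$ and $\widehat{q}'_X(\alpha)$
should be close. Thus, graphically, the
points $\left\{\left(\widehat{q}_X(\alpha), \widehat{q}'_X(\alpha)\right),\ \alpha = (j-1)/N,\ 1 < j \leq N\right\}$
should be close to the diagonal.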
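As a minimal sketch of how the QQ-plot points could be computed, the following Python snippet pairs each ordered sample value with the candidate quantile at level $(j-1)/N$. It assumes a Normal candidate taken from the standard library (`statistics.NormalDist`); the helper name `qq_points` and the parameter values are hypothetical, and this is not the library's own implementation.

```python
from statistics import NormalDist

def qq_points(sample, inverse_cdf):
    """Return the pairs (x_(j), F^{-1}((j-1)/N)) for j = 2, ..., N."""
    ordered = sorted(sample)
    n = len(ordered)
    points = []
    for j in range(2, n + 1):             # j = 1 is excluded, as in the text (1 < j <= N)
        alpha = (j - 1) / n
        points.append((ordered[j - 1], inverse_cdf(alpha)))
    return points

# Hypothetical candidate distribution: Normal with mean 10 and standard deviation 2
candidate = NormalDist(mu=10.0, sigma=2.0)
sample = candidate.samples(200, seed=0)   # here the sample really comes from the candidate
points = qq_points(sample, candidate.inv_cdf)

# Largest distance to the diagonal, a crude numerical summary of the plot
largest_gap = max(abs(x - q) for x, q in points)
print(len(points), largest_gap)
```

Plotting the first coordinate against the second gives the QQ-plot; when the candidate is adequate the points should lie near the diagonal.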
The following figure illustrates the principle of a QQ-plot with a
sample of size $N$. Note that the unit of the two axes is that
of the variable $X$ studied; the quantiles determined via $F$
are called here “value of $T$”. In this example, the
points remain close to the diagonal and the hypothesis “$F$ is the
cumulative distribution function of $X$” does not seem irrelevant,
even if a more quantitative analysis should be
carried out to confirm this.
[Figure: QQ-plot, first example]

In this second example, the candidate distribution function is clearly irrelevant.
[Figure: QQ-plot, second example]

Henry’s line
This second graphical tool is only relevant if the candidate
distribution function being tested is Gaussian. It also uses the ordered
sample $\{x_{(1)}, \ldots, x_{(N)}\}$ introduced for
the QQ-plot, and the empirical cumulative distribution function
$\widehat{F}_N$.
By definition,
$$x_{(j)} = \widehat{F}_N^{-1}\left(\frac{j}{N}\right)$$
Then, let us denote by $\Phi$ the cumulative distribution
function of a Normal distribution with mean 0 and standard deviation 1.
The quantity $t_{(j)}$ is defined as follows:
$$t_{(j)} = \Phi^{-1}\left(\frac{j}{N}\right)$$
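If $X$ is distributed according to a normal probability
distribution with mean $\mu$ and standard deviation $\sigma$,
then the points $\{(x_{(j)}, t_{(j)}),\ 1 \leq j \leq N\}$
should be close to the line defined by $t = (x - \mu)/\sigma$.
This comes from a property of the normal distribution: if the distribution
of $X$ is really $\mathcal{N}(\mu, \sigma)$, then the distribution of
$(X - \mu)/\sigma$ is $\mathcal{N}(0, 1)$.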
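The construction of Henry’s line can be illustrated with a short Python sketch. It pairs each ordered sample value with the standard normal quantile at level $j/N$, then fits a least-squares line whose slope and intercept should be near $1/\sigma$ and $-\mu/\sigma$ for a Gaussian sample. The helper names `henry_points` and `fit_line` and the parameter values are hypothetical. Dropping the last point ($j = N$, where the standard normal quantile is infinite) is an assumption of this sketch rather than something stated in the text.

```python
from statistics import NormalDist, mean

def henry_points(sample):
    """Return the pairs (x_(j), t_(j)) with t_(j) = Phi^{-1}(j / N)."""
    ordered = sorted(sample)
    n = len(ordered)
    phi = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumption of this sketch: j = N is dropped, since Phi^{-1}(1) is infinite.
    return [(ordered[j - 1], phi.inv_cdf(j / n)) for j in range(1, n)]

def fit_line(points):
    """Least-squares line t = slope * x + intercept through the points."""
    xs = [x for x, _ in points]
    ts = [t for _, t in points]
    x_bar, t_bar = mean(xs), mean(ts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxt = sum((x - x_bar) * (t - t_bar) for x, t in points)
    slope = sxt / sxx
    return slope, t_bar - slope * x_bar

# Hypothetical Gaussian sample with mean 5 and standard deviation 3
mu, sigma = 5.0, 3.0
sample = NormalDist(mu, sigma).samples(500, seed=1)
slope, intercept = fit_line(henry_points(sample))
print(slope, 1.0 / sigma)       # fitted slope versus 1 / sigma
print(intercept, -mu / sigma)   # fitted intercept versus -mu / sigma
```

The fitted line is only a numerical companion to the plot: the graphical test itself consists in checking visually that the points stay close to a straight line.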
The following figure illustrates the principle of Henry’s graphical test
with a sample of size $N$. Note that only the unit of the
horizontal axis is that of the variable $X$ studied. In this
example, the points remain close to a line and the hypothesis “the
distribution function of $X$ is a Gaussian one” does not seem
irrelevant, even if a more quantitative analysis
should be carried out to confirm this.
[Figure: Henry’s line, first example]

In this example, the test supports the hypothesis of a Gaussian distribution.
[Figure: Henry’s line, second example]

In this second example, the hypothesis of a Gaussian distribution seems
far less relevant because of the behavior for small values of $X$.
Kendall plot
In the bivariate case, the Kendall plot test makes it possible to validate the choice of a specific copula model or to verify that two samples share the same copula model.
Let $\underline{X}$ be a bivariate random vector whose copula is
denoted $C$.
Let $(\underline{X}^i)_{1 \leq i \leq N}$ be a sample of $\underline{X}$.
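We denote:
$$\forall i \in \{1, \ldots, N\}, \quad H_i = \frac{1}{N-1} \mathrm{Card}\left\{ j \in \{1, \ldots, N\},\ j \neq i \ \middle|\ x_1^j \leq x_1^i \text{ and } x_2^j \leq x_2^i \right\}$$
and $(H_{(1)}, \ldots, H_{(N)})$ the ordered statistics of $(H_1, \ldots, H_N)$.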
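To make the counting definition concrete, here is a small Python sketch that computes, for each point of a bivariate sample, the proportion of the other points that are smaller in both components, and then sorts the result. The function name `kendall_h` and the four-point sample are hypothetical. The quadratic-time loop is chosen for readability, not efficiency.

```python
def kendall_h(sample):
    """H_i = (1 / (N - 1)) * #{j != i : x_1^j <= x_1^i and x_2^j <= x_2^i}."""
    n = len(sample)
    h = []
    for i, (xi1, xi2) in enumerate(sample):
        # Count the other points dominated by point i in both components
        count = sum(
            1
            for j, (xj1, xj2) in enumerate(sample)
            if j != i and xj1 <= xi1 and xj2 <= xi2
        )
        h.append(count / (n - 1))
    return h

# Hypothetical bivariate sample of size N = 4
sample = [(0.1, 0.4), (0.5, 0.2), (0.6, 0.9), (0.9, 0.7)]
h = kendall_h(sample)
ordered_h = sorted(h)   # order statistics H_(1) <= ... <= H_(N)
print(ordered_h)
```

In this sample the first two points dominate no other point, while each of the last two dominates exactly two of the three others.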
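The statistic $W_i$ is defined by:
$$W_i = N \binom{N-1}{i-1} \int_0^1 t \, K_0(t)^{i-1} \left(1 - K_0(t)\right)^{N-i} \, dK_0(t) \qquad (1)$$
where $K_0(t)$ is the cumulative distribution function of
$H_i$. We can show that this is the cumulative distribution function
of the random variate $C(U, V)$ when $U$ and $V$ are
independent and follow $\mathcal{U}(0, 1)$ distributions.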
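As an illustration of the integral defining the statistic, the sketch below evaluates it numerically in one particular case. It assumes the independent copula, for which $K_0(t) = t - t \ln t$ in closed form, so that $dK_0(t) = -\ln(t)\,dt$. This choice of copula, the midpoint quadrature rule, and the function names `k0_independent` and `kendall_w` are all assumptions made for the example; other copulas generally require $K_0$ to be estimated, for instance by simulation.

```python
import math

def k0_independent(t):
    """K_0 for the independent copula (assumed here): K_0(t) = t - t * ln(t) on (0, 1]."""
    return t - t * math.log(t)

def kendall_w(n, nodes=2000):
    """Approximate W_1, ..., W_N with the midpoint rule, using dK_0(t) = -ln(t) dt."""
    step = 1.0 / nodes
    w = []
    for i in range(1, n + 1):
        total = 0.0
        for k in range(nodes):
            t = (k + 0.5) * step          # midpoint of the k-th sub-interval
            k0 = k0_independent(t)
            total += t * k0 ** (i - 1) * (1.0 - k0) ** (n - i) * (-math.log(t))
        w.append(n * math.comb(n - 1, i - 1) * total * step)
    return w

w = kendall_w(10)
print(w)
print(sum(w))   # should be close to N / 4 under independence
```

A useful sanity check: summing the binomial weights over $i$ gives 1, so the $W_i$ add up to $N \int_0^1 t \, dK_0(t)$, which equals $N/4$ for the independent copula.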
[Figure: Kendall plot, first example]

The Kendall Plot test validates the use of the Frank copula for a sample.
[Figure: Kendall plot, second example]

The Kendall Plot test invalidates the use of the Frank copula for a sample.
Remark: To test a sample against a specific copula when the sample size exceeds 500, we recommend using the second form of the Kendall plot test: generate a sample of the proper size from the copula and then test both samples against each other. This approach is more efficient.
API:
See VisualTest_DrawQQplot() to draw a QQ plot
See VisualTest_DrawHenryLine() to draw the Henry line
See VisualTest_DrawKendallPlot() to draw the Kendall plot
Examples:
See Draw the QQ-Plot
See Test Normality
See Test the copula
References: