Using a QQ-plot to compare two samples
Let $X$ be a scalar uncertain variable modeled as a random variable. This method deals with the construction of a dataset prior to the choice of a probability distribution for $X$. A QQ-plot (where “QQ” stands for “quantile-quantile”) is a tool that may be used to compare two samples $\left\{x_1, \ldots, x_N\right\}$ and $\left\{x'_1, \ldots, x'_M\right\}$; the goal is to determine graphically whether these two samples come from the same probability distribution or not. If they do, the two samples should be aggregated in order to increase the robustness of further statistical analysis.
A QQ-plot is based on the notion of quantile. The $\alpha$-quantile $q_X(\alpha)$ of $X$, where $\alpha \in (0, 1)$, is defined as follows:

$$\mathbb{P}\left(X \leq q_X(\alpha)\right) = \alpha$$
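To make the definition concrete, here is a minimal sketch assuming, purely for illustration, that $X$ follows a standard normal distribution. It uses the Python standard library's `statistics.NormalDist`, whose `inv_cdf` returns the value $q$ such that $\mathbb{P}(X \leq q) = \alpha$; the choice of distribution and of $\alpha = 0.9$ are assumptions, not part of the method.

```python
from statistics import NormalDist

# Illustrative assumption: X follows a standard normal distribution N(0, 1).
X = NormalDist(mu=0.0, sigma=1.0)

alpha = 0.9
# inv_cdf(alpha) returns the alpha-quantile q_X(alpha).
q = X.inv_cdf(alpha)

# Check the defining property P(X <= q_X(alpha)) = alpha via the CDF.
print(f"q_X({alpha}) = {q:.4f}")
print(f"P(X <= q_X({alpha})) = {X.cdf(q):.4f}")
```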
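If a sample $\left\{x_1, \ldots, x_N\right\}$ of $X$ is available, the quantile can be estimated empirically:
- the sample $\left\{x_1, \ldots, x_N\right\}$ is first sorted in ascending order, which gives the sample $\left\{x_{(1)}, \ldots, x_{(N)}\right\}$;
- then, an estimate of the $\alpha$-quantile is $\widehat{q}_X(\alpha) = x_{([N\alpha] + 1)}$,

where $[N\alpha]$ denotes the integer part of $N\alpha$.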
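The two-step estimator above can be sketched in plain Python as follows. The function name `empirical_quantile` is hypothetical; the only subtlety is that the rank $[N\alpha] + 1$ is 1-based, whereas Python lists are 0-based. Since $\alpha < 1$, we have $[N\alpha] \leq N - 1$, so the rank never exceeds $N$.

```python
import math


def empirical_quantile(sample, alpha):
    """Estimate the alpha-quantile as x_([N*alpha] + 1), for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    ordered = sorted(sample)          # x_(1) <= x_(2) <= ... <= x_(N)
    n = len(ordered)
    rank = math.floor(n * alpha) + 1  # 1-based rank [N*alpha] + 1
    return ordered[rank - 1]          # convert the rank to a 0-based index


sample = [3.1, 0.5, 2.2, 1.7, 4.0]
print(empirical_quantile(sample, 0.5))  # → 2.2 (N = 5, rank [2.5] + 1 = 3)
```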
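Thus, the $j^\textrm{th}$ smallest value of the sample, $x_{(j)}$, is an estimate $\widehat{q}_X(\alpha)$ of the $\alpha$-quantile where $\alpha = (j-1)/N$ ($1 < j \leq N$). Let us then consider our second sample $\left\{x'_1, \ldots, x'_M\right\}$; it also provides an estimate $\widehat{q}'_X(\alpha)$ of this same quantile:

$$\widehat{q}'_X(\alpha) = x'_{([M \times (j-1)/N] + 1)}$$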
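If both samples come from the same probability distribution, then $\widehat{q}_X(\alpha)$ and $\widehat{q}'_X(\alpha)$ should be close. Thus, graphically, the points

$$\left\{\left(\widehat{q}_X(\alpha), \widehat{q}'_X(\alpha)\right),\ \alpha = (j-1)/N,\ 1 < j \leq N\right\}$$

should be close to the diagonal.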
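A minimal sketch of how the QQ-plot points can be computed, pairing $x_{(j)}$ with $x'_{([M(j-1)/N] + 1)}$ for $1 < j \leq N$. The function name `qq_points` is hypothetical. The rank in the second sample is computed with integer division rather than by forming $\alpha = (j-1)/N$ as a float, which avoids rounding errors in the integer part.

```python
def qq_points(x, x_prime):
    """Return the QQ-plot points (x_(j), x'_([M (j-1)/N] + 1)) for 1 < j <= N."""
    xs = sorted(x)         # x_(1), ..., x_(N)
    xps = sorted(x_prime)  # x'_(1), ..., x'_(M)
    n, m = len(xs), len(xps)
    points = []
    for j in range(2, n + 1):          # 1 < j <= N, so alpha = (j-1)/N is in (0, 1)
        rank = (m * (j - 1)) // n + 1  # [M (j-1)/N] + 1, exact integer arithmetic
        points.append((xs[j - 1], xps[rank - 1]))
    return points


x = [3, 1, 4, 2]                           # N = 4
x_prime = [50, 10, 80, 30, 20, 70, 40, 60]  # M = 8
print(qq_points(x, x_prime))  # → [(2, 30), (3, 50), (4, 70)]
```

As a sanity check, comparing a sample with itself gives rank $j$ for every $j$, so every point lies exactly on the diagonal.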
The following figure illustrates the principle of a QQ-plot with two samples of sizes $N$ and $M$. Note that both axes are expressed in the unit of the studied variable $X$. In this example, the points remain close to the diagonal, so the hypothesis “the two samples come from the same distribution” seems plausible, although a more quantitative analysis should be carried out to confirm it.
[Figure: QQ-plot of two samples whose points remain close to the diagonal]
[Figure: QQ-plot of two samples drawn from different distributions]
In this second example, the two samples clearly arise from two different distributions.
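The contrast between the two figures can be reproduced numerically with the following sketch. The data are hypothetical: both cases use pseudo-random samples drawn with Python's `random.gauss`, from $\mathcal{N}(0, 1)$ for the "same distribution" case and from $\mathcal{N}(2, 1)$ for the shifted second sample. The helper `mean_gap_to_diagonal` is also hypothetical; it averages the vertical distance $|\widehat{q}'_X(\alpha) - \widehat{q}_X(\alpha)|$ between each point and the diagonal. It is only an informal summary of what the eye sees on the plot, not a statistical test.

```python
import random


def qq_points(x, x_prime):
    """QQ-plot points (x_(j), x'_([M (j-1)/N] + 1)) for 1 < j <= N."""
    xs, xps = sorted(x), sorted(x_prime)
    n, m = len(xs), len(xps)
    return [(xs[j - 1], xps[(m * (j - 1)) // n]) for j in range(2, n + 1)]


def mean_gap_to_diagonal(points):
    """Average vertical distance between the QQ-plot points and the line y = x."""
    return sum(abs(b - a) for a, b in points) / len(points)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n, m = 200, 500

# Hypothetical case 1: both samples drawn from N(0, 1).
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_same = [rng.gauss(0.0, 1.0) for _ in range(m)]
# Hypothetical case 2: the second sample is drawn from N(2, 1) instead.
x_shifted = [rng.gauss(2.0, 1.0) for _ in range(m)]

gap_same = mean_gap_to_diagonal(qq_points(x, x_same))
gap_diff = mean_gap_to_diagonal(qq_points(x, x_shifted))
print(f"same distribution:       mean gap = {gap_same:.3f}")
print(f"different distributions: mean gap = {gap_diff:.3f}")
```

In the shifted case every point sits roughly two units above the diagonal, which is the numerical counterpart of the second figure.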
 OpenTURNS