Using QQ-plot to compare two samples
Let $X$ be a scalar uncertain variable modeled as a random variable. This method deals with the construction of a dataset prior to the choice of a probability distribution for $X$. A QQ-plot (where “QQ” stands for “quantile-quantile”) is a tool that may be used to compare two samples $\{x_1, \ldots, x_N\}$ and $\{x'_1, \ldots, x'_M\}$; the goal is to determine graphically whether these two samples come from the same probability distribution. If this is the case, the two samples should be aggregated in order to increase the robustness of further statistical analysis.
A QQ-plot is based on the notion of quantile. The $\alpha$-quantile $q_X(\alpha)$ of $X$, where $\alpha \in (0, 1)$, is defined as follows:

$$\mathbb{P}\left(X \leq q_X(\alpha)\right) = \alpha$$
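If a sample $\{x_1, \ldots, x_N\}$ of $X$ is available, the quantile can be estimated empirically:
the sample $\{x_1, \ldots, x_N\}$ is first placed in ascending order, which gives the sample $\{x_{(1)}, \ldots, x_{(N)}\}$;
then, an estimate of the $\alpha$-quantile is:

$$\widehat{q}_X(\alpha) = x_{([N\alpha] + 1)}$$

where $[N\alpha]$ denotes the integral part of $N\alpha$.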
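The two steps above can be sketched in plain Python. This is only a minimal illustration of the estimator $x_{([N\alpha]+1)}$, not the library's implementation; the function name `empirical_quantile` and the sample values are hypothetical.

```python
import math


def empirical_quantile(sample, alpha):
    """Estimate the alpha-quantile as x_([N*alpha] + 1) (hypothetical helper)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)  # step 1: ascending order x_(1) <= ... <= x_(N)
    n = len(ordered)
    # Step 2: 1-based rank [N*alpha] + 1; min() guards against the product
    # N*alpha being rounded up to N in floating-point arithmetic.
    rank = min(math.floor(n * alpha) + 1, n)
    return ordered[rank - 1]  # convert to Python's 0-based indexing


data = [3.0, 1.0, 2.0, 5.0, 4.0]  # made-up values for illustration
print(empirical_quantile(data, 0.5))
```

With $N = 5$ and $\alpha = 0.5$, the rank is $[2.5] + 1 = 3$, so the estimate is the third smallest value of the sample.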
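Thus, the $j^{\text{th}}$ smallest value of the sample, $x_{(j)}$, is an estimate $\widehat{q}_X(\alpha)$ of the $\alpha$-quantile where $\alpha = (j-1)/N$ ($1 < j \leq N$). Let us then consider our second sample $\{x'_1, \ldots, x'_M\}$; this one also provides an estimate $\widehat{q}'_X(\alpha)$ of this same quantile:

$$\widehat{q}'_X(\alpha) = x'_{([M \times (j-1)/N] + 1)}$$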
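If both samples correspond to the same probability distribution, then $\widehat{q}_X(\alpha)$ and $\widehat{q}'_X(\alpha)$ should be close. Thus, graphically, the points $\left\{ \left( \widehat{q}_X(\alpha_j), \widehat{q}'_X(\alpha_j) \right), \ \alpha_j = (j-1)/N, \ 1 < j \leq N \right\}$ should be close to the diagonal.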
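As a rough sketch of how these points could be computed, the snippet below pairs the $j^{\text{th}}$ smallest value of the first sample with the value of rank $[M(j-1)/N] + 1$ in the second sample, for $1 < j \leq N$. The function name `qq_points` and the input values are hypothetical, and this is not presented as the library's implementation.

```python
def qq_points(sample_x, sample_x_prime):
    """Return the QQ-plot points for alpha_j = (j - 1) / N, 1 < j <= N."""
    xs = sorted(sample_x)         # x_(1) <= ... <= x_(N)
    xps = sorted(sample_x_prime)  # x'_(1) <= ... <= x'_(M)
    n, m = len(xs), len(xps)
    points = []
    for j in range(2, n + 1):
        # With alpha_j = (j - 1) / N, the first estimate is simply x_(j).
        q_first = xs[j - 1]
        # [M * (j - 1) / N] is computed exactly with integer division; it is
        # also the 0-based index of the value of rank [M * alpha_j] + 1.
        q_second = xps[(m * (j - 1)) // n]
        points.append((q_first, q_second))
    return points


# Made-up samples of sizes N = 4 and M = 2, for illustration only.
print(qq_points([4, 1, 3, 2], [10, 20]))
```

Using integer division avoids floating-point rounding when evaluating the integral part, and because $j - 1 < N$ the index always stays within the second sample.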
The following figure illustrates the principle of a QQ-plot with two samples. Note that the unit of both axes is that of the variable studied. In this example, the points remain close to the diagonal, so the hypothesis “the two samples come from the same distribution” appears plausible, although a more quantitative analysis should be carried out to confirm this.
[Figure: QQ-plot of two samples whose points remain close to the diagonal]
[Figure: QQ-plot of two samples arising from two different distributions]
In this second example, the two samples clearly arise from two different distributions.
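The contrast between the two situations can be reproduced numerically with the small simulation below. It is a sketch under stated assumptions: the normal distributions, the sample sizes, the seed, and the helper `mean_gap_to_diagonal` are arbitrary choices made for illustration and are not the settings used for the figures. The averaged gap is only a crude summary of how far the points sit from the diagonal; it is not a statistical test.

```python
import random


def mean_gap_to_diagonal(sample_x, sample_x_prime):
    """Average |q_hat - q_hat'| over the QQ-plot points (illustrative only)."""
    xs, xps = sorted(sample_x), sorted(sample_x_prime)
    n, m = len(xs), len(xps)
    # Same pairing as in the text: x_(j) against x'_([M (j - 1) / N] + 1).
    gaps = [abs(xs[j - 1] - xps[(m * (j - 1)) // n]) for j in range(2, n + 1)]
    return sum(gaps) / len(gaps)


rng = random.Random(0)  # fixed seed so the run is reproducible
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_law = [rng.gauss(0.0, 1.0) for _ in range(150)]   # same distribution
other_law = [rng.gauss(3.0, 1.0) for _ in range(150)]  # shifted mean

gap_same = mean_gap_to_diagonal(reference, same_law)
gap_other = mean_gap_to_diagonal(reference, other_law)
print(f"same distribution:      {gap_same:.3f}")
print(f"different distribution: {gap_other:.3f}")
```

The gap for the shifted sample should be markedly larger, which corresponds to points lying away from the diagonal on the plot.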