.. _graphical_fitting_test:

Graphical goodness-of-fit tests
-------------------------------

This method deals with the modelling of a probability distribution of a
random vector :math:`\vect{X} = \left( X^1,\ldots,X^{n_X} \right)`. It
seeks to verify the compatibility between a sample of data
:math:`\left\{ \vect{x}_1,\vect{x}_2,\ldots,\vect{x}_N \right\}` and a
candidate probability distribution previously chosen. Graphical tools
make it possible to answer this question in the one-dimensional case
:math:`n_X = 1`, for a continuous distribution.

The QQ-plot and Henry’s line tests are defined for the case
:math:`n_X = 1`. Thus we denote :math:`\vect{X} = X^1 = X`. The first
graphical tool provided is a QQ-plot (where “QQ” stands for
“quantile-quantile”). In the specific case of a Normal distribution,
Henry’s line may also be used.

**QQ-plot**

A QQ-plot is based on the notion of quantile. The
:math:`\alpha`-quantile :math:`q_{X}(\alpha)` of :math:`X`, where
:math:`\alpha \in (0, 1)`, is defined as follows:

.. math::

   \begin{aligned}
       \Prob{ X \leq q_{X}(\alpha)} = \alpha
   \end{aligned}

If a sample :math:`\left\{x_1,\ldots,x_N \right\}` of :math:`X` is
available, the quantile can be estimated empirically:

#. the sample :math:`\left\{x_1,\ldots,x_N \right\}` is first placed in
   ascending order, which gives the sample
   :math:`\left\{ x_{(1)},\ldots,x_{(N)} \right\}`;

#. then, an estimate of the :math:`\alpha`-quantile is:

   .. math::

      \begin{aligned}
          \widehat{q}_{X}(\alpha) = x_{([N\alpha]+1)}
      \end{aligned}

   where :math:`[N\alpha]` denotes the integer part of
   :math:`N\alpha`.

Thus, the :math:`j^\textrm{th}` smallest value of the sample
:math:`x_{(j)}` is an estimate :math:`\widehat{q}_{X}(\alpha)` of the
:math:`\alpha`-quantile where :math:`\alpha = (j-1)/N`
(:math:`1 < j \leq N`).

Let us then consider the candidate probability distribution being
tested, and let us denote by :math:`F` its cumulative distribution
function. An estimate of the :math:`\alpha`-quantile can also be
computed from :math:`F`:

.. math::

   \begin{aligned}
       \widehat{q}'_{X}(\alpha) = F^{-1} \left( (j-1)/N \right)
   \end{aligned}

If :math:`F` is really the cumulative distribution function of
:math:`X`, then :math:`\widehat{q}_{X}(\alpha)` and
:math:`\widehat{q}'_{X}(\alpha)` should be close. Thus, graphically, the
points
:math:`\left\{ \left( \widehat{q}_{X}(\alpha),\widehat{q}'_{X}(\alpha)\right),\ \alpha = (j-1)/N,\ 1 < j \leq N \right\}`
should be close to the diagonal.

The following figure illustrates the principle of a QQ-plot with a
sample of size :math:`N=150`. Note that the unit of the two axes is that
of the variable :math:`X` studied; the quantiles determined via
:math:`F` are called here “value of :math:`T`”. In this example, the
points remain close to the diagonal and the hypothesis “:math:`F` is the
cumulative distribution function of :math:`X`” seems plausible, even if
a more quantitative analysis (see for instance ) should be carried out
to confirm this.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample is drawn from the candidate distribution itself.
    distribution = ot.Normal(3.0, 2.0)
    sample = distribution.getSample(150)
    graph = ot.VisualTest.DrawQQplot(sample, distribution)
    View(graph)

In this second example, the candidate distribution is clearly
irrelevant.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample and the candidate come from two different distributions.
    distribution = ot.Normal(3.0, 3.0)
    distribution2 = ot.Normal(2.0, 1.0)
    sample = distribution.getSample(150)
    graph = ot.VisualTest.DrawQQplot(sample, distribution2)
    View(graph)

**Henry’s line**

This second graphical tool is only relevant if the candidate
distribution being tested is Gaussian. It also uses the ordered sample
:math:`\left\{ x_{(1)},\ldots,x_{(N)} \right\}` introduced for the
QQ-plot, and the empirical cumulative distribution function
:math:`\widehat{F}_N` presented in . By definition,

.. math::

   \begin{aligned}
       x_{(j)} = \widehat{F}_N^{-1} \left( \frac{j}{N} \right)
   \end{aligned}

Then, let us denote by :math:`\Phi` the cumulative distribution
function of a Normal distribution with mean 0 and standard deviation 1.
The quantity :math:`t_{(j)}` is defined as follows:

.. math::

   \begin{aligned}
       t_{(j)} = \Phi^{-1} \left( \frac{j}{N} \right)
   \end{aligned}

If :math:`X` is distributed according to a Normal probability
distribution with mean :math:`\mu` and standard deviation
:math:`\sigma`, then the points
:math:`\left\{ \left( x_{(j)},t_{(j)} \right),\ 1 \leq j \leq N \right\}`
should be close to the line defined by :math:`t = (x-\mu) / \sigma`.
This comes from a property of the Normal distribution: if the
distribution of :math:`X` is really :math:`\cN(\mu,\sigma)`, then the
distribution of :math:`(X-\mu) / \sigma` is :math:`\cN(0,1)`.

The following figure illustrates the principle of Henry’s graphical
test with a sample of size :math:`N=50`. Note that only the unit of the
horizontal axis is that of the variable :math:`X` studied. In this
example, the points remain close to a line and the hypothesis “the
distribution function of :math:`X` is a Gaussian one” seems plausible,
even if a more quantitative analysis (see for instance ) should be
carried out to confirm this.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # Gaussian sample: the points are expected to follow Henry's line.
    distribution = ot.Normal(10.0, 2.0)
    sample = distribution.getSample(50)
    graph = ot.VisualTest.DrawHenryLine(sample)
    View(graph)

In this example the test validates the hypothesis of a Gaussian
distribution.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # Log-normal sample: the points are expected to leave Henry's line.
    distribution = ot.LogNormal(2.0, 1.0, 0.0)
    sample = distribution.getSample(50)
    graph = ot.VisualTest.DrawHenryLine(sample)
    View(graph)

In this second example, the hypothesis of a Gaussian distribution seems
far less relevant because of the behavior for small values of
:math:`X`.

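
Both graphical tools above pair the ordered sample with theoretical
quantiles. The following is a minimal sketch of how these points could
be computed by hand. It uses only the Python standard library
(``statistics.NormalDist``) rather than OpenTURNS, and the function
names ``qq_points`` and ``henry_points`` are hypothetical helpers
introduced for illustration. As an assumption of this sketch, the last
Henry point (:math:`j = N`) is dropped, because
:math:`\Phi^{-1}(1)` is infinite.

.. code-block:: python

    from statistics import NormalDist


    def qq_points(sample, inverse_cdf):
        """Return the QQ-plot points (x_(j), F^{-1}((j-1)/N)) for 1 < j <= N.

        `inverse_cdf` is the quantile function F^{-1} of the candidate
        distribution (hypothetical argument name).
        """
        ordered = sorted(sample)
        n = len(ordered)
        # x_(j) estimates the alpha-quantile with alpha = (j - 1) / N.
        return [(ordered[j - 1], inverse_cdf((j - 1) / n)) for j in range(2, n + 1)]


    def henry_points(sample):
        """Return the Henry points (x_(j), Phi^{-1}(j/N)) for 1 <= j < N.

        Assumption: j = N is skipped since Phi^{-1}(1) is infinite.
        """
        ordered = sorted(sample)
        n = len(ordered)
        standard_normal = NormalDist(0.0, 1.0)
        return [(ordered[j - 1], standard_normal.inv_cdf(j / n)) for j in range(1, n)]


    # Usage: compare the Henry points with the line t = (x - mu) / sigma.
    mu, sigma = 10.0, 2.0
    sample = NormalDist(mu, sigma).samples(50, seed=0)
    for x, t in henry_points(sample)[:3]:
        print(x, t, (x - mu) / sigma)

For a Gaussian sample, the second and third printed columns should be
of comparable magnitude; plotting the full list of points against the
line gives the same kind of picture as the figures above.
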

**Kendall plot**

In the bivariate case, the Kendall Plot test makes it possible to
validate the choice of a specific copula model or to verify that two
samples share the same copula model.

Let :math:`\vect{X}` be a bivariate random vector whose copula is
denoted by :math:`C`. Let :math:`(\vect{X}^i)_{1 \leq i \leq N}` be a
sample of :math:`\vect{X}`. We define:

.. math::

   \begin{aligned}
       \forall i \in \{1, \ldots, N\}, \displaystyle H_i = \frac{1}{N-1} \mathrm{Card} \left\{ j \in [1,N], j \neq i, \, | \, x^j_1 \leq x^i_1 \mbox{ and } x^j_2 \leq x^i_2 \right \}
   \end{aligned}

and :math:`(H_{(1)}, \dots, H_{(N)})` the order statistics of
:math:`(H_1, \dots, H_N)`.

The statistic :math:`W_i` is defined by:

.. math::
   :label: Wi

   W_i = N C_{N-1}^{i-1} \int_0^1 t K_0(t)^{i-1} (1-K_0(t))^{N-i} \, dK_0(t)

where :math:`K_0(t)` is the cumulative distribution function of
:math:`H_i`. We can show that this is the cumulative distribution
function of the random variate :math:`C(U,V)` when :math:`U` and
:math:`V` are independent and follow :math:`Uniform(0,1)`
distributions.

| Equation :eq:`Wi` is evaluated with the Monte Carlo sampling method:
  it generates :math:`n` samples of size :math:`N` from the bivariate
  copula :math:`C`, in order to have :math:`n` realizations of the
  statistics :math:`H_{(i)},\forall 1 \leq i \leq N` and to obtain an
  estimate of :math:`W_i = E[H_{(i)}], \forall i \leq N`.
| When testing a specific copula with respect to a sample, the Kendall
  Plot test draws the points :math:`(W_i, H_{(i)})_{1 \leq i \leq N}`.
  If the points are on the first diagonal, the copula model is
  validated.
| When testing whether two samples have the same copula, the Kendall
  Plot test draws the points
  :math:`(H^1_{(i)}, H^2_{(i)})_{1 \leq i \leq N}` respectively
  associated to the first and second sample. Note that the two samples
  must have the same size.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample is drawn from the tested copula itself.
    copula = ot.FrankCopula(1.5)
    sample = copula.getSample(100)
    graph = ot.VisualTest.DrawKendallPlot(sample, copula)
    View(graph)

The Kendall Plot test validates the use of the Frank copula for a
sample.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample comes from a Frank copula but is tested against a Gumbel copula.
    copula = ot.FrankCopula(1.5)
    copula2 = ot.GumbelCopula(4.5)
    sample = copula.getSample(100)
    graph = ot.VisualTest.DrawKendallPlot(sample, copula2)
    View(graph)

The Kendall Plot test invalidates the use of the Gumbel copula for a
sample drawn from the Frank copula.

Remark: when testing a sample with respect to a specific copula, if the
size of the sample is greater than 500, we recommend using the second
form of the Kendall Plot test: generate a sample of the proper size
from the copula and then test both samples. This approach is more
efficient.

.. topic:: API:

    - See :py:func:`~openturns.VisualTest_DrawQQplot` to draw a QQ plot
    - See :py:func:`~openturns.VisualTest_DrawHenryLine` to draw the Henry line
    - See :py:func:`~openturns.VisualTest_DrawKendallPlot` to draw the Kendall plot

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/statistical_tests/plot_qqplot_graph`
    - See :doc:`/auto_data_analysis/statistical_tests/plot_test_normality`
    - See :doc:`/auto_data_analysis/statistical_tests/plot_test_copula`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_