.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_test_independence.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_test_independence.py:


Test independence
=================

.. GENERATED FROM PYTHON SOURCE LINES 6-11

.. code-block:: Python

    import openturns as ot

    ot.Log.Show(ot.Log.NONE)


.. GENERATED FROM PYTHON SOURCE LINES 12-59

Sample independence test
------------------------

In this section we perform tests to assess whether two 1-d samples are
independent or not.

The following tests are available:

- the ChiSquared test: it tests whether two scalar samples (discrete ones only) are independent.
  If :math:`n_{ij}` is the number of values of the sample :math:`i=(1,2)` in the modality :math:`1 \leq j \leq m`,
  :math:`\displaystyle n_{i.} = \sum_{j=1}^m n_{ij}`, :math:`\displaystyle n_{.j} = \sum_{i=1}^2 n_{ij}`
  and :math:`\displaystyle n = \sum_{i=1}^2 \sum_{j=1}^m n_{ij}` is the total number of values,
  then the ChiSquared test evaluates the decision variable:

  .. math::

      D^2 = \sum_{i=1}^2 \sum_{j=1}^m \frac{( n_{ij} - \frac{n_{i.} n_{.j}}{n} )^2}{\frac{n_{i.} n_{.j}}{n}}

  which, under the hypothesis of independence, tends towards the :math:`\chi^2(m-1)` distribution.
  The hypothesis of independence is rejected if :math:`D^2` is too high (depending on the p-value threshold).

- the Pearson test: it tests whether there exists a linear relation between two scalar samples
  which form a Gaussian vector (which is equivalent to having a linear correlation coefficient not equal to zero).
  If both samples are :math:`\underline{x} = (x_i)_{1 \leq i \leq n}` and :math:`\underline{y} = (y_i)_{1 \leq i \leq n}`,
  and :math:`\bar{x} = \displaystyle \frac{1}{n}\sum_{i=1}^n x_i` and :math:`\bar{y} = \displaystyle \frac{1}{n}\sum_{i=1}^n y_i`,
  the Pearson test evaluates the decision variable:

  .. math::

      D = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2\sum_{i=1}^n (y_i - \bar{y})^2}}

  Under the hypothesis of normality of both samples and of a zero correlation coefficient,
  the statistic :math:`D \sqrt{(n-2)/(1-D^2)}` follows the Student distribution with :math:`n-2` degrees of freedom.
  The hypothesis of a linear correlation coefficient equal to 0 (which, for a Gaussian vector,
  is equivalent to the independence of the samples) is rejected if :math:`|D|` is too high
  (depending on the p-value threshold).

- the Spearman test: it tests whether there exists a monotonic relation between two scalar samples.
  If both samples are :math:`\underline{x} = (x_i)_{1 \leq i \leq n}` and :math:`\underline{y}= (y_i)_{1 \leq i \leq n}`,
  the Spearman test evaluates the decision variable:

  .. math::

      D = 1-\frac{6\sum_{i=1}^n (r_i - s_i)^2}{n(n^2-1)}

  where :math:`r_i = rank(x_i)` and :math:`s_i = rank(y_i)`. Under the hypothesis of independence,
  :math:`D` is such that :math:`\sqrt{n-1}D` tends towards the standard normal distribution.

.. GENERATED FROM PYTHON SOURCE LINES 61-65

The continuous case
^^^^^^^^^^^^^^^^^^^

We create two different continuous samples:

.. GENERATED FROM PYTHON SOURCE LINES 65-68

.. code-block:: Python

    sample1 = ot.Normal().getSample(100)
    sample2 = ot.Normal().getSample(100)


.. GENERATED FROM PYTHON SOURCE LINES 69-70

We first use the Pearson test and store the result:

.. GENERATED FROM PYTHON SOURCE LINES 70-72

.. code-block:: Python

    resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)


.. GENERATED FROM PYTHON SOURCE LINES 73-76

We can then display the result of the test as a yes/no answer with
the `getBinaryQualityMeasure` method. The answer is `True` when the p-value is above the threshold,
that is when the hypothesis of independence is not rejected.
We can retrieve the p-value and the threshold with the `getPValue`
and `getThreshold` methods.

.. GENERATED FROM PYTHON SOURCE LINES 76-84

.. code-block:: Python

    print(
        "Samples are independent?",
        resultPearson.getBinaryQualityMeasure(),
        "p-value=%.6g" % resultPearson.getPValue(),
        "threshold=%.6g" % resultPearson.getThreshold(),
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Samples are independent? False p-value=0.0210046 threshold=0.1

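The Pearson and Spearman decision variables defined in the first section can be reproduced by hand
on a small data set. The following is a minimal sketch in plain Python that does not rely on OpenTURNS:
the data and the helper names (`pearson_coefficient`, `ranks`, `spearman_coefficient`) are hypothetical
and only serve the illustration, and the rank computation assumes that the samples contain no ties.

.. code-block:: Python

    import math


    def pearson_coefficient(x, y):
        """Linear correlation coefficient D of two samples of the same size."""
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        var_x = sum((xi - mean_x) ** 2 for xi in x)
        var_y = sum((yi - mean_y) ** 2 for yi in y)
        return cov / math.sqrt(var_x * var_y)


    def ranks(values):
        """Ranks from 1 to n (hypothetical helper, assumes there are no ties)."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result


    def spearman_coefficient(x, y):
        """Spearman decision variable D = 1 - 6 sum (r_i - s_i)^2 / (n (n^2 - 1))."""
        n = len(x)
        r, s = ranks(x), ranks(y)
        d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
        return 1.0 - 6.0 * d2 / (n * (n**2 - 1))


    # Illustrative data, not taken from the example above
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 1.0, 4.0, 3.0, 5.0]
    print("Pearson D = %.6g" % pearson_coefficient(x, y))
    print("Spearman D = %.6g" % spearman_coefficient(x, y))

On this data set both coefficients are equal to 0.8, which indicates a strong increasing relation
between the two samples.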
.. GENERATED FROM PYTHON SOURCE LINES 85-86

We can also use the Spearman test:

.. GENERATED FROM PYTHON SOURCE LINES 86-95

.. code-block:: Python

    resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
    print(
        "Samples are independent?",
        resultSpearman.getBinaryQualityMeasure(),
        "p-value=%.6g" % resultSpearman.getPValue(),
        "threshold=%.6g" % resultSpearman.getThreshold(),
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Samples are independent? False p-value=0.0338608 threshold=0.1


.. GENERATED FROM PYTHON SOURCE LINES 96-100

The discrete case
^^^^^^^^^^^^^^^^^

Testing is also possible for discrete distributions. Let us create two different discrete samples:

.. GENERATED FROM PYTHON SOURCE LINES 100-103

.. code-block:: Python

    sample1 = ot.Poisson(0.2).getSample(100)
    sample2 = ot.Poisson(0.2).getSample(100)


.. GENERATED FROM PYTHON SOURCE LINES 104-105

We use the ChiSquared test to check independence and store the result:

.. GENERATED FROM PYTHON SOURCE LINES 105-107

.. code-block:: Python

    resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)


.. GENERATED FROM PYTHON SOURCE LINES 108-109

and display the results:

.. GENERATED FROM PYTHON SOURCE LINES 109-117

.. code-block:: Python

    print(
        "Samples are independent?",
        resultChi2.getBinaryQualityMeasure(),
        "p-value=%.6g" % resultChi2.getPValue(),
        "threshold=%.6g" % resultChi2.getThreshold(),
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Samples are independent? True p-value=0.63604 threshold=0.1


.. GENERATED FROM PYTHON SOURCE LINES 118-124

Test samples independence using regression
------------------------------------------

Independence testing with regression is also an option in OpenTURNS.
It consists in detecting a linear relation between two scalar samples.

.. GENERATED FROM PYTHON SOURCE LINES 126-127

We generate a sample of dimension 3 with component 0 correlated to component 2:

.. GENERATED FROM PYTHON SOURCE LINES 127-134

.. code-block:: Python

    marginals = [ot.Normal()] * 3
    S = ot.CorrelationMatrix(3)
    S[0, 2] = 0.9
    copula = ot.NormalCopula(S)
    distribution = ot.ComposedDistribution(marginals, copula)
    sample = distribution.getSample(30)


.. GENERATED FROM PYTHON SOURCE LINES 135-136

Next, we split it into two samples: firstSample of dimension 2 and secondSample of dimension 1.

.. GENERATED FROM PYTHON SOURCE LINES 136-139

.. code-block:: Python

    firstSample = sample[:, :2]
    secondSample = sample[:, 2]


.. GENERATED FROM PYTHON SOURCE LINES 140-141

We test independence of each component of firstSample against the secondSample.
The collection returned by `FullRegression` has three entries for a firstSample of dimension 2;
assuming that the first entry refers to the constant term of the regression model,
index 1 corresponds to component 0 of firstSample, which is the one correlated to secondSample:

.. GENERATED FROM PYTHON SOURCE LINES 141-151

.. code-block:: Python

    test_results = ot.LinearModelTest.FullRegression(firstSample, secondSample)
    for i in range(len(test_results)):
        print(
            "Component",
            i,
            "is independent?",
            test_results[i].getBinaryQualityMeasure(),
            "p-value=%.6g" % test_results[i].getPValue(),
            "threshold=%.6g" % test_results[i].getThreshold(),
        )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Component 0 is independent? True p-value=0.853438 threshold=0.05
    Component 1 is independent? False p-value=1.19352e-11 threshold=0.05
    Component 2 is independent? True p-value=0.722678 threshold=0.05


.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_test_independence.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_test_independence.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_test_independence.py `