.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_test_independence.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_test_independence.py:


Test independence
=================

.. GENERATED FROM PYTHON SOURCE LINES 6-12

.. code-block:: default

    import openturns as ot
    import openturns.viewer as viewer
    from matplotlib import pylab as plt
    ot.Log.Show(ot.Log.NONE)

.. GENERATED FROM PYTHON SOURCE LINES 13-44

Sample independence test
------------------------

In this section we perform tests to assess whether two one-dimensional samples are independent.

The following tests are available:

- the ChiSquared test: it tests whether two scalar samples (discrete ones only) are independent.
  If :math:`n_{ij}` is the number of values of sample :math:`i \in \{1, 2\}` in modality :math:`j`, :math:`1 \leq j \leq m`,
  with :math:`\displaystyle n_{i.} = \sum_{j=1}^m n_{ij}`, :math:`\displaystyle n_{.j} = \sum_{i=1}^2 n_{ij}`
  and :math:`n` the total number of values, the ChiSquared test evaluates the decision variable:

  .. math::

      D^2 = \sum_{i=1}^2 \sum_{j=1}^m \frac{( n_{ij} - \frac{n_{i.} n_{.j}}{n} )^2}{\frac{n_{i.} n_{.j}}{n}}

  which asymptotically follows the :math:`\chi^2(m-1)` distribution under the independence hypothesis.
  The hypothesis of independence is rejected if :math:`D^2` is too high (depending on the p-value threshold).

- the Pearson test: it tests whether there exists a linear relation between two scalar samples that form a Gaussian vector
  (which is equivalent to having a nonzero linear correlation coefficient).
  If both samples are :math:`\underline{x} = (x_i)_{1 \leq i \leq n}` and :math:`\underline{y} = (y_i)_{1 \leq i \leq n}`,
  with :math:`\bar{x} = \displaystyle \frac{1}{n}\sum_{i=1}^n x_i` and :math:`\bar{y} = \displaystyle \frac{1}{n}\sum_{i=1}^n y_i`,
  the Pearson test evaluates the decision variable:

  .. math::

      D = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2\sum_{i=1}^n (y_i - \bar{y})^2}}

  Under the hypothesis of normality of both samples and of a zero correlation coefficient,
  the statistic :math:`D\sqrt{(n-2)/(1-D^2)}` follows a Student distribution with :math:`n-2` degrees of freedom.
  The hypothesis of a zero linear correlation coefficient (which, for a Gaussian vector, is equivalent to the
  independence of the samples) is rejected if :math:`|D|` is too high (depending on the p-value threshold).

- the Spearman test: it tests whether there exists a monotonic relation between two scalar samples.
  If both samples are :math:`\underline{x} = (x_i)_{1 \leq i \leq n}` and :math:`\underline{y}= (y_i)_{1 \leq i \leq n}`,
  the Spearman test evaluates the decision variable:

  .. math::

      D = 1-\frac{6\sum_{i=1}^n (r_i - s_i)^2}{n(n^2-1)}

  where :math:`r_i = \mathrm{rank}(x_i)` and :math:`s_i = \mathrm{rank}(y_i)`.
  Under the independence hypothesis, :math:`\sqrt{n-1}D` tends towards the standard normal distribution.

.. GENERATED FROM PYTHON SOURCE LINES 46-50

The continuous case
^^^^^^^^^^^^^^^^^^^

We create two independent continuous samples:

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: default

    sample1 = ot.Normal().getSample(100)
    sample2 = ot.Normal().getSample(100)

.. GENERATED FROM PYTHON SOURCE LINES 54-55

We first use the Pearson test, with a threshold of 0.10, and store the result:

.. GENERATED FROM PYTHON SOURCE LINES 55-57

.. code-block:: default

    resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)

.. GENERATED FROM PYTHON SOURCE LINES 58-61

We can then display the result of the test as a yes/no answer with the `getBinaryQualityMeasure` method:
in the outputs shown here it is `True` when the p-value exceeds the threshold, that is, when the tested hypothesis is not rejected.
We can retrieve the p-value and the threshold with the `getPValue` and `getThreshold` methods.

.. GENERATED FROM PYTHON SOURCE LINES 61-66

.. code-block:: default

    print('Samples are independent?', resultPearson.getBinaryQualityMeasure(),
          'p-value=%.6g' % resultPearson.getPValue(),
          'threshold=%.6g' % resultPearson.getThreshold())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Samples are independent? False p-value=0.0451584 threshold=0.1

.. GENERATED FROM PYTHON SOURCE LINES 67-68

Here the Pearson test rejects independence at the 0.10 threshold even though the two samples were drawn independently;
with this threshold, such a false rejection is expected in about one draw out of ten.
We can also use the Spearman test:

.. GENERATED FROM PYTHON SOURCE LINES 68-74

.. code-block:: default

    resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
    print('Samples are independent?', resultSpearman.getBinaryQualityMeasure(),
          'p-value=%.6g' % resultSpearman.getPValue(),
          'threshold=%.6g' % resultSpearman.getThreshold())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Samples are independent? False p-value=0.0603411 threshold=0.1

.. GENERATED FROM PYTHON SOURCE LINES 75-79

The discrete case
^^^^^^^^^^^^^^^^^

Testing is also possible for discrete distributions. Let us create two independent discrete samples:

.. GENERATED FROM PYTHON SOURCE LINES 79-82

.. code-block:: default

    sample1 = ot.Poisson(0.2).getSample(100)
    sample2 = ot.Poisson(0.2).getSample(100)

.. GENERATED FROM PYTHON SOURCE LINES 83-84

We use the ChiSquared test to check independence and store the result:

.. GENERATED FROM PYTHON SOURCE LINES 84-86

.. code-block:: default

    resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)

.. GENERATED FROM PYTHON SOURCE LINES 87-88

and display the result:

.. GENERATED FROM PYTHON SOURCE LINES 88-93

.. code-block:: default

    print('Samples are independent?', resultChi2.getBinaryQualityMeasure(),
          'p-value=%.6g' % resultChi2.getPValue(),
          'threshold=%.6g' % resultChi2.getThreshold())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Samples are independent? True p-value=0.20552 threshold=0.1

.. GENERATED FROM PYTHON SOURCE LINES 94-100

Test samples independence using regression
------------------------------------------

Independence testing with regression is also an option in OpenTURNS.
It consists of detecting a linear relation between two scalar samples.

.. GENERATED FROM PYTHON SOURCE LINES 102-103

We generate a sample of dimension 3 in which component 0 is correlated with component 2:

.. GENERATED FROM PYTHON SOURCE LINES 103-110

.. code-block:: default

    marginals = [ot.Normal()] * 3
    S = ot.CorrelationMatrix(3)
    S[0, 2] = 0.9
    copula = ot.NormalCopula(S)
    distribution = ot.ComposedDistribution(marginals, copula)
    sample = distribution.getSample(30)

.. GENERATED FROM PYTHON SOURCE LINES 111-112

Next, we split it into two samples: `firstSample` of dimension 2 and `secondSample` of dimension 1.

.. GENERATED FROM PYTHON SOURCE LINES 112-115

.. code-block:: default

    firstSample = sample[:, :2]
    secondSample = sample[:, 2]

.. GENERATED FROM PYTHON SOURCE LINES 116-117

We test the independence of each component of `firstSample` against `secondSample`:

.. GENERATED FROM PYTHON SOURCE LINES 117-122

.. code-block:: default

    test_results = ot.LinearModelTest.FullRegression(firstSample, secondSample)
    for i in range(len(test_results)):
        print('Component', i, 'is independent?', test_results[i].getBinaryQualityMeasure(),
              'p-value=%.6g' % test_results[i].getPValue(),
              'threshold=%.6g' % test_results[i].getThreshold())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Component 0 is independent? True p-value=0.646138 threshold=0.05
    Component 1 is independent? False p-value=1.30057e-10 threshold=0.05
    Component 2 is independent? True p-value=0.342379 threshold=0.05

The collection holds three results although `firstSample` has only two components.
Judging by the output, the first entry appears to correspond to the intercept of the linear model,
so the rejection at index 1 would match component 0 of the sample, which was built to be correlated with component 2.
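As a minimal sketch of what detecting a linear relation between two scalar samples involves, the plain-Python code below fits :math:`y = a + b x` by least squares and forms the Student statistic of the slope :math:`b`. It also computes :math:`D\sqrt{(n-2)/(1-D^2)}` from the Pearson coefficient :math:`D`, which coincides with the slope statistic in the one-regressor case. The helper names `slope_t_statistic` and `pearson_t_statistic` and the data are illustrative assumptions, not part of the OpenTURNS API, and this is not a description of how `LinearModelTest.FullRegression` is implemented. No p-value is computed.

```python
import math


def _centered_sums(x, y):
    """Return n, the means and the centered sums s_xx, s_yy, s_xy."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal size n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return n, x_bar, y_bar, s_xx, s_yy, s_xy


def slope_t_statistic(x, y):
    """Hypothetical helper: least-squares fit y = a + b*x and t statistic of b."""
    n, x_bar, y_bar, s_xx, _, s_xy = _centered_sums(x, y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    # Standard error of the slope is sqrt(sigma2 / s_xx); a perfect fit (rss == 0)
    # would make this division fail and is not handled in this sketch.
    t = slope / math.sqrt(sigma2 / s_xx)
    return slope, intercept, t


def pearson_t_statistic(x, y):
    """Hypothetical helper: Pearson coefficient D and D*sqrt((n-2)/(1-D^2))."""
    n, _, _, s_xx, s_yy, s_xy = _centered_sums(x, y)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r, r * math.sqrt((n - 2) / (1.0 - r * r))


# Illustrative data only (not taken from the example above).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, t_slope = slope_t_statistic(x, y)
r, t_pearson = pearson_t_statistic(x, y)
print("slope=%.6g intercept=%.6g t_slope=%.6g" % (slope, intercept, t_slope))
print("pearson=%.6g t_pearson=%.6g" % (r, t_pearson))
```

A large absolute value of either statistic, compared with the quantiles of a Student distribution with :math:`n-2` degrees of freedom, points to a linear relation between the two samples.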