.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/statistical_tests/plot_test_independence.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_data_analysis_statistical_tests_plot_test_independence.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_statistical_tests_plot_test_independence.py:


Test independence
=================

.. GENERATED FROM PYTHON SOURCE LINES 7-9

.. code-block:: Python

    import openturns as ot

.. GENERATED FROM PYTHON SOURCE LINES 10-32

Sample independence test
------------------------

In this example, we perform tests to assess whether two 1-d samples, generated by two random variables
:math:`X` and :math:`Y`, are independent.

The following tests are available:

- the ChiSquared test, which is only used for discrete variables. Refer to :ref:`chi2_independence_test` for more details.
- the Pearson test: this test checks if there exists a linear relationship between :math:`X` and :math:`Y`.
  It is equivalent to an independence test only if the random vector :math:`(X,Y)` is a Gaussian vector.
  Refer to :ref:`pearson_test` for more details.
- the Spearman test: this test checks if there exists a monotonic relationship between :math:`X` and :math:`Y`.
  Refer to :ref:`spearman_test` for more details.
- the independence test using regression: this test checks if there exists a linear relationship between
  :math:`X` and :math:`Y` using a linear model.

.. GENERATED FROM PYTHON SOURCE LINES 34-38

Case 1: Pearson and Spearman tests
----------------------------------

We create a sample generated by a bivariate Gaussian vector :math:`(X,Y)` with independent components.

.. GENERATED FROM PYTHON SOURCE LINES 38-42

.. code-block:: Python

    sample_Biv = ot.Normal(2).getSample(1000)
    sample1 = sample_Biv.getMarginal(0)
    sample2 = sample_Biv.getMarginal(1)
.. GENERATED FROM PYTHON SOURCE LINES 43-48

To test the independence between both samples, we first use the Pearson test with the Type I error
equal to 0.1 (which is the probability of wrongly rejecting the null hypothesis).
The Pearson test checks if there is a linear correlation between both random variables.
The null hypothesis is: *There is no linear relation*.
As :math:`(X,Y)` is a Gaussian vector, this is equivalent to testing the independence of the components.

.. GENERATED FROM PYTHON SOURCE LINES 48-50

.. code-block:: Python

    resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)

.. GENERATED FROM PYTHON SOURCE LINES 51-54

We can then display the result of the test as a yes/no answer with
the `getBinaryQualityMeasure` method. We can retrieve the p-value and the threshold with the `getPValue`
and `getThreshold` methods.

.. GENERATED FROM PYTHON SOURCE LINES 54-61

.. code-block:: Python

    print(
        "Is the Pearson correlation coefficient null?",
        resultPearson.getBinaryQualityMeasure(),
        "p-value=%.6g" % resultPearson.getPValue(),
        "threshold=%.6g" % resultPearson.getThreshold(),
    )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Is the Pearson correlation coefficient null? True p-value=0.748637 threshold=0.1

.. GENERATED FROM PYTHON SOURCE LINES 62-66

**Conclusion**: The p-value is above the threshold, so the Pearson test does not reject the null hypothesis
that the Pearson correlation coefficient is null: no linear correlation is detected between the two samples.
Since :math:`(X,Y)` is a Gaussian vector here, this means that the components are independent.
In the general case, the Gaussian vector hypothesis must be validated first!

.. GENERATED FROM PYTHON SOURCE LINES 68-72

We can also use the Spearman test with the Type I error
equal to 0.1 (which is the probability of wrongly rejecting the null hypothesis).
The Spearman test checks if there exists a monotonic relationship between :math:`X` and :math:`Y`.
The null hypothesis is: *There is no monotonic relation*.

.. GENERATED FROM PYTHON SOURCE LINES 72-80
.. code-block:: Python

    resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
    print(
        "Is the Spearman correlation coefficient null?",
        resultSpearman.getBinaryQualityMeasure(),
        "p-value=%.6g" % resultSpearman.getPValue(),
        "threshold=%.6g" % resultSpearman.getThreshold(),
    )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Is the Spearman correlation coefficient null? True p-value=0.839209 threshold=0.1

.. GENERATED FROM PYTHON SOURCE LINES 81-83

**Conclusion**: The p-value is above the threshold, so the Spearman test does not reject the null hypothesis
that the Spearman correlation coefficient is null: no monotonic relationship is detected between the two samples.

.. GENERATED FROM PYTHON SOURCE LINES 85-88

Here, we create a bivariate sample from a Gaussian vector whose components are correlated
(correlation coefficient equal to 0.8).
The Pearson test and the Spearman test both detect the correlation: both null hypotheses
are rejected, with p-values far below the threshold.

.. GENERATED FROM PYTHON SOURCE LINES 88-98

.. code-block:: Python

    cor_Matrix = ot.CorrelationMatrix(2)
    cor_Matrix[0, 1] = 0.8
    sample_Biv = ot.Normal([0] * 2, [1] * 2, cor_Matrix).getSample(1000)
    sample1 = sample_Biv.getMarginal(0)
    sample2 = sample_Biv.getMarginal(1)
    resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
    resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
    print('Pearson test : ', resultPearson)
    print('Spearman test : ', resultSpearman)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pearson test : class=TestResult name=Unnamed type=Pearson binaryQualityMeasure=false p-value threshold=0.1 p-value=1.993e-212 statistic=40.4302 description=[]
    Spearman test : class=TestResult name=Unnamed type=Spearman binaryQualityMeasure=false p-value threshold=0.1 p-value=0 statistic=38.3542 description=[]

.. GENERATED FROM PYTHON SOURCE LINES 99-100

We now consider a discrete distribution. Let us create two independent samples.

.. GENERATED FROM PYTHON SOURCE LINES 100-103
.. code-block:: Python

    sample1 = ot.Poisson(0.2).getSample(100)
    sample2 = ot.Poisson(0.2).getSample(100)

.. GENERATED FROM PYTHON SOURCE LINES 104-105

We use the Chi2 test to check independence, again with the Type I error equal to 0.1.

.. GENERATED FROM PYTHON SOURCE LINES 105-107

.. code-block:: Python

    resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)

.. GENERATED FROM PYTHON SOURCE LINES 108-109

We display the results.

.. GENERATED FROM PYTHON SOURCE LINES 109-116

.. code-block:: Python

    print(
        "Are the components independent?",
        resultChi2.getBinaryQualityMeasure(),
        "p-value=%.6g" % resultChi2.getPValue(),
        "threshold=%.6g" % resultChi2.getThreshold(),
    )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Are the components independent? True p-value=0.531971 threshold=0.1

.. GENERATED FROM PYTHON SOURCE LINES 117-119

**Conclusion**: The p-value is above the threshold, so the Chi2 test does not reject the null hypothesis
of independence: no dependence is detected between the two samples.

.. GENERATED FROM PYTHON SOURCE LINES 122-127

Case 2: Independence test using regression
------------------------------------------

This test consists in fitting a linear model between :math:`X` and :math:`Y` and analysing whether
the coefficients are significantly different from 0.

.. GENERATED FROM PYTHON SOURCE LINES 129-131

We create a sample generated by a Gaussian vector :math:`(X_1, X_2, X_3)` with zero mean,
unit variance and whose components :math:`(X_1, X_3)` are correlated.

.. GENERATED FROM PYTHON SOURCE LINES 131-136

.. code-block:: Python

    corr_Matrix = ot.CorrelationMatrix(3)
    corr_Matrix[0, 2] = 0.9
    distribution = ot.Normal([0] * 3, [1] * 3, corr_Matrix)
    sample = distribution.getSample(100)

.. GENERATED FROM PYTHON SOURCE LINES 137-139

Next, we split the sample into two samples: the first one is associated with :math:`(X_1, X_2)`
and the second one is associated with :math:`X_3`.

.. GENERATED FROM PYTHON SOURCE LINES 139-142

.. code-block:: Python

    first_Sample = sample.getMarginal([0, 1])
    second_Sample = sample.getMarginal(2)
.. GENERATED FROM PYTHON SOURCE LINES 143-150

We fit a linear model of :math:`X_3` with respect to :math:`(X_1, X_2)`:
:math:`X_3 = a_0 + a_1X_1 + a_2X_2`.
Then, we test if each coefficient :math:`a_k` is significantly different from 0.
The null hypothesis is *The coefficient of the linear model is equal to zero*.
When the result is *True*, the null hypothesis is not rejected, which means that no linear dependence
on the corresponding component is detected.
When the result is *False*, the null hypothesis is rejected, which means that there is a linear relationship
with the corresponding component.

.. GENERATED FROM PYTHON SOURCE LINES 150-159

.. code-block:: Python

    test_results = ot.LinearModelTest.FullRegression(first_Sample, second_Sample)
    for i in range(len(test_results)):
        print(
            "Coefficient a" + str(i) + " is equal to 0?",
            test_results[i].getBinaryQualityMeasure(),
            "p-value=%.6g" % test_results[i].getPValue(),
            "threshold=%.6g" % test_results[i].getThreshold(),
        )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Coefficient a0 is equal to 0? True p-value=0.951597 threshold=0.05
    Coefficient a1 is equal to 0? False p-value=6.20875e-41 threshold=0.05
    Coefficient a2 is equal to 0? True p-value=0.113865 threshold=0.05

.. GENERATED FROM PYTHON SOURCE LINES 160-162

**Conclusion**: The test detects the linear dependence between :math:`X_1` and :math:`X_3`
(the hypothesis :math:`a_1 = 0` is rejected) and detects no linear dependence between :math:`X_2` and :math:`X_3`
(the hypothesis :math:`a_2 = 0` is not rejected). The hypothesis that the intercept :math:`a_0` is null
is not rejected either, which is consistent with the zero mean of the vector.

.. _sphx_glr_download_auto_data_analysis_statistical_tests_plot_test_independence.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_test_independence.ipynb <plot_test_independence.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_test_independence.py <plot_test_independence.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_test_independence.zip <plot_test_independence.zip>`
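
As a complement to Case 1, the remark that the Pearson test is an independence test only for
Gaussian vectors can be illustrated with a minimal sketch. It is not part of the original example
and deliberately uses only the Python standard library: the helper ``pearson`` and the variable
names below are hypothetical, introduced for illustration only. The sketch computes the sample
Pearson correlation coefficient for a quadratic relation on a symmetric grid, where the second
variable is a deterministic function of the first yet the coefficient vanishes, and compares it
with a linear relation.

.. code-block:: Python

    import math


    def pearson(xs, ys):
        """Sample Pearson correlation coefficient of two equal-length sequences.

        Hypothetical helper written for this sketch; it is not an OpenTURNS function.
        """
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)


    # Symmetric integer grid around 0.
    xs = [float(i) for i in range(-20, 21)]

    # ys is a deterministic function of xs, so the two are fully dependent,
    # but the relation is neither linear nor monotonic.
    ys = [x * x for x in xs]
    r_quadratic = pearson(xs, ys)

    # For comparison, an exactly linear relation.
    zs = [2.0 * x + 1.0 for x in xs]
    r_linear = pearson(xs, zs)

    print("quadratic relation: |r| = %.3g" % abs(r_quadratic))
    print("linear relation:     r  = %.3g" % r_linear)

In this sketch the coefficient of the quadratic relation is null although the variables are
dependent, which is why a non-rejected Pearson test should only be read as independence
once the Gaussian vector hypothesis has been checked.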