Test independence

import openturns as ot

Sample independence test

In this paragraph we perform tests to assess whether two 1-d samples generated by two random variables X and Y are independent or not.

The following tests are available:

  • the ChiSquared test, which applies only to discrete variables. Refer to Chi-squared test for independence for more details.

  • the Pearson test: this test checks if there exists a linear relationship between X and Y. It is equivalent to an independence test only if the random vector (X,Y) is a Gaussian vector. Refer to Pearson correlation test for more details.

  • the Spearman test: this test checks if there exists a monotonic relationship between X and Y. Refer to Spearman correlation test for more details.

  • independence test using regression: this test checks if there exists a linear relation between X and Y using a linear model.

Case 1: Pearson, Spearman and ChiSquared tests

We create a sample generated by a bivariate Gaussian vector (X,Y) with independent components.

sample_Biv = ot.Normal(2).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)

To test the independence of the two samples, we first use the Pearson test with a Type I error equal to 0.1 (the probability of wrongly rejecting the null hypothesis). The Pearson test checks whether there is a linear correlation between the two random variables. The null hypothesis is: there is no linear relation. As (X,Y) is a Gaussian vector, this is equivalent to testing the independence of the components.

resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)

We can then display the result of the test as a yes/no answer with the getBinaryQualityMeasure method. The p-value and the threshold are retrieved with the getPValue and getThreshold methods.

print(
    "Is the Pearson correlation coefficient null?",
    resultPearson.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultPearson.getPValue(),
    "threshold=%.6g" % resultPearson.getThreshold(),
)
Is the Pearson correlation coefficient null? True p-value=0.748637 threshold=0.1

Conclusion: The Pearson test finds no linear correlation between the two samples: the null hypothesis, which assumes that the Pearson correlation coefficient is null, is accepted. Since (X,Y) is a Gaussian vector here, this supports the independence of the components. In the general case, the Gaussian vector hypothesis must be validated before drawing this conclusion!
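To make the quantity behind this test concrete, here is a minimal pure-Python sketch of the classical Pearson statistic t = r * sqrt((n - 2) / (1 - r^2)), which follows a Student distribution with n - 2 degrees of freedom under the null hypothesis when the vector is Gaussian. The function name pearson_t_statistic and the toy data are illustrative assumptions. This is not the OpenTURNS implementation, and the p-value step is left out because the standard library has no Student distribution.

```python
import math


def pearson_t_statistic(x, y):
    """Return (r, t): sample Pearson correlation and its Student-type statistic.

    Hypothetical helper written for illustration, not an OpenTURNS function.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal size, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-product and sums of squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Undefined when |r| == 1 (perfect linear relation): division by zero.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Toy data (not taken from the samples above).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r, t = pearson_t_statistic(x, y)
print("r=%.4f t=%.4f" % (r, t))
```

A large |t| relative to the Student quantiles leads to rejecting the null hypothesis of zero linear correlation.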

We can also use the Spearman test with a Type I error equal to 0.1 (the probability of wrongly rejecting the null hypothesis). The Spearman test checks whether there exists a monotonic relationship between X and Y. The null hypothesis is: there is no monotonic relation.

resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print(
    "Is the Spearman correlation coefficient null?",
    resultSpearman.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearman.getPValue(),
    "threshold=%.6g" % resultSpearman.getThreshold(),
)
Is the Spearman correlation coefficient null? True p-value=0.839209 threshold=0.1

Conclusion: The Spearman test validates that there is no monotonic correlation between both samples: the null hypothesis assuming that the Spearman correlation coefficient is null is accepted.
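The difference between a linear and a monotonic relationship can be sketched in pure Python: the Spearman coefficient is the Pearson coefficient computed on the ranks of the data. The helpers average_ranks, pearson and spearman_rho below are hypothetical names introduced for illustration, and tie handling by average ranks is an assumption, not a statement about how OpenTURNS computes it.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# y is an increasing but nonlinear function of x (toy data).
x = [0.1, 0.5, 1.0, 2.0, 3.0]
y = [math.exp(v) for v in x]
print("Pearson  = %.4f" % pearson(x, y))
print("Spearman = %.4f" % spearman_rho(x, y))
```

On this toy data the relationship is perfectly monotonic, so the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1.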

Here, we create a bivariate sample from a Gaussian vector whose components are correlated. The Pearson test and the Spearman test both detect the correlation: both null hypotheses are rejected.

cor_Matrix = ot.CorrelationMatrix(2)
cor_Matrix[0, 1] = 0.8
sample_Biv = ot.Normal([0] * 2, [1] * 2, cor_Matrix).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print('Pearson test : ', resultPearson)
print('Spearman test : ', resultSpearman)
Pearson test :  class=TestResult name=Unnamed type=Pearson binaryQualityMeasure=false p-value threshold=0.1 p-value=1.993e-212 statistic=40.4302 description=[]
Spearman test :  class=TestResult name=Unnamed type=Spearman binaryQualityMeasure=false p-value threshold=0.1 p-value=0 statistic=38.3542 description=[]
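The printed fields can be read together: the binary quality measure reports whether the null hypothesis is kept. The sketch below assumes the decision rule is "p-value strictly greater than threshold", which is consistent with every result printed above but is stated here as an assumption rather than as the documented behaviour. The function name null_hypothesis_accepted is hypothetical.

```python
def null_hypothesis_accepted(p_value, threshold):
    """Assumed decision rule: keep the null hypothesis when p-value > threshold."""
    return p_value > threshold


# (label, p-value, threshold) triples copied from the outputs printed above.
reported = [
    ("Pearson, independent components", 0.748637, 0.1),
    ("Spearman, independent components", 0.839209, 0.1),
    ("Pearson, correlated components", 1.993e-212, 0.1),
    ("Spearman, correlated components", 0.0, 0.1),
]

decisions = {label: null_hypothesis_accepted(p, t) for label, p, t in reported}
for label, accepted in decisions.items():
    print(label, "->", accepted)
```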

We now consider a discrete distribution. Let us create two independent samples.

sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)

We use the Chi2 test to check independence.

resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)

We display the results.

print(
    "Are the components independent?",
    resultChi2.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2.getPValue(),
    "threshold=%.6g" % resultChi2.getThreshold(),
)
Are the components independent? True p-value=0.531971 threshold=0.1

Conclusion: The Chi2 test validates that both samples are independent: the null hypothesis assuming the independence is accepted.
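For intuition, the textbook chi-squared independence statistic compares the observed counts of each pair (x, y) with the counts expected under independence, namely row total times column total divided by n. The pure-Python sketch below uses a hypothetical helper chi2_independence_statistic and plain lists of integers. OpenTURNS may group cells or handle sparse cells differently, so this is an illustration of the principle only.

```python
from collections import Counter


def chi2_independence_statistic(x, y):
    """Return (statistic, degrees_of_freedom) for two paired discrete samples."""
    n = len(x)
    if n != len(y):
        raise ValueError("samples must have the same size")
    joint = Counter(zip(x, y))  # observed counts of each pair
    row = Counter(x)            # marginal counts of x
    col = Counter(y)            # marginal counts of y
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


# Toy data: y_indep is balanced across the values of x, y_dep copies x.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y_indep = [0, 0, 1, 1, 0, 0, 1, 1]
y_dep = list(x)

stat_indep, dof_indep = chi2_independence_statistic(x, y_indep)
stat_dep, dof_dep = chi2_independence_statistic(x, y_dep)
print("independent-looking data:", stat_indep, dof_indep)
print("fully dependent data:", stat_dep, dof_dep)
```

The statistic is then compared with a chi-squared distribution with (rows - 1) * (columns - 1) degrees of freedom: values near 0 are compatible with independence, large values are not.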

Case 2: Independence test using regression

This test consists in fitting a linear model between X and Y and analysing whether the coefficients are significantly different from 0.

We create a sample generated by a Gaussian vector (X_1, X_2, X_3) with zero mean and unit variance, whose components X_1 and X_3 are correlated.

corr_Matrix = ot.CorrelationMatrix(3)
corr_Matrix[0, 2] = 0.9
distribution = ot.Normal([0] * 3, [1] * 3, corr_Matrix)
sample = distribution.getSample(100)

Next, we split the sample into two samples: the first one is associated with (X_1, X_2) and the second one with X_3.

first_Sample = sample.getMarginal([0, 1])
second_Sample = sample.getMarginal(2)

We fit a linear model of X_3 with respect to (X_1, X_2): X_3 = a_0 + a_1X_1 + a_2X_2. Then, we test whether each coefficient a_k is significantly different from 0. The null hypothesis is: the coefficient of the linear model is equal to zero. When the result is True, the null hypothesis is accepted, which means that no linear dependence of X_3 on the corresponding term is detected. When the result is False, the null hypothesis is rejected, which means that X_3 depends linearly on the corresponding component. No level is passed to the test here, and the threshold reported in the output is 0.05.

test_results = ot.LinearModelTest.FullRegression(first_Sample, second_Sample)
for i in range(len(test_results)):
    print(
        "Coefficient a" + str(i) + " is equal to 0?",
        test_results[i].getBinaryQualityMeasure(),
        "p-value=%.6g" % test_results[i].getPValue(),
        "threshold=%.6g" % test_results[i].getThreshold(),
    )
Coefficient a0 is equal to 0? True p-value=0.951597 threshold=0.05
Coefficient a1 is equal to 0? False p-value=6.20875e-41 threshold=0.05
Coefficient a2 is equal to 0? True p-value=0.113865 threshold=0.05

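Conclusion: The test detects the linear relationship between X_1 and X_3 (a_1 is significantly different from 0) and finds no linear relationship between X_2 and X_3 (a_2 is not significantly different from 0). It also finds that a_0 is not significantly different from 0, which is consistent with the zero mean of the distribution.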
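The principle of testing each coefficient can be sketched without OpenTURNS: fit the model by ordinary least squares, then divide each coefficient by its standard error to get a statistic t_k that is compared with a Student distribution with n - p degrees of freedom. The helpers invert and ols_t_statistics and the small orthogonal toy design are assumptions made for illustration. This is a textbook sketch, not the code behind LinearModelTest.FullRegression.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_statistics(predictors, response):
    """Return (coefficients, t_statistics) for y = a_0 + a_1 x_1 + ... + a_k x_k."""
    n = len(response)
    design = [[1.0] + list(row) for row in predictors]  # intercept column first
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(design, response)) for i in range(p)]
    xtx_inv = invert(xtx)
    coefs = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [yv - sum(c * v for c, v in zip(coefs, r))
                 for r, yv in zip(design, response)]
    sigma2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    t_stats = [coefs[i] / math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(p)]
    return coefs, t_stats


# Toy design: y = 0.9 * x1 + small alternating noise, x2 plays no role.
predictors = [(-1, -1), (-1, -1), (-1, 1), (-1, 1), (1, -1), (1, -1), (1, 1), (1, 1)]
response = [-1.0, -0.8, -1.0, -0.8, 0.8, 1.0, 0.8, 1.0]
coefs, t_stats = ols_t_statistics(predictors, response)
for k, (a, t) in enumerate(zip(coefs, t_stats)):
    print("a%d = %.4f  t = %.4f" % (k, a, t))
```

In this toy design only the statistic of a_1 is far from 0, mirroring the pattern of the results above, where only a_1 is found significantly different from 0.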