Test independence¶

import openturns as ot

ot.Log.Show(ot.Log.NONE)

Sample independence test¶

In this section we perform tests to assess whether two one-dimensional samples are independent.

The following tests are available:

the ChiSquared test: it tests whether two scalar samples (discrete ones only) are independent. Let $n_{ij}$ be the number of values of sample $i \in \{1, 2\}$ in modality $j$, $1 \leq j \leq m$, with $\displaystyle n_{i.} = \sum_{j=1}^m n_{ij}$, $\displaystyle n_{.j} = \sum_{i=1}^2 n_{ij}$ and $\displaystyle n = \sum_{i=1}^2 \sum_{j=1}^m n_{ij}$ the total number of values. The ChiSquared test evaluates the decision variable:

$D^2 = \sum_{i=1}^2 \sum_{j=1}^m \frac{( n_{ij} - \frac{n_{i.} n_{.j}}{n} )^2}{\frac{n_{i.} n_{.j}}{n}}$

which tends towards the $\chi^2(m-1)$ distribution under the hypothesis of independence. That hypothesis is rejected if $D^2$ is too large, that is, if the p-value falls below the chosen threshold.
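As a rough illustration of the formula above, the sketch below computes $D^2$ in plain Python from a table of counts. The function name `chi_squared_statistic` and the example counts are hypothetical, and $n$ is assumed to be the total number of values. This is not the OpenTURNS implementation.

```python
def chi_squared_statistic(counts):
    """Decision variable D^2 for a table counts[i][j] = n_ij.

    Assumption: n is the total number of values over both samples.
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]        # n_{i.}
    col_totals = [sum(col) for col in zip(*counts)]  # n_{.j}
    d2 = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            d2 += (n_ij - expected) ** 2 / expected
    return d2


# Hypothetical counts: two samples, two modalities, opposite profiles.
d2_opposite = chi_squared_statistic([[10, 20], [20, 10]])
# Hypothetical counts: the second row is exactly twice the first one.
d2_proportional = chi_squared_statistic([[10, 20, 30], [20, 40, 60]])
print(d2_opposite, d2_proportional)
```

When the rows are exactly proportional, every observed count equals its expected value and $D^2$ vanishes; the further the rows depart from proportionality, the larger $D^2$ becomes.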

the Pearson test: it tests whether there exists a linear relation between two scalar samples which form a Gaussian vector (which is equivalent to having a nonzero linear correlation coefficient). If the samples are $\underline{x} = (x_i)_{1 \leq i \leq n}$ and $\underline{y} = (y_i)_{1 \leq i \leq n}$, with means $\bar{x} = \displaystyle \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar{y} = \displaystyle \frac{1}{n}\sum_{i=1}^n y_i$, the Pearson test evaluates the decision variable:

$D = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2\sum_{i=1}^n (y_i - \bar{y})^2}}$

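Under the hypothesis of normality of both samples, and if the linear correlation coefficient is zero, the transformed variable $T = D \sqrt{\frac{n-2}{1-D^2}}$ follows the Student distribution with $n-2$ degrees of freedom. The hypothesis of a zero linear correlation coefficient (which, for a Gaussian vector, is equivalent to the independence of the samples) is rejected if $|D|$ is too large (depending on the p-value threshold).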
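The decision variable $D$ of the Pearson test can be evaluated by hand on a small data set. The sketch below is a plain-Python transcription of the formula; the function name `pearson_statistic` and the data are hypothetical, and no p-value is computed.

```python
import math


def pearson_statistic(x, y):
    """Linear correlation coefficient D between two samples of equal size."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = math.sqrt(
        sum((xi - x_bar) ** 2 for xi in x) * sum((yi - y_bar) ** 2 for yi in y)
    )
    return numerator / denominator


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_linear = [2.0 * xi + 1.0 for xi in x]  # exact increasing linear relation
y_mixed = [2.0, 1.0, 4.0, 3.0, 5.0]      # increasing trend with two swaps

d_linear = pearson_statistic(x, y_linear)
d_mixed = pearson_statistic(x, y_mixed)
print(d_linear, d_mixed)
```

An exact increasing linear relation gives $D = 1$, while the second data set gives a value strictly between 0 and 1.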

the Spearman test: it tests whether there exists a monotonic relation between two scalar samples. If the samples are $\underline{x} = (x_i)_{1 \leq i \leq n}$ and $\underline{y}= (y_i)_{1 \leq i \leq n}$, the Spearman test evaluates the decision variable:

$D = 1-\frac{6\sum_{i=1}^n (r_i - s_i)^2}{n(n^2-1)}$

where $r_i = \mathrm{rank}(x_i)$ and $s_i = \mathrm{rank}(y_i)$. $D$ is such that $\sqrt{n-1}D$ tends towards the standard Normal distribution.
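To make the rank-based formula concrete, here is a minimal plain-Python sketch. The helper names `ranks` and `spearman_statistic` and the data are hypothetical, and the sketch assumes there are no ties in either sample (ties would need averaged ranks, which this formula does not handle).

```python
import math


def ranks(values):
    """Ranks from 1 to n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_statistic(x, y):
    """Decision variable D = 1 - 6 * sum((r_i - s_i)^2) / (n (n^2 - 1))."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    squared_gaps = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1.0 - 6.0 * squared_gaps / (n * (n ** 2 - 1))


# Hypothetical data for illustration only.
x = [0.3, 1.2, 2.5, 4.1, 7.7]
y_increasing = [xi ** 3 for xi in x]  # monotonic but nonlinear
y_decreasing = [-xi for xi in x]

d_increasing = spearman_statistic(x, y_increasing)
d_decreasing = spearman_statistic(x, y_decreasing)
# Normalized variable compared with the standard Normal distribution.
z_increasing = math.sqrt(len(x) - 1) * d_increasing
print(d_increasing, d_decreasing, z_increasing)
```

Because only ranks enter the formula, any strictly increasing transformation of one sample leaves $D$ unchanged, which is why the cubic relation above yields $D = 1$.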

The continuous case¶

We create two continuous samples, drawn independently from the standard Normal distribution:

sample1 = ot.Normal().getSample(100)
sample2 = ot.Normal().getSample(100)

We first use the Pearson test and store the result:

resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)

We can then display the result of the test as a yes/no answer with the getBinaryQualityMeasure method. We can retrieve the p-value and the threshold with the getPValue and getThreshold methods.

print(
    "Samples are independent?",
    resultPearson.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultPearson.getPValue(),
    "threshold=%.6g" % resultPearson.getThreshold(),
)

Samples are independent? False p-value=0.0284099 threshold=0.1
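In every output shown on this page, the yes/no answer is False exactly when the p-value is below the threshold. The sketch below encodes that reading as a one-line rule; the helper name `null_hypothesis_accepted` is hypothetical, and the rule is an assumption inferred from the displayed outputs rather than a statement about the library internals.

```python
def null_hypothesis_accepted(p_value, threshold):
    """Assumed decision rule: keep the null hypothesis when p-value > threshold."""
    return p_value > threshold


# Values copied from outputs displayed on this page.
print(null_hypothesis_accepted(0.0284099, 0.10))  # Pearson test above
print(null_hypothesis_accepted(0.256101, 0.05))   # first regression result below
```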

We can also use the Spearman test:

resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print(
    "Samples are independent?",
    resultSpearman.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearman.getPValue(),
    "threshold=%.6g" % resultSpearman.getThreshold(),
)

Samples are independent? False p-value=0.026346 threshold=0.1

The discrete case¶

Testing is also possible for discrete distributions. Let us create two different discrete samples:

sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)

We use the ChiSquared test to check independence and store the result:

resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)

and display the results:

print(
    "Samples are independent?",
    resultChi2.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2.getPValue(),
    "threshold=%.6g" % resultChi2.getThreshold(),
)

Samples are independent? False p-value=0.050043 threshold=0.1

Test samples independence using regression¶

Independence testing with regression is also an option. It consists of detecting a linear relation between two scalar samples by testing whether the coefficients of a linear regression differ significantly from zero.
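The idea can be sketched in the simplest setting of a single regressor, using the textbook t-statistic on the slope of an ordinary least squares fit. This is a simplified illustration under that assumption, not necessarily what the library computes; the function name `slope_t_statistic` and the data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """t-statistic of the slope in the least squares fit y = a + b x.

    With Gaussian errors and a zero true slope, this statistic follows
    the Student distribution with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and unbiased residual variance estimate.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    return slope / math.sqrt(residual_variance / s_xx)


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
t_slope = slope_t_statistic(x, y)
print(t_slope)
```

A large absolute value of the statistic indicates a slope that differs significantly from zero, hence a linear relation between the two samples.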

We generate a sample of dimension 3 with component 0 correlated with component 2:

marginals = [ot.Normal()] * 3
S = ot.CorrelationMatrix(3)
S[0, 2] = 0.9
copula = ot.NormalCopula(S)
distribution = ot.JointDistribution(marginals, copula)
sample = distribution.getSample(30)

Next, we split it into two samples: firstSample of dimension 2 and secondSample of dimension 1.

firstSample = sample[:, :2]
secondSample = sample[:, 2]

We test independence of each component of firstSample against secondSample:

test_results = ot.LinearModelTest.FullRegression(firstSample, secondSample)
for i in range(len(test_results)):
    print(
        "Component",
        i,
        "is independent?",
        test_results[i].getBinaryQualityMeasure(),
        "p-value=%.6g" % test_results[i].getPValue(),
        "threshold=%.6g" % test_results[i].getThreshold(),
    )

Component 0 is independent? True p-value=0.256101 threshold=0.05
Component 1 is independent? False p-value=1.19964e-12 threshold=0.05
Component 2 is independent? True p-value=0.606441 threshold=0.05

Three results are returned although firstSample has only two components. The output is consistent with index 0 being the intercept of the regression and indices 1 and 2 being the coefficients of components 0 and 1 of firstSample. Under that reading, the only rejected hypothesis (index 1) corresponds to component 0, which was built to be correlated with component 2.
