Test independence¶
import openturns as ot
Sample independence test¶
In this paragraph we perform tests to assess whether two 1-d samples are independent or not.
The following tests are available:
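the ChiSquared test: it tests if both scalar samples (discrete ones only) are independent. If $n_{ij}$ is the number of values of the sample $i \in \{1, 2\}$ in the modality $1 \leq j \leq m$, $n_{i.} = \sum_{j=1}^m n_{ij}$, $n_{.j} = \sum_{i=1}^2 n_{ij}$ and $n = \sum_{i=1}^2 \sum_{j=1}^m n_{ij}$, the ChiSquared test evaluates the decision variable:
$$D^2 = \sum_{i=1}^2 \sum_{j=1}^m \frac{\left(n_{ij} - \frac{n_{i.} n_{.j}}{n}\right)^2}{\frac{n_{i.} n_{.j}}{n}}$$
which tends towards the $\chi^2(m-1)$ distribution.
The hypothesis of independence is rejected if $D^2$ is too high (depending on the p-value threshold).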
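the Pearson test: it tests if there exists a linear relation between two scalar samples which form a Gaussian vector (which is equivalent to having a linear correlation coefficient not equal to zero). If both samples are $\underline{x} = (x_i)_{1 \leq i \leq n}$ and $\underline{y} = (y_i)_{1 \leq i \leq n}$, and $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$, the Pearson test evaluates the decision variable:
$$D = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}$$
Under the hypothesis of normality of both samples, the distribution of $D$ is known when the linear correlation coefficient is zero: $D \sqrt{(n-2)/(1-D^2)}$ then follows a Student distribution with $n-2$ degrees of freedom.
The hypothesis of a linear correlation coefficient equal to 0 (which, for a Gaussian vector, is equivalent to the independence of the samples) is rejected if $|D|$ is too high (depending on the p-value threshold).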
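the Spearman test: it tests if there exists a monotonic relation between two scalar samples. If both samples are $\underline{x} = (x_i)_{1 \leq i \leq n}$ and $\underline{y} = (y_i)_{1 \leq i \leq n}$, the Spearman test evaluates the decision variable:
$$D = 1 - \frac{6 \sum_{i=1}^n (r_i - s_i)^2}{n (n^2 - 1)}$$
where $r_i = \mathrm{rank}(x_i)$ and $s_i = \mathrm{rank}(y_i)$. $D$ is such that $\sqrt{n-1} \, D$ tends towards the standard Normal distribution.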
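To make the three decision variables concrete, here is a minimal sketch in plain Python that evaluates them directly from their textbook definitions. The function names are illustrative and not part of OpenTURNS, the rank helper assumes there are no ties, and the library may compute these quantities differently.

```python
import math
from collections import Counter


def chi_squared_statistic(sample1, sample2):
    """D^2 computed from the 2 x m table of counts n_ij (sample i, modality j)."""
    counts = [Counter(sample1), Counter(sample2)]
    modalities = sorted(set(sample1) | set(sample2))
    row_totals = [len(sample1), len(sample2)]  # n_i.
    n = sum(row_totals)
    d2 = 0.0
    for i in range(2):
        for j in modalities:
            col_total = counts[0][j] + counts[1][j]  # n_.j
            expected = row_totals[i] * col_total / n
            d2 += (counts[i][j] - expected) ** 2 / expected
    return d2


def pearson_statistic(x, y):
    """Sample linear correlation coefficient of two equally sized samples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y)
    )
    return num / den


def ranks(values):
    """Ranks from 1 to n (assumption: no tied values)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def spearman_statistic(x, y):
    """D = 1 - 6 * sum (r_i - s_i)^2 / (n (n^2 - 1))."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    squared_gaps = sum((a - b) ** 2 for a, b in zip(r, s))
    return 1.0 - 6.0 * squared_gaps / (n * (n * n - 1))


x = [0.1, 0.4, 0.35, 0.8, 1.2]
y = [1.0, 2.1, 1.9, 3.2, 4.0]
print("Pearson D =", pearson_statistic(x, y))
print("Spearman D =", spearman_statistic(x, y))
print("ChiSquared D^2 =", chi_squared_statistic([0, 0, 1, 0], [1, 0, 0, 2]))
```

In this toy data set `x` and `y` are ordered identically, so the Spearman variable reaches its maximum even though the relation is not exactly linear.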
The continuous case¶
We create two different continuous samples:
sample1 = ot.Normal().getSample(100)
sample2 = ot.Normal().getSample(100)
We first use the Pearson test and store the result:
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
We can then display the result of the test as a yes/no answer with the getBinaryQualityMeasure method. We can retrieve the p-value and the threshold with the getPValue and getThreshold methods.
print(
    "Samples are independent?",
    resultPearson.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultPearson.getPValue(),
    "threshold=%.6g" % resultPearson.getThreshold(),
)
Samples are independent? True p-value=0.360164 threshold=0.1
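The yes/no answer follows from comparing the p-value with the threshold. The sketch below spells out that rule with a hypothetical helper (not an OpenTURNS function), under the assumption that the binary measure is True when the p-value exceeds the threshold, that is, when the null hypothesis is not rejected.

```python
def is_null_hypothesis_accepted(p_value, threshold):
    """Return True when the test does not reject the null hypothesis.

    Assumption: acceptance means the p-value is strictly above the threshold.
    """
    return p_value > threshold


# p-value and threshold printed by the Pearson test above
print(is_null_hypothesis_accepted(0.360164, 0.10))
```

A larger threshold makes the test reject more easily, since more p-values fall below it.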
We can also use the Spearman test:
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print(
    "Samples are independent?",
    resultSpearman.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearman.getPValue(),
    "threshold=%.6g" % resultSpearman.getThreshold(),
)
Samples are independent? True p-value=0.193143 threshold=0.1
The discrete case¶
Testing is also possible for discrete distributions. Let us create two different discrete samples:
sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)
We use the ChiSquared test to check independence and store the result:
resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)
and display the results:
print(
    "Samples are independent?",
    resultChi2.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2.getPValue(),
    "threshold=%.6g" % resultChi2.getThreshold(),
)
Samples are independent? True p-value=0.790072 threshold=0.1
Test samples independence using regression¶
Independence testing with regression is also an option. It consists in detecting a linear relation between two scalar samples.
We generate a sample of dimension 3 with component 0 correlated to component 2:
marginals = [ot.Normal()] * 3
S = ot.CorrelationMatrix(3)
S[0, 2] = 0.9
copula = ot.NormalCopula(S)
distribution = ot.JointDistribution(marginals, copula)
sample = distribution.getSample(30)
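With standard Normal marginals, a Normal copula with correlation 0.9 between components 0 and 2 amounts to drawing those two components from a bivariate Gaussian distribution. The sketch below reproduces that idea with the standard library only. The helper names, the seed and the sample size are illustrative choices, and this is not how OpenTURNS samples internally.

```python
import math
import random


def correlated_normal_pairs(rho, size, seed=0):
    """Draw pairs (x0, x2) of standard normal values with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        z0 = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        # x2 mixes z0 with independent noise so that corr(x0, x2) = rho
        pairs.append((z0, rho * z0 + math.sqrt(1.0 - rho * rho) * z1))
    return pairs


def empirical_correlation(pairs):
    """Sample linear correlation coefficient of a list of pairs."""
    n = len(pairs)
    mean0 = sum(p[0] for p in pairs) / n
    mean1 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - mean0) * (p[1] - mean1) for p in pairs)
    var0 = sum((p[0] - mean0) ** 2 for p in pairs)
    var1 = sum((p[1] - mean1) ** 2 for p in pairs)
    return cov / math.sqrt(var0 * var1)


pairs = correlated_normal_pairs(0.9, 5000)
print("empirical correlation: %.3f" % empirical_correlation(pairs))
```

With only 30 points, as in the example, the empirical correlation fluctuates more around 0.9 than with the larger size used here.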
Next, we split it into two samples: firstSample of dimension 2 and secondSample of dimension 1.
firstSample = sample[:, :2]
secondSample = sample[:, 2]
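We test the independence of each component of firstSample against secondSample. The regression model includes an intercept, so the returned collection should hold one result for the intercept (index 0) followed by one result per component of firstSample (indices 1 and 2):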
test_results = ot.LinearModelTest.FullRegression(firstSample, secondSample)
for i in range(len(test_results)):
    print(
        "Component",
        i,
        "is independent?",
        test_results[i].getBinaryQualityMeasure(),
        "p-value=%.6g" % test_results[i].getPValue(),
        "threshold=%.6g" % test_results[i].getThreshold(),
    )
Component 0 is independent? True p-value=0.802428 threshold=0.05
Component 1 is independent? False p-value=3.47103e-12 threshold=0.05
Component 2 is independent? True p-value=0.931724 threshold=0.05
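The idea behind a regression-based test can be illustrated on the simplest case of a single regressor: fit a line by least squares and compare the slope with its standard error. The sketch below computes that t statistic in plain Python. It is a simplified illustration with a hypothetical function name and made-up data, not the multi-regressor computation performed by the library, and it stops short of the p-value, which would require the Student distribution.

```python
import math


def slope_t_statistic(x, y):
    """t statistic of the slope in the least squares fit y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    sigma2 = sse / (n - 2)
    return slope / math.sqrt(sigma2 / sxx)


# Illustrative data with a nearly linear relation
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
print("t statistic of the slope: %.2f" % slope_t_statistic(x, y))
```

A large absolute value of the t statistic corresponds to a small p-value, so the hypothesis of a zero slope is rejected.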