Sample independence test

In this example we perform statistical tests to assess whether two one-dimensional samples are independent.

The following tests are available:

  • the ChiSquared test: it tests whether two scalar samples (discrete ones only) are independent. If n_{ij} is the number of values of sample i \in \{1, 2\} in the modality 1 \leq j \leq m, with \displaystyle n_{i.} = \sum_{j=1}^m n_{ij}, \displaystyle n_{.j} = \sum_{i=1}^2 n_{ij} and n = \displaystyle \sum_{i=1}^2 \sum_{j=1}^m n_{ij} the total number of values, the ChiSquared test evaluates the decision variable:

    D^2 = \displaystyle \sum_{i=1}^2 \sum_{j=1}^m \frac{( n_{ij} - \frac{n_{i.} n_{.j}}{n} )^2}{\frac{n_{i.} n_{.j}}{n}}

    which tends towards the \chi^2(m-1) distribution under the hypothesis of independence. This hypothesis is rejected if D^2 is too large, that is, if the p-value falls below the chosen threshold.

  • the Pearson test: it tests whether there exists a linear relation between two scalar samples which form a Gaussian vector (which is equivalent to having a linear correlation coefficient not equal to zero). If both samples are \underline{x} = (x_i)_{1 \leq i \leq n} and \underline{y} = (y_i)_{1 \leq i \leq n}, with \bar{x} = \displaystyle \frac{1}{n}\sum_{i=1}^n x_i and \bar{y} = \displaystyle \frac{1}{n}\sum_{i=1}^n y_i, the Pearson test evaluates the decision variable:

    D = \displaystyle \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2\sum_{i=1}^n (y_i - \bar{y})^2}}

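    Under the hypothesis of normality of both samples and of a zero linear correlation coefficient, the transformed variable \displaystyle D \sqrt{\frac{n-2}{1-D^2}} follows a Student distribution with n-2 degrees of freedom. The hypothesis of a linear correlation coefficient equal to 0 (which, for a Gaussian vector, is equivalent to the independence of the samples) is rejected if |D| is too large (depending on the p-value threshold).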

  • the Spearman test: it tests whether there exists a monotonic relation between two scalar samples. If both samples are \underline{x} = (x_i)_{1 \leq i \leq n} and \underline{y} = (y_i)_{1 \leq i \leq n}, the Spearman test evaluates the decision variable:

    D = \displaystyle 1-\frac{6\sum_{i=1}^n (r_i - s_i)^2}{n(n^2-1)}

    where r_i = rank(x_i) and s_i = rank(y_i). Under the hypothesis of independence, \sqrt{n-1}D tends towards the standard Gaussian distribution \mathcal{N}(0,1).
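
To make the three decision variables concrete, here is a minimal pure-Python sketch that transcribes the formulas above literally. It is not how the library computes its results. The function names are hypothetical, the rank helper assumes there are no ties, and the ChiSquared function builds the 2 x m table of counts n_{ij} exactly as described in the text.

```python
import math


def chi_squared_statistic(sample1, sample2):
    """D^2 computed from the 2 x m table of counts n_ij described above."""
    modalities = sorted(set(sample1) | set(sample2))
    counts = [[sample.count(v) for v in modalities] for sample in (sample1, sample2)]
    row_totals = [sum(row) for row in counts]        # n_i.
    col_totals = [sum(col) for col in zip(*counts)]  # n_.j
    n = sum(row_totals)                              # total number of values
    d2 = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            d2 += (n_ij - expected) ** 2 / expected
    return d2


def pearson_coefficient(x, y):
    """Decision variable D of the Pearson test (linear correlation coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


def ranks(values):
    """Ranks starting at 1; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_coefficient(x, y):
    """Decision variable D of the Spearman test, valid when there are no ties."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(r, s)) / (n * (n ** 2 - 1))


# Small illustrative inputs (made up for this sketch)
x = [0.1, 0.4, 0.2, 0.9]
y = [1.0, 3.0, 2.5, 4.0]
print(pearson_coefficient(x, y))
print(spearman_coefficient(x, y))
print(chi_squared_statistic([0, 0, 0, 1], [0, 1, 1, 1]))
```

The Pearson coefficient reacts only to linear dependence, whereas the Spearman coefficient depends on the data through the ranks alone, so it is unchanged by any increasing transformation of either sample.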

[1]:
from __future__ import print_function
import openturns as ot

Continuous samples

[2]:
# Create continuous samples
sample1 = ot.Normal().getSample(100)
sample2 = ot.Normal().getSample(100)
[3]:
# Using the Pearson test
ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
[3]:

class=TestResult name=Unnamed type=Pearson binaryQualityMeasure=true p-value threshold=0.1 p-value=0.360164 statistic=0.919362 description=[]

[4]:
# Using the Spearman test
ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
[4]:

class=TestResult name=Unnamed type=Spearman binaryQualityMeasure=true p-value threshold=0.1 p-value=0.193143 statistic=1.30926 description=[]
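
The two results above can be read with a simple decision rule. The sketch below assumes, as the printed outputs on this page suggest, that binaryQualityMeasure=true corresponds to a p-value above the threshold, meaning that independence is not rejected. The helper name is hypothetical, and the p-values are copied from the outputs rather than recomputed.

```python
def independence_not_rejected(p_value, threshold):
    """True when the p-value exceeds the threshold, i.e. independence is not rejected."""
    return p_value > threshold


# p-values copied from the two outputs above; 0.10 is the threshold passed to the tests
printed_p_values = {"Pearson": 0.360164, "Spearman": 0.193143}
decisions = {
    name: independence_not_rejected(p_value, 0.10)
    for name, p_value in printed_p_values.items()
}
print(decisions)
```

Because both samples are drawn independently from the same normal distribution, not rejecting independence is the expected outcome, although with a threshold of 0.10 a rejection would still occur by chance in roughly one run out of ten.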

Discrete samples

[5]:
# Create discrete samples
sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)
[6]:
# Using the Chi2 test
ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)
[6]:

class=TestResult name=Unnamed type=ChiSquared binaryQualityMeasure=true p-value threshold=0.1 p-value=0.790072 statistic=0.0708717 description=[]
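
As a rough cross-check of how the statistic and the p-value relate, the sketch below evaluates the upper tail of a \chi^2 distribution with one degree of freedom, for which the tail probability has the closed form erfc(\sqrt{D^2/2}). The single degree of freedom (m = 2 modalities) is an assumption made for this illustration, since the page does not show how many modalities the samples contain. The function name is hypothetical.

```python
import math


def chi2_one_dof_p_value(statistic):
    """Upper-tail probability P(X > statistic) for X following chi^2 with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic copied from the ChiSquared output above
p_value = chi2_one_dof_p_value(0.0708717)
print(round(p_value, 6))
```

Under that assumption the value obtained is close to the p-value printed above. With more modalities, the tail of a \chi^2(m-1) distribution would have to be used instead.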