{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sample independence test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we are going to perform tests to assess whether two 1-d samples are independent or not.\n", "\n", "The following tests are available:\n", "\n", "- the ChiSquared test: it tests if both scalar samples (discrete ones only) are independent.\n", " If $n_{ij}$ is the number of values of the sample $i=(1,2)$ in the modality $1 \\leq j \\leq m$, $\\displaystyle n_{i.} = \\sum_{j=1}^m n_{ij}$ $\\displaystyle n_{.j} = \\sum_{i=1}^2 n_{ij}$, and the ChiSquared test evaluates the decision variable:\n", " $$D^2 = \\displaystyle \\sum_{i=1}^2 \\sum_{j=1}^m \\frac{( n_{ij} - \\frac{n_{i.} n_{.j}}{n} )^2}{\\frac{n_{i.} n_{.j}}{n}}$$\n", "\n", " which tends towards the $\\chi^2(m-1)$ distribution. The hypothesis of independence is rejected if $D^2$ is too high (depending on the p-value threshold).\n", "\n", "- the Pearson test: it tests if there exists a linear relation between two scalar samples which form a gaussian vector (which is equivalent to have a linear correlation coefficient not equal to zero).\n", " If both samples are $\\underline{x} = (x_i)_{1 \\leq i \\leq n}$ and $\\underline{y} = (y_i)_{1 \\leq i \\leq n}$, and $\\bar{x} = \\displaystyle \\frac{1}{n}\\sum_{i=1}^n x_i$ and $\\bar{y} = \\displaystyle \\frac{1}{n}\\sum_{i=1}^n y_i$, the Pearson test evaluates the decision variable:\n", " $$D = \\displaystyle \\frac{\\sum_{i=1}^n (x_i - \\bar{x})(y_i - \\bar{y})}{\\sqrt{\\sum_{i=1}^n (x_i - \\bar{x})^2\\sum_{i=1}^n (y_i - \\bar{y})^2}}$$\n", "\n", " The variable $D$ tends towards a $\\chi^2(n-2)$, under the hypothesis of normality of both samples. The hypothesis of a linear coefficient equal to 0 is rejected (which is equivalent to the independence of the samples) if D is too high (depending on the p-value threshold).\n", "\n", "- the Spearman test: it tests if there exists a monotonous relation between two scalar samples.\n", " If both samples are $\\underline{x} = (x_i)_{1 \\leq i \\leq n}$ and $\\underline{y}= (y_i)_{1 \\leq i \\leq n}$,, the Spearman test evaluates the decision variable:\n", " $$D = \\displaystyle 1-\\frac{6\\sum_{i=1}^n (r_i - s_i)^2}{n(n^2-1)}$$\n", "\n", " where $r_i = rank(x_i)$ and $s_i = rank(y_i)$. $D$ is such that $\\sqrt{n-1}D$ tends towards the gaussian (0,1) distribution.\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "import openturns as ot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**continuous samples**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Create continuous samples\n", "sample1 = ot.Normal().getSample(100)\n", "sample2 = ot.Normal().getSample(100)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
class=TestResult name=Unnamed type=Pearson binaryQualityMeasure=true p-value threshold=0.1 p-value=0.360164 statistic=0.919362 description=[]
" ], "text/plain": [ "class=TestResult name=Unnamed type=Pearson binaryQualityMeasure=true p-value threshold=0.1 p-value=0.360164 statistic=0.919362 description=[]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using the Pearson test\n", "ot.HypothesisTest.Pearson(sample1, sample2, 0.10)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "class=TestResult name=Unnamed type=Spearman binaryQualityMeasure=true p-value threshold=0.1 p-value=0.193143 statistic=1.30926 description=[]
" ], "text/plain": [ "class=TestResult name=Unnamed type=Spearman binaryQualityMeasure=true p-value threshold=0.1 p-value=0.193143 statistic=1.30926 description=[]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using the Spearman test\n", "ot.HypothesisTest.Spearman(sample1, sample2, 0.10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**discrete samples**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Create discrete samples\n", "sample1 = ot.Poisson(0.2).getSample(100)\n", "sample2 = ot.Poisson(0.2).getSample(100)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "class=TestResult name=Unnamed type=ChiSquared binaryQualityMeasure=true p-value threshold=0.1 p-value=0.790072 statistic=0.0708717 description=[]
" ], "text/plain": [ "class=TestResult name=Unnamed type=ChiSquared binaryQualityMeasure=true p-value threshold=0.1 p-value=0.790072 statistic=0.0708717 description=[]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using the Chi2 test\n", "ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 1 }