Pearson’s correlation test

This method deals with the modelling of a probability distribution of a random vector \vect{X} = \left( X^1,\ldots,X^{n_X} \right). It aims to detect a particular type of dependency (here, a linear correlation) that may exist between two components X^i and X^j.

The Pearson’s correlation coefficient \rho_{U,V}, defined in Pearson’s coefficient, measures the strength of a linear relationship between two random variables U and V. Given a sample of N pairs \left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}, we denote by \widehat{\rho}_{U,V} the estimated coefficient.
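As an illustration, assuming the usual sample estimator \widehat{\rho}_{U,V} = \sum_k (u_k - \bar{u})(v_k - \bar{v}) / \sqrt{\sum_k (u_k - \bar{u})^2 \sum_k (v_k - \bar{v})^2}, where \bar{u} and \bar{v} are the sample means, the estimate can be computed as in the minimal Python sketch below. The function name pearson_coefficient is hypothetical and does not refer to any library API.

```python
import math


def pearson_coefficient(pairs):
    """Sample Pearson coefficient of a list of (u, v) pairs (hypothetical helper)."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    u_mean = sum(u for u, _ in pairs) / n
    v_mean = sum(v for _, v in pairs) / n
    # Centred cross-product and sums of squares; the 1/N factors cancel in the ratio.
    s_uv = sum((u - u_mean) * (v - v_mean) for u, v in pairs)
    s_uu = sum((u - u_mean) ** 2 for u, _ in pairs)
    s_vv = sum((v - v_mean) ** 2 for _, v in pairs)
    if s_uu == 0 or s_vv == 0:
        raise ValueError("coefficient undefined when a component is constant")
    return s_uv / math.sqrt(s_uu * s_vv)


print(pearson_coefficient([(1, 3), (2, 1), (3, 2)]))  # -0.5
```

Exactly linear pairs such as (1, 2), (2, 4), (3, 6) give 1.0, while the small sample above gives a moderate negative value even though it carries little evidence of correlation, which is the situation the test below addresses.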

Even when two variables U and V have a Pearson’s coefficient \rho_{U,V} equal to zero, the estimate \widehat{\rho}_{U,V} obtained from a sample may be non-zero: a finite sample does not give a perfect image of the true correlation. Pearson’s test nevertheless makes it possible to determine whether \widehat{\rho}_{U,V} is significantly different from zero. More precisely, the user first chooses a probability \alpha, from which a critical value d_\alpha is computed such that:

  • if \left| \widehat{\rho}_{U,V} \right| > d_\alpha, one can conclude that the real Pearson’s correlation coefficient \rho_{U,V} is not zero; the risk of error in making this assertion is controlled and equal to \alpha;

  • if \left| \widehat{\rho}_{U,V} \right| \leq d_\alpha, there is insufficient evidence to reject the null hypothesis \rho_{U,V} = 0.
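The text does not say how d_\alpha is obtained. Under the additional assumption (not stated above) that the N pairs are independent draws of a Gaussian vector (U,V), a standard construction uses the statistic T below, which follows a Student distribution with N-2 degrees of freedom when \rho_{U,V} = 0. Here t_{1-\alpha/2,\,N-2} denotes the quantile of order 1-\alpha/2 of that distribution. Since |T| increases with \left| \widehat{\rho}_{U,V} \right|, the threshold on |T| translates into a threshold on the coefficient itself:

```latex
T = \widehat{\rho}_{U,V} \sqrt{\frac{N-2}{1-\widehat{\rho}_{U,V}^{\,2}}},
\qquad
\left| \widehat{\rho}_{U,V} \right| > d_\alpha
\iff |T| > t_{1-\alpha/2,\,N-2},
\qquad
d_\alpha = \frac{t_{1-\alpha/2,\,N-2}}{\sqrt{N-2+t_{1-\alpha/2,\,N-2}^{2}}}.
```

For example, with N = 10 and \alpha = 0.05, t_{0.975,\,8} \approx 2.306 gives d_\alpha \approx 2.306/\sqrt{8 + 5.318} \approx 0.632.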

An important notion is the so-called “p-value” of the test. This quantity is equal to the limit error probability \alpha_\textrm{lim}, that is, the smallest value of \alpha for which the null correlation hypothesis would be rejected. Thus, Pearson’s coefficient is considered non-zero if and only if \alpha_\textrm{lim} is lower than the value \alpha chosen by the user. The larger the gap \alpha - \alpha_\textrm{lim}, the more robust the decision.
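The sketch below computes \alpha_\textrm{lim} and d_\alpha numerically, again assuming independent Gaussian pairs (an assumption not made in the text above). Under that assumption and \rho_{U,V} = 0, the density of \widehat{\rho}_{U,V} on [-1,1] is proportional to (1-r^2)^{(N-4)/2}; the substitution r = \sin\theta turns it into \cos^{N-3}\theta, which a composite Simpson rule integrates accurately. The helper names (pearson_p_value, pearson_critical_value, pearson_test), the 2000 subintervals and the bisection search are illustrative choices, not a description of any particular library's implementation. The test rejects the null hypothesis when \alpha_\textrm{lim} < \alpha, which is equivalent to \left| \widehat{\rho}_{U,V} \right| > d_\alpha.

```python
import math


def _simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # Interior weights alternate 4, 2, 4, ..., 4 because n is even.
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def pearson_p_value(r_hat, n):
    """Two-sided p-value alpha_lim of H0: rho = 0, assuming Gaussian pairs.

    The null density of r_hat is proportional to (1 - r^2)^((n - 4) / 2).
    With r = sin(theta) it becomes cos(theta)^(n - 3), smooth for n >= 3.
    """
    if n < 3:
        raise ValueError("the test needs at least 3 pairs")
    g = lambda theta: math.cos(theta) ** (n - 3)
    full = _simpson(g, 0.0, math.pi / 2)
    inner = _simpson(g, 0.0, math.asin(min(abs(r_hat), 1.0)))
    # The null law is symmetric, so P(|R| > |r_hat|) = 1 - P(|R| <= |r_hat|).
    return 1.0 - inner / full


def pearson_critical_value(alpha, n, tol=1e-9):
    """d_alpha such that P(|R| > d_alpha) = alpha under H0, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pearson_p_value(mid, n) > alpha:
            lo = mid  # tail probability still above alpha: d_alpha lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pearson_test(r_hat, n, alpha=0.05):
    """Return (alpha_lim, reject), where reject is True iff alpha_lim < alpha."""
    p = pearson_p_value(r_hat, n)
    return p, p < alpha


print(round(pearson_critical_value(0.05, 10), 3))  # 0.632
print(pearson_test(0.8, 10)[1])                    # True
```

For N = 4 the null distribution is uniform on [-1,1], so pearson_p_value(r, 4) reduces to 1 - |r|, which is a convenient sanity check of the numerical integration.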
