Pearson’s correlation test

This method deals with the modelling of a probability distribution of a random vector $\vect{X} = \left( X^1,\ldots,X^{n_X} \right)$ . It seeks to find a type of dependency (here a linear correlation) which may exist between two components $X^i$ and $X^j$ .

Pearson’s correlation coefficient $\rho_{U,V}$ , defined in Pearson’s coefficient, measures the strength of a linear relationship between two random variables $U$ and $V$ . Given a sample of $N$ pairs $\left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}$ , we denote by $\widehat{\rho}_{U,V}$ the coefficient estimated from that sample.
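As an illustration, the estimate $\widehat{\rho}_{U,V}$ can be computed from the $N$ pairs with the usual sample formula (centered cross-products divided by the product of the centered norms). The sketch below uses only the Python standard library; the function name `pearson_estimate` and the sample values are hypothetical and are not part of any library API.

```python
import math

def pearson_estimate(u, v):
    """Sample Pearson correlation coefficient of paired observations (u_k, v_k)."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two samples of equal size N >= 2")
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    # Centered cross-products and centered sums of squares.
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    # Undefined (division by zero) if either component is constant.
    return cov / math.sqrt(var_u * var_v)

# Made-up illustrative data, roughly linear in u.
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [2.1, 3.9, 6.2, 7.8, 10.1]
rho_hat = pearson_estimate(u, v)
```

The estimate lies in $[-1, 1]$ up to rounding, and it is undefined when one of the two components is constant over the sample.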

Even when two variables $U$ and $V$ have a Pearson’s coefficient $\rho_{U,V}$ equal to zero, the estimate $\widehat{\rho}_{U,V}$ obtained from the sample may be non-zero: a finite sample gives only an imperfect picture of the true correlation. Pearson’s test nevertheless makes it possible to determine whether the value of $\widehat{\rho}_{U,V}$ is significantly different from zero. More precisely, the user first chooses a probability $\alpha$ , the accepted risk of wrongly rejecting the hypothesis $\rho_{U,V} = 0$ . From this value the critical value $d_\alpha$ is calculated such that:

if $\left| \widehat{\rho}_{U,V} \right| > d_\alpha$ , one can conclude that the real Pearson’s correlation coefficient $\rho_{U,V}$ is not zero; the risk of error in making this assertion is controlled and equal to $\alpha$ ;
if $\left| \widehat{\rho}_{U,V} \right| \leq d_\alpha$ , there is insufficient evidence to reject the null hypothesis $\rho_{U,V} = 0$ .
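The decision rule above can be sketched as follows. The text does not say how $d_\alpha$ is computed, so this example assumes the Fisher z-transformation, a large-sample normal approximation, rather than whatever exact computation a given library performs. The function names `critical_value` and `reject_zero_correlation` are hypothetical.

```python
import math
from statistics import NormalDist

def critical_value(n, alpha):
    """Approximate d_alpha for a two-sided test of rho = 0.

    Assumption: Fisher z-transformation, i.e. atanh(rho_hat) * sqrt(n - 3)
    is treated as standard normal under the null hypothesis.
    """
    if n <= 3:
        raise ValueError("the Fisher approximation needs N > 3")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    return math.tanh(z / math.sqrt(n - 3))

def reject_zero_correlation(rho_hat, n, alpha=0.05):
    """True if |rho_hat| exceeds the approximate critical value d_alpha."""
    return abs(rho_hat) > critical_value(n, alpha)

d = critical_value(30, 0.05)  # about 0.36 for N = 30
```

Under this approximation the critical value shrinks as $N$ grows, so a given estimate becomes easier to distinguish from zero with a larger sample.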

An important notion is the so-called “ $p$ -value” of the test. This quantity is equal to the limit error probability $\alpha_\textrm{lim}$ above which the null correlation hypothesis is rejected, that is, the smallest value of $\alpha$ for which the test concludes that $\rho_{U,V}$ is not zero. Thus, Pearson’s coefficient is considered non-zero if and only if $\alpha_\textrm{lim}$ is smaller than the value $\alpha$ chosen by the user. Note that the larger $\alpha - \alpha_\textrm{lim}$ , the more robust the decision to reject.
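A minimal sketch of a $p$ -value computation, again assuming the Fisher z-transformation (a normal approximation, not necessarily the exact distribution used by a library implementation) and the standard convention that the null hypothesis is rejected when the $p$ -value falls below $\alpha$ . The function name `approx_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(rho_hat, n):
    """Approximate two-sided p-value for the null hypothesis rho = 0.

    Assumption: atanh(rho_hat) * sqrt(n - 3) is treated as standard
    normal under the null hypothesis (Fisher z-transformation).
    """
    if n <= 3:
        raise ValueError("the Fisher approximation needs N > 3")
    if abs(rho_hat) >= 1.0:
        return 0.0  # perfect linear relationship in the sample
    z = math.atanh(rho_hat) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

alpha = 0.05
p_strong = approx_p_value(0.5, 30)  # well below alpha: reject rho = 0
p_weak = approx_p_value(0.2, 30)    # above alpha: do not reject
```

Comparing the $p$ -value with $\alpha$ gives the same decision as comparing $\left| \widehat{\rho}_{U,V} \right|$ with $d_\alpha$ , provided both are computed from the same distribution.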