Pearson correlation coefficient

The Pearson correlation coefficient \rho_P(X,Y) measures the strength of a linear relationship between two random variables X and Y with finite variance. It is defined as follows:

\rho_P(X,Y)= \dfrac{\Cov{X,Y}}{\sqrt{\Var{X}\Var{Y}}}

where \Cov{X,Y} = \Expect{ \left( X - \Expect{X} \right) \left( Y - \Expect{Y} \right) }.

Let ((x_1, y_1), \dots, (x_\sampleSize, y_\sampleSize)) be a sample generated by the bivariate random vector (X,Y). The Pearson correlation coefficient is estimated by:

(1)  \hat{\rho}_P(X,Y) = \dfrac{\sum_{k=1}^\sampleSize (x_k- \bar{x})(y_k- \bar{y})}{\sqrt{\sum_{k=1}^\sampleSize(x_k- \bar{x})^2\sum_{k=1}^\sampleSize(y_k- \bar{y})^2}}

where \bar{x} = \dfrac{1}{\sampleSize} \sum_{k=1}^\sampleSize x_k and \bar{y} = \dfrac{1}{\sampleSize} \sum_{k=1}^\sampleSize y_k are the empirical means of the two marginal samples.

The estimate \hat{\rho}_P(X,Y) of the Pearson correlation coefficient is sometimes denoted by r.
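The estimator above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name pearson_estimate and the sample values are hypothetical and do not come from any library API.

```python
import math


def pearson_estimate(xs, ys):
    """Estimate the Pearson coefficient r from two paired samples.

    Hypothetical helper written for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size, at least 2")
    # Empirical means of each marginal sample.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of the products of the centered values.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of the squared centered values.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("the estimate is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up sample of size 5, used only to exercise the formula.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_estimate(xs, ys)
print(r)
```

For this sample the centered sums are 6 for the numerator and 10 and 6 under the square root, so r = 6 / \sqrt{60}, roughly 0.77. The sketch raises an error when one sample is constant, because the denominator is then zero.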

We sum up some interesting features of the coefficient:

  • The Pearson correlation coefficient takes values between -1 and 1.

  • If |\rho_P(X,Y)|=1 then X and Y are linearly related: Y = aX + b almost surely, for some constants a \neq 0 and b.

  • The closer |\rho_P(X,Y)| is to 1, the stronger the indication that a linear relationship exists between X and Y. The sign of the Pearson coefficient indicates whether the two variables tend to vary in the same direction (positive coefficient) or in opposite directions (negative coefficient).

  • If X and Y are independent, then \rho_P(X,Y)=0.

  • If \rho_P(X,Y)=0, this does not imply that X and Y are independent. It may only mean that the relationship between the two variables is not linear.
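The properties listed above can be checked on small hand-made samples. The sketch below is an illustration only: the helper pearson and the data are hypothetical, chosen so that the arithmetic is exact. It compares an increasing linear relation, a decreasing linear relation, and the quadratic relation y = x^2 on a sample symmetric about 0, where y is fully determined by x and yet the estimate is zero.

```python
import math


def pearson(xs, ys):
    """Pearson estimate for two paired samples (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Sample symmetric about 0, chosen for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Exact increasing linear relation: the estimate equals 1.
r_increasing = pearson(xs, [3.0 * x + 1.0 for x in xs])

# Exact decreasing linear relation: the estimate equals -1.
r_decreasing = pearson(xs, [-3.0 * x + 1.0 for x in xs])

# Quadratic relation: y depends entirely on x, yet the estimate is 0
# because the dependence has no linear component on this sample.
r_quadratic = pearson(xs, [x ** 2 for x in xs])

print(r_increasing, r_decreasing, r_quadratic)
```

The quadratic case shows why a zero coefficient should not be read as evidence of independence.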

../../_images/pearson_coefficient-1.svg

../../_images/pearson_coefficient-2.svg

../../_images/pearson_coefficient-3.svg

../../_images/pearson_coefficient-4.svg