Spearman correlation coefficient

The Spearman rank correlation coefficient measures the strength of the monotonic relationship between two random variables with finite variance.

Let (X,Y) be a pair of random variables whose CDFs are denoted by F_X and F_Y. Spearman’s rank correlation coefficient \rho_S(X,Y) is defined by:

\rho_S(X,Y) = \dfrac{\Cov{F_X(X),F_Y(Y)}}{\sqrt{\Var{F_X(X)}\Var{F_Y(Y)}}}

where \Cov{.} is the covariance operator, \Var{.} is the variance operator, and F_X and F_Y are the respective CDFs of X and Y.

The Spearman correlation between two variables is equal to the Pearson correlation coefficient between the rank values of the variables:

\rho_S(X,Y) = \rho_P(F_X(X), F_Y(Y))

If C is the CDF of the copula of the random vector (X,Y), then we get:

\rho_S(X,Y) = \rho_P(F_X(X),F_Y(Y)) = 12 \iint_{[0,1]^2} C(u,v)\,du\,dv - 3

which shows that the Spearman correlation depends only on the copula of (X,Y) and not on the marginal distributions.
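As a quick check of the copula formula, it can be evaluated by hand on three standard copulas. The notations \Pi, M and W below are not used elsewhere on this page and are introduced only for this illustration.

For the independent copula \Pi(u,v) = uv:

\iint_{[0,1]^2} uv\,du\,dv = \dfrac{1}{4} \quad \Rightarrow \quad \rho_S(X,Y) = 12 \times \dfrac{1}{4} - 3 = 0

For the comonotonic copula M(u,v) = \min(u,v):

\iint_{[0,1]^2} \min(u,v)\,du\,dv = \dfrac{1}{3} \quad \Rightarrow \quad \rho_S(X,Y) = 12 \times \dfrac{1}{3} - 3 = 1

For the countermonotonic copula W(u,v) = \max(u+v-1,0):

\iint_{[0,1]^2} \max(u+v-1,0)\,du\,dv = \dfrac{1}{6} \quad \Rightarrow \quad \rho_S(X,Y) = 12 \times \dfrac{1}{6} - 3 = -1

These three values correspond to independence, a perfectly increasing relationship and a perfectly decreasing relationship, respectively.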

Let ((x_1, y_1), \dots, (x_\sampleSize, y_\sampleSize)) be a sample of the bivariate random vector (X,Y). We denote by (r_1, s_1), \dots, (r_\sampleSize, s_\sampleSize) the rank sample, which means that r_k is the rank of the value x_k within the sample (x_1, \dots, x_\sampleSize) and s_k is the rank of the value y_k within the sample (y_1, \dots, y_\sampleSize). The estimator \hat{\rho}_S(X,Y) is equal to the estimator \hat{\rho}_P(X,Y) computed on the rank sample (r_1, s_1), \dots, (r_\sampleSize, s_\sampleSize). It is computed as follows:

(1) \hat{\rho}_S(X,Y) = \dfrac{\sum_{k=1}^\sampleSize (r_k- \bar{r})(s_k- \bar{s})}{\sqrt{\sum_{k=1}^\sampleSize(r_k- \bar{r})^2\sum_{k=1}^\sampleSize(s_k- \bar{s})^2}}

where \bar{r} = \dfrac{1}{\sampleSize} \sum_{k=1}^\sampleSize r_k and \bar{s} = \dfrac{1}{\sampleSize} \sum_{k=1}^\sampleSize s_k are the empirical mean ranks of the two samples.
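The estimator above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the library's implementation: the helper names `ranks`, `pearson` and `spearman` are hypothetical, and the ranking step assumes that the sample contains no ties (handling of tied values is not covered here).

```python
import math


def ranks(values):
    """Return the 1-based rank of each value within the sample (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result


def pearson(a, b):
    """Empirical Pearson correlation of two samples of equal size."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman estimate: Pearson correlation computed on the rank sample."""
    return pearson(ranks(x), ranks(y))


# Illustrative data: y is a nonlinear but strictly monotonic function of x.
x = [0.3, 1.2, 2.5, 4.0, 7.1]
y_increasing = [math.exp(v) for v in x]
y_decreasing = [-v ** 3 for v in x]

print(spearman(x, y_increasing))  # 1.0
print(spearman(x, y_decreasing))  # -1.0
```

Because only the ranks enter the computation, any strictly increasing transformation of x or y leaves the estimate unchanged, which is why the exponential relationship above still gives exactly 1.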

We sum up some interesting features of the coefficient:

  • The Spearman correlation coefficient takes values between -1 and 1.

  • If |\rho_S(X,Y)|=1 then there exists a monotonic function \varphi such that Y=\varphi(X).

  • The closer |\rho_S(X,Y)| is to 1, the stronger the indication that a monotonic relationship exists between X and Y. The sign of the Spearman coefficient indicates whether the two variables vary in the same direction (positive coefficient) or in opposite directions (negative coefficient).

  • If X and Y are independent, then \rho_S(X,Y)=0.

  • If \rho_S(X,Y)=0, this does not imply that the variables X and Y are independent. It may only mean that the relationship between the two variables is not monotonic.
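As an illustration of the last point, consider the following example, chosen here for convenience: X is uniform on [-1,1] and Y = X^2, so that Y is entirely determined by X. The CDFs are

F_X(x) = \dfrac{x+1}{2} \text{ for } x \in [-1,1], \qquad F_Y(y) = \sqrt{y} \text{ for } y \in [0,1]

so that F_X(X) = \dfrac{X+1}{2} and F_Y(Y) = |X|. Then:

\Cov{F_X(X),F_Y(Y)} = \dfrac{1}{2} \left( \mathbb{E}[X|X|] - \mathbb{E}[X]\,\mathbb{E}[|X|] \right) = 0

because x \mapsto x|x| is an odd function and the distribution of X is symmetric about 0, which gives \mathbb{E}[X|X|] = 0 and \mathbb{E}[X] = 0. Hence \rho_S(X,Y) = 0 although X and Y are not independent.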

../../_images/spearman_coefficient-1.svg

../../_images/spearman_coefficient-2.svg

../../_images/spearman_coefficient-3.svg

../../_images/spearman_coefficient-4.svg

Spearman’s coefficient is often referred to as the rank correlation coefficient.