Pearson correlation coefficient
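Pearson’s correlation coefficient $\rho_{X,Y}$ aims to measure
the strength of a linear relationship between two random variables $X$
and $Y$. It is defined as follows:

$$
\rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sigma_X \, \sigma_Y}
$$

where $\mathrm{Cov}[X,Y] = \mathbb{E}\left[(X - m_X)(Y - m_Y)\right]$,
$\sigma_X = \sqrt{\mathrm{Var}[X]}$,
$\sigma_Y = \sqrt{\mathrm{Var}[Y]}$,
$m_X = \mathbb{E}[X]$
and $m_Y = \mathbb{E}[Y]$.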
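If we have a sample made up of a set of $N$
pairs $\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, Pearson’s
correlation coefficient can be estimated using the formula:

$$
\hat{\rho}_{X,Y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}
$$

where $\bar{x}$ and $\bar{y}$ represent the
empirical means of the samples $(x_1, \ldots, x_N)$
and $(y_1, \ldots, y_N)$.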
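The estimation formula can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `pearson_estimate` and the small data set are assumptions introduced here, not part of any library API. It computes the empirical means, then the three sums of centred products that appear in the estimator.

```python
import math


def pearson_estimate(xs, ys):
    """Sample Pearson correlation of two equal-length sequences.

    Hypothetical helper written for illustration only.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal size N >= 2")
    n = len(xs)
    # Empirical means of each sample.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of centred products (numerator) and centred squares (denominator).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0.0 or s_yy == 0.0:
        raise ValueError("the estimate is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data lying exactly on an increasing straight line (y = 2x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
r_linear = pearson_estimate(xs, ys)
print(r_linear)
```

For this exactly linear, increasing sample the estimate reaches its upper bound of 1. The explicit check for a constant sample is a design choice of this sketch: the denominator vanishes in that case, so the estimator has no value.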
Pearson’s correlation coefficient takes values between -1 and 1. The
closer its absolute value is to 1, the stronger the indication is that a
linear relationship exists between variables $X$ and $Y$.
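To see why the extreme values correspond to an exact linear relationship, consider the following short derivation. It assumes, purely for illustration, that $Y = aX + b$ for constants $a \neq 0$ and $b$, and that $X$ has a finite, non-zero variance:

$$
\mathrm{Cov}[X, Y] = a \, \mathrm{Var}[X] = a \, \sigma_X^2,
\qquad
\sigma_Y = |a| \, \sigma_X,
\qquad
\rho_{X,Y} = \frac{a \, \sigma_X^2}{\sigma_X \cdot |a| \, \sigma_X} = \frac{a}{|a|} = \pm 1 .
$$

Under this assumption the coefficient equals 1 when $a > 0$ and -1 when $a < 0$, regardless of the values of $|a|$ and $b$.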
The sign of Pearson’s coefficient indicates whether the two variables
vary in the same direction (positive coefficient) or in
opposite directions (negative coefficient). We note that a correlation
coefficient equal to 0 does not necessarily imply the independence of
variables $X$ and $Y$: this property is in fact
theoretically guaranteed when the pair $(X, Y)$ follows a bivariate
Normal distribution (Normal marginal distributions alone are not
sufficient). In all other cases, there are two possible
situations in the event of a zero Pearson’s correlation coefficient:
either the variables $X$ and $Y$ are in fact independent,
or a non-linear relationship exists between $X$ and $Y$.
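The following sketch illustrates both points on small made-up samples: a negative coefficient for variables moving in opposite directions, and a zero coefficient despite an exact non-linear dependence. The helper `pearson_estimate` and the data are assumptions introduced only for this example.

```python
import math


def pearson_estimate(xs, ys):
    """Sample Pearson correlation (hypothetical helper, illustration only)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Sample points placed symmetrically around zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Opposite directions: y decreases linearly as x increases.
ys_decreasing = [-x for x in xs]
r_decreasing = pearson_estimate(xs, ys_decreasing)

# Exact non-linear dependence: y is fully determined by x, yet the
# positive and negative centred products cancel out.
ys_parabola = [x ** 2 for x in xs]
r_parabola = pearson_estimate(xs, ys_parabola)

print(r_decreasing, r_parabola)
```

Here the decreasing sample gives an estimate of -1, while the parabola gives an estimate of 0 even though `ys_parabola` is a deterministic function of `xs`. A zero coefficient therefore cannot be read as evidence of independence on its own.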
![../../_images/pearson_coefficient-1.png](../../_images/pearson_coefficient-1.png)
![../../_images/pearson_coefficient-2.png](../../_images/pearson_coefficient-2.png)
![../../_images/pearson_coefficient-3.png](../../_images/pearson_coefficient-3.png)
![../../_images/pearson_coefficient-4.png](../../_images/pearson_coefficient-4.png)
The estimate $\hat{\rho}_{X,Y}$ of Pearson’s correlation
coefficient is sometimes denoted by $r$.