Uncertainty ranking: PCC and PRCC

Partial Correlation Coefficients (PCC) analyze the influence of the random vector \inputRV = \left( X_1,\ldots,X_{\inputDim} \right) on a random variable Y of interest. Here we attempt to measure the linear relationships between Y and the individual components X_i.

The basic method of hierarchical ordering using Pearson’s coefficients deals with the case where the variable Y depends linearly on the \inputDim variables \left\{ X_1,\ldots,X_{\inputDim} \right\}.

Partial Correlation Coefficients are also useful in this case but provide a different kind of information: the partial correlation coefficient \textrm{PCC}_{X_i,Y} between the variables Y and X_i measures the residual influence of X_i on Y once influences from all other variables X_j have been eliminated. In particular, if X_1 and X_2 are perfectly correlated, then \textrm{PCC}_{X_1,Y} = \textrm{PCC}_{X_2,Y} = 0.

For any variable index i \in \{1, ..., \inputDim\}, the estimation for each partial correlation coefficient \textrm{PCC}_{X_i,Y} uses a sample of size \sampleSize denoted by \left\{ \left(y^{(1)},x_1^{(1)},\ldots,x_{\inputDim}^{(1)} \right),\ldots, \left(y^{(\sampleSize)},x_1^{(\sampleSize)},\ldots,x_{\inputDim}^{(\sampleSize)} \right) \right\} of the vector (Y,X_1,\ldots,X_{\inputDim}). This requires the following three steps to be carried out:

  1. Determine the effect of other variables \left\{ X_j,\ j\neq i \right\} on Y by linear regression; when the values of the variables \left\{ X_j,\ j\neq i \right\} are known, the average forecast for the value of Y is then available in the form of the equation:

    \begin{aligned}
      \widehat{Y} = \widehat{a}_0 + \sum_{k \neq i,\ 1 \leq k \leq \inputDim} \widehat{a}_k X_k
    \end{aligned}

  2. Determine the effect of other variables \left\{ X_j,\ j\neq i \right\} on X_i by linear regression; when the values of the variables \left\{ X_j,\ j\neq i \right\} are known, the average forecast for the value of X_i is then available in the form of the equation:

    \begin{aligned}
      \widehat{X}_i = \widehat{b}_0 + \sum_{k \neq i,\ 1 \leq k \leq \inputDim} \widehat{b}_k X_k
    \end{aligned}

  3. The partial correlation coefficient \textrm{PCC}_{X_i,Y} is then equal to the sample Pearson correlation coefficient \widehat{\rho}_{Y-\widehat{Y},X_i-\widehat{X}_i} estimated for the variables Y-\widehat{Y} and X_i-\widehat{X}_i.

One can then order the \inputDim variables X_1,\ldots, X_{\inputDim} according to the absolute value of the partial correlation coefficients: the higher the value of \left| \textrm{PCC}_{X_i,Y} \right|, the greater the impact the variable X_i has on Y.
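The three-step estimation above can be sketched in Python (a minimal illustration on hypothetical synthetic data; NumPy's least-squares solver stands in for any linear-regression routine, and the function name `pcc` is ours):

```python
import numpy as np

def pcc(x, y, i):
    """Estimate PCC_{X_i, Y} from a sample x of shape (n, d) and y of shape (n,)."""
    n, d = x.shape
    others = np.delete(x, i, axis=1)           # columns X_j, j != i
    A = np.column_stack([np.ones(n), others])  # regression design with intercept
    # Step 1: regress Y on the other variables, keep the residual Y - Yhat.
    res_y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    # Step 2: regress X_i on the other variables, keep the residual X_i - Xihat.
    res_xi = x[:, i] - A @ np.linalg.lstsq(A, x[:, i], rcond=None)[0]
    # Step 3: Pearson correlation of the two residuals.
    return np.corrcoef(res_y, res_xi)[0, 1]

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
# Y depends linearly on X_0 and X_1 but not on X_2.
y = 2.0 * x[:, 0] + 0.5 * x[:, 1] + rng.normal(scale=0.1, size=1000)
print([round(pcc(x, y, i), 2) for i in range(3)])
```

On this sample, |PCC| is large for X_0 and X_1 and close to zero for X_2, reproducing the ranking one would expect from the model.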

In order to introduce the PRCC, we first define the rank of an observation of a random variable. Let X be a random variable and let \{x_1, ..., x_{\sampleSize}\} be a sample of size \sampleSize. We can sort the sample in increasing order:

\begin{aligned}
  x_{(1)} \leq x_{(2)} \leq \ldots \leq x_{(\sampleSize)}
\end{aligned}

For any i \in \{1, ..., \sampleSize\}, the index j:=\text{rank}(x_i) \in \{1, ..., \sampleSize\} is the rank of the i-th observation if x_i is the j-th smallest observation in the sample. In other words, the observation x_i appears at the j-th index in the ordered sample \{x_{(1)}, ..., x_{(\sampleSize)}\}.
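A small sketch of this rank transform (the helper name `ranks` is ours; ties are assumed absent):

```python
import numpy as np

def ranks(sample):
    """Return the 1-based rank of each observation, assuming distinct values."""
    order = np.argsort(sample)  # indices that sort the sample in increasing order
    r = np.empty(len(sample), dtype=int)
    r[order] = np.arange(1, len(sample) + 1)
    return r

print(ranks(np.array([0.3, 1.2, 0.7])))  # -> [1 3 2]
```

Here 0.3 is the smallest observation (rank 1), 1.2 the largest (rank 3), and 0.7 sits in between (rank 2).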

Now that the rank of a random variable is defined, consider again the case of an input random vector \inputRV = \left( X_1,\ldots,X_{\inputDim} \right) and the output random variable Y. The Partial Rank Correlation Coefficients (PRCC) are PCC coefficients computed on the rank of the input variables \text{rank}(\inputRV) = \left( \text{rank}(X_1),\ldots, \text{rank}(X_{\inputDim}) \right) and the rank of the output variable \text{rank}(Y).
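As a hedged, self-contained sketch (synthetic data; helper names are ours), the PRCC amounts to applying the same partial-correlation procedure to rank-transformed data, which captures monotone but possibly nonlinear dependence:

```python
import numpy as np

def to_ranks(a):
    """1-based ranks of a 1-D sample, assuming distinct values."""
    order = np.argsort(a)
    r = np.empty(len(a))
    r[order] = np.arange(1, len(a) + 1)
    return r

def prcc(x, y, i):
    """PRCC_{X_i, Y}: partial correlation computed on ranks."""
    n, d = x.shape
    rx = np.column_stack([to_ranks(x[:, j]) for j in range(d)])
    ry = to_ranks(y)
    A = np.column_stack([np.ones(n), np.delete(rx, i, axis=1)])
    res_y = ry - A @ np.linalg.lstsq(A, ry, rcond=None)[0]
    res_xi = rx[:, i] - A @ np.linalg.lstsq(A, rx[:, i], rcond=None)[0]
    return np.corrcoef(res_y, res_xi)[0, 1]

rng = np.random.default_rng(1)
x = rng.uniform(size=(500, 2))
# Y is a monotone but nonlinear function of X_0; X_1 is irrelevant.
y = np.exp(x[:, 0]) + 0.1 * rng.normal(size=500)
print(round(prcc(x, y, 0), 2))
```

Because exp is monotone, the rank-based coefficient for X_0 stays close to 1 even though the relationship is not linear, which is the motivation for preferring PRCC over PCC when monotone nonlinearity is suspected.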