Uncertainty ranking: SRC and SRRC

Standard Regression Coefficients (SRC) quantify the influence that the random vector \vect{X} = \left( X_1,\ldots,X_{n_X} \right) has on a random variable Y whose uncertainty is being studied. They measure the linear relationships between Y and the individual components X_i.

Multiple linear regression seeks the function that links the variable Y to the n_X variables X_1,\ldots,X_{n_X} by means of a linear model:

Y = a_0 + \sum_{i=1}^{n_X} a_i X_i + \varepsilon,

where \varepsilon is a random variable with zero mean and standard deviation \sigma_{\varepsilon}, independent of the input variables X_i. If the random variables X_1,\ldots,X_{n_X} are independent with finite variances \Var{X_i} = (\sigma_i)^2, the variance of Y can be written as follows:

\Var{Y} = \sum_{i=1}^{n_X} (a_i)^2 \Var{X_i} + (\sigma_{\varepsilon})^2 = (\sigma)^2.

From this we obtain the following coefficients:

C_i = a_i \sqrt{\frac{\Var{X_i}}{\Var{Y}}}.
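
As a short consistency check, still assuming independent inputs, dividing the variance decomposition of Y by \Var{Y} and substituting the definition of C_i gives:

\sum_{i=1}^{n_X} (C_i)^2 + \frac{(\sigma_{\varepsilon})^2}{\Var{Y}} = 1,

so each (C_i)^2 lies between 0 and 1, and the squared coefficients sum to 1 exactly when the residual variance (\sigma_{\varepsilon})^2 is zero.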

Estimates of the regression coefficients a_0,\ldots,a_{n_X} and of the standard deviation \sigma are obtained from a sample of (Y,X_1,\ldots,X_{n_X}). The SRC coefficients are defined as the estimators \widehat{C}_i of the coefficients C_i:

\widehat{C}_i = \frac{\displaystyle \widehat{a}_i \widehat{\sigma}_i}{\displaystyle \widehat{\sigma}},

where \widehat{a}_i denotes the estimate of the regression coefficient a_i, \widehat{\sigma}_i denotes the empirical standard deviation of the sample of the input variable X_i and \widehat{\sigma} denotes the empirical standard deviation of the sample of the output variable Y. The absolute value of this estimated contribution is by definition between 0 and 1. The closer it is to 1, the greater the impact the variable X_i has on the dispersion of Y.
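
The estimator \widehat{C}_i can be illustrated with a minimal Python sketch that uses only the standard library. It is not the implementation of any particular library: the helper names (solve_linear_system, fit_linear_model, src), the synthetic model and its coefficients are all assumptions chosen for illustration, and the least-squares fit is done through the normal equations for brevity.

```python
import random
import statistics


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_linear_model(xs, y):
    """Least-squares estimates [a_0, a_1, ..., a_nX] via the normal equations."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the intercept a_0
    p = len(design[0])
    gram = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yk for r, yk in zip(design, y)) for i in range(p)]
    return solve_linear_system(gram, rhs)


def src(xs, y):
    """SRC estimates: a_i_hat * sigma_i_hat / sigma_hat for each input X_i."""
    coeffs = fit_linear_model(xs, y)
    sigma_y = statistics.stdev(y)
    columns = list(zip(*xs))
    return [a * statistics.stdev(col) / sigma_y
            for a, col in zip(coeffs[1:], columns)]


# Hypothetical linear model with independent Gaussian inputs (illustration only).
rng = random.Random(0)
xs = [(rng.gauss(0, 1), rng.gauss(0, 2), rng.gauss(0, 1)) for _ in range(5000)]
y = [1.0 + 3.0 * x1 + 1.0 * x2 + 0.1 * x3 + rng.gauss(0, 0.5)
     for x1, x2, x3 in xs]

src_hat = src(xs, y)
print(src_hat)
```

For this assumed model the theoretical values follow from the formula for C_i: \Var{Y} = 9 + 4 + 0.01 + 0.25 = 13.26, hence C_1 \approx 0.82, C_2 \approx 0.55 and C_3 \approx 0.03. The estimates should be close to these values, up to sampling error.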

The square (C_i)^2, which is the contribution of X_i to the variance of Y, is sometimes described in the literature as the “importance factor”, because of the similarity between this approach to linear regression and the method of cumulative variance which uses the term importance factor.

It is a good idea to check the quality of the linear regression before estimating the SRC coefficients: if the linear regression model is a poor fit to the data, then the SRC coefficients are useless.
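
One common way to perform such a check is the coefficient of determination R^2; the text above does not prescribe a specific diagnostic, so this choice is an assumption. The sketch below is restricted to a single input to stay short, and the function name r_squared and both synthetic data sets are hypothetical. It contrasts a nearly linear relationship with a symmetric nonlinear one for which a linear fit explains almost nothing.

```python
import random


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a_0 + a_1 * x (single input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    a1 = sxy / sxx          # slope estimate
    a0 = my - a1 * mx       # intercept estimate
    ss_res = sum((b - (a0 + a1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(2)
x = [rng.uniform(-1, 1) for _ in range(1000)]
y_linear = [2.0 * a + rng.gauss(0, 0.1) for a in x]   # linear plus small noise
y_even = [a ** 2 + rng.gauss(0, 0.1) for a in x]      # symmetric, not linear

r2_linear = r_squared(x, y_linear)  # expected to be close to 1
r2_even = r_squared(x, y_even)      # expected to be close to 0
print(r2_linear, r2_even)
```

In the second case the SRC of the input would be close to zero even though Y depends strongly on it, which is why a low R^2 should be read as a warning that the SRC ranking cannot be trusted.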

Note that if the linear model is exact, i.e. there exists an affine map g such that Y=g(X_1, \ldots, X_{n_X}), and the inputs are independent, then the squared SRC coefficients are equal to the first-order Sobol’ indices.

Standard Rank Regression Coefficients (SRRC) are SRC coefficients computed on the ranked input variables r\vect{X} = \left( rX_1,\ldots,rX_{n_X} \right) and the ranked output variable rY. They are useful when the relationship between Y and \vect{X} is monotonic but not linear, so that SRC cannot be used.
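
The rank transformation can be sketched in Python as follows. This is an illustrative sketch under several assumptions: it handles exactly two inputs, so that the standardized least-squares coefficients can be written in closed form from the three pairwise correlations; the ranks helper ignores ties, which is harmless for continuous samples; and all function names and the synthetic model Y = \exp(5 X_1 + 2 X_2) are hypothetical.

```python
import math
import random


def ranks(values):
    """Rank (1..n) of each value; ties are not averaged in this sketch."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = float(rank)
    return r


def pearson(u, v):
    """Empirical linear correlation coefficient between two samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def src_two_inputs(x1, x2, y):
    """SRC for exactly two inputs: closed-form standardized least squares."""
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    det = 1.0 - r12 ** 2
    return (r1y - r12 * r2y) / det, (r2y - r12 * r1y) / det


def srrc_two_inputs(x1, x2, y):
    """SRRC: the same coefficients computed on the ranks of each sample."""
    return src_two_inputs(ranks(x1), ranks(x2), ranks(y))


# Hypothetical model: monotonic in both inputs but strongly nonlinear.
rng = random.Random(1)
x1 = [rng.uniform(0, 1) for _ in range(3000)]
x2 = [rng.uniform(0, 1) for _ in range(3000)]
y = [math.exp(5.0 * a + 2.0 * b) for a, b in zip(x1, x2)]

src_hat = src_two_inputs(x1, x2, y)
srrc_hat = srrc_two_inputs(x1, x2, y)
print(src_hat, srrc_hat)
```

Because the exponential preserves ordering, the ranks of Y coincide with the ranks of 5 X_1 + 2 X_2, so the rank regression fits far better than the linear regression on the raw values, and the squared SRRC account for a much larger share of the variance of rY than the squared SRC do for Y.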