Uncertainty ranking: SRC and SRRC

Standardized Regression Coefficients (SRC) quantify the influence of the random vector \inputRV = \left( X_1,\ldots,X_\inputDim \right) on a random variable Y. They measure the linear relationship between Y and each input variable X_i.

Multiple linear regression seeks the function that links the variable Y to the \inputDim variables X_1,\ldots,X_\inputDim by means of a linear model:

Y = a_0 + \sum_{i=1}^\inputDim a_i X_i + \varepsilon,

where (a_i)_{i = 0, 1, ..., \inputDim} are constant parameters and \varepsilon is a random variable with zero mean and standard deviation \sigma_{\varepsilon}, independent of the input variables X_i. If the random variables X_1,\ldots,X_\inputDim are independent and have finite variances \Var{X_i}, the variance of Y can be decomposed as follows:

\Var{Y} = \sum_{i=1}^\inputDim a_i^2 \Var{X_i} + \sigma_{\varepsilon}^2 = \sigma^2.
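The decomposition above can be checked numerically. The sketch below is only an illustration: the coefficients, the Gaussian input distributions and the noise level are arbitrary assumptions, not values taken from this page. It compares the sample variance of Y with the right-hand side of the formula.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that the run is reproducible
n = 50_000

# Hypothetical linear model: Y = 1 + 3 * X1 - X2 + eps
a = [3.0, -1.0]      # regression coefficients a_1, a_2 (assumed)
sd_x = [1.0, 2.0]    # standard deviations of the independent inputs (assumed)
sd_eps = 1.0         # standard deviation of the noise term (assumed)

x1 = [rng.gauss(0.0, sd_x[0]) for _ in range(n)]
x2 = [rng.gauss(0.0, sd_x[1]) for _ in range(n)]
eps = [rng.gauss(0.0, sd_eps) for _ in range(n)]
y = [1.0 + a[0] * u + a[1] * v + e for u, v, e in zip(x1, x2, eps)]

# Right-hand side of the decomposition: sum_i a_i^2 Var(X_i) + sigma_eps^2
var_formula = sum(ai**2 * si**2 for ai, si in zip(a, sd_x)) + sd_eps**2

# Left-hand side, estimated from the sample
var_sample = statistics.variance(y)

print(f"formula: {var_formula:.3f}, sample estimate: {var_sample:.3f}")
```

The two values should agree up to sampling error, which shrinks as the sample size n grows.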

The SRC coefficient is defined by (see [borgonovo2017] page 131, eq. 14.3):

\operatorname{SRC}_i = a_i \sqrt{\frac{\Var{X_i}}{\Var{Y}}}

for i = 1, ..., \inputDim. The estimators of the regression coefficients a_0,\ldots,a_\inputDim and of the standard deviation \sigma are obtained from a sample of (Y,X_1,\ldots,X_\inputDim). The SRC coefficients are then estimated by \widehat{\operatorname{SRC}}_i:

\widehat{\operatorname{SRC}}_i = \widehat{a}_i \frac{\widehat{\sigma}_i}{\widehat{\sigma}},

for i = 1, ..., \inputDim, where \widehat{a}_i is the estimate of the regression coefficient a_i, \widehat{\sigma}_i is the sample standard deviation of the input variable X_i and \widehat{\sigma} is the sample standard deviation of the output variable Y. When the input variables are independent, the variance decomposition above implies that the absolute value of \operatorname{SRC}_i is between 0 and 1. The closer it is to 1, the greater the impact of the variable X_i on the variance of Y. See the computeSRC method to compute the SRC coefficients.
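As an illustration of the estimator above, here is a minimal sketch that fits the linear model by ordinary least squares and then rescales each coefficient by the ratio of sample standard deviations. It uses only the Python standard library and is not the implementation behind computeSRC. The helper names (solve_linear_system, estimate_src) and the test model are hypothetical.

```python
import random
import statistics

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def estimate_src(inputs, output):
    """inputs: list of d columns of length n; output: list of n values."""
    n = len(output)
    columns = [[1.0] * n] + inputs  # first column carries the intercept a_0
    p = len(columns)
    # Normal equations (X^T X) a = X^T y
    gram = [[sum(u * v for u, v in zip(columns[j], columns[k]))
             for k in range(p)] for j in range(p)]
    rhs = [sum(u * y for u, y in zip(columns[j], output)) for j in range(p)]
    coeffs = solve_linear_system(gram, rhs)
    sigma_y = statistics.stdev(output)
    # SRC_i = a_i * sigma_i / sigma
    return [coeffs[i + 1] * statistics.stdev(inputs[i]) / sigma_y
            for i in range(len(inputs))]

rng = random.Random(0)
n = 5_000
# Hypothetical model: Y = 1 + 3 * X1 - X2 + eps, with independent inputs
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 2.0) for _ in range(n)]
y = [1.0 + 3.0 * u - v + rng.gauss(0.0, 1.0) for u, v in zip(x1, x2)]

src = estimate_src([x1, x2], y)
print([round(s, 3) for s in src])
```

For this assumed model, the exact values are 3 / \sqrt{14} for X_1 and -2 / \sqrt{14} for X_2, so the estimates should be close to them up to sampling error. The sign of each SRC follows the sign of the corresponding regression coefficient.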

Before estimating the SRC coefficients, we must check the quality of the linear regression: if the linear regression model is a poor fit to the data, then the SRC coefficients are useless. See e.g. the MetaModelValidation class to validate the linear model against a test data set.
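To see why this check matters, consider the following sketch. The model Y = X^2 is a hypothetical example chosen for illustration, and the validation is done by hand with the standard library rather than with MetaModelValidation. The output is entirely determined by X, yet both the SRC and the R2 score on a test sample are close to zero, because a straight line cannot capture the relationship.

```python
import random
import statistics

rng = random.Random(1)
n = 20_000

def model(x):
    # Hypothetical non-monotonic model: Y depends on X, but not linearly
    return x * x

x_train = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_train = [model(x) for x in x_train]
x_test = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_test = [model(x) for x in x_test]

# Least-squares line fitted on the training sample
mean_x = statistics.fmean(x_train)
mean_y = statistics.fmean(y_train)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
sxx = sum((x - mean_x) ** 2 for x in x_train)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Estimated SRC of the single input
src = slope * statistics.stdev(x_train) / statistics.stdev(y_train)

# R2 score of the linear model on the independent test sample
mean_y_test = statistics.fmean(y_test)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_test, y_test))
sst = sum((y - mean_y_test) ** 2 for y in y_test)
r2_test = 1.0 - sse / sst

print(f"SRC = {src:.3f}, R2 on test sample = {r2_test:.3f}")
```

A low R2 score such as this one signals that the SRC value says nothing about the actual importance of X.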

The \operatorname{SRC}_i^2 index, which is the contribution of X_i to the variance of Y, is sometimes described in the literature as the “importance factor”, because of the similarity between this approach to linear regression and the method of cumulative variance which uses the term importance factor. This importance factor is also named “squared SRC” coefficient (see [borgonovo2017] page 131, eq. 14.5):

\operatorname{SRC}_i^2 = a_i^2 \frac{\Var{X_i}}{\Var{Y}}

for i = 1, ..., \inputDim. If the model is linear and the input variables are independent, the squared SRC coefficients are equal to the Sobol’ indices of the model: there is no interaction, so the first-order Sobol’ index of each input is equal to its total Sobol’ index. See the computeSquaredSRC method to compute the squared SRC coefficients.
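The link with Sobol’ indices can be illustrated numerically. The sketch below is built on assumptions: a noise-free linear model with arbitrary coefficients, independent Gaussian inputs, and a basic pick-freeze style estimator of the first-order indices written by hand. It does not reproduce the estimators used by the library. The squared SRC are computed from their definition and compared with the sampled first-order indices.

```python
import random
import statistics

rng = random.Random(2)
n = 40_000

a = [3.0, -1.0, 0.5]      # hypothetical regression coefficients a_1..a_3
sd_x = [1.0, 2.0, 1.0]    # standard deviations of the independent inputs
d = len(a)

def model(x):
    # Noise-free linear model: Y = 1 + sum_i a_i X_i
    return 1.0 + sum(ai * xi for ai, xi in zip(a, x))

# Squared SRC from the definition a_i^2 Var(X_i) / Var(Y)
var_y = sum(ai**2 * si**2 for ai, si in zip(a, sd_x))
squared_src = [ai**2 * si**2 / var_y for ai, si in zip(a, sd_x)]

def covariance(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

# Two independent input samples A and B
sample_a = [[rng.gauss(0.0, s) for s in sd_x] for _ in range(n)]
sample_b = [[rng.gauss(0.0, s) for s in sd_x] for _ in range(n)]
y_a = [model(x) for x in sample_a]
var_a = statistics.variance(y_a)

sobol_first = []
for i in range(d):
    # Keep column i from A ("freeze"), take every other column from B ("pick")
    mixed = [[xa[j] if j == i else xb[j] for j in range(d)]
             for xa, xb in zip(sample_a, sample_b)]
    y_mixed = [model(x) for x in mixed]
    sobol_first.append(covariance(y_a, y_mixed) / var_a)

for i in range(d):
    print(f"X{i + 1}: SRC^2 = {squared_src[i]:.3f}, "
          f"first-order Sobol' estimate = {sobol_first[i]:.3f}")
```

Because the model has no noise term and no interaction, the squared SRC sum to 1 and each sampled index should match the corresponding squared SRC up to sampling error.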

If the input random variables (X_i)_{i = 1, ..., \inputDim} are dependent, then the SRC is not a valid importance measure anymore (see [daveiga2022] remark 4 page 33). In this case, the partial correlation coefficient (PCC) has been suggested, but this index is rather a measure of the linear relationship between the input and the output. Other indices such as the Lindeman-Merenda-Gold (LMG) have been suggested in the dependent case (see [daveiga2022] page 33).

Standardized Rank Regression Coefficients (SRRC) are SRC coefficients computed on the ranked input variables \operatorname{rank}(\inputRV) = \left( \operatorname{rank}(X_1), \ldots, \operatorname{rank}(X_\inputDim) \right) and the ranked output variable \operatorname{rank}(Y). They are useful when the relationship between Y and \inputRV is not linear (so SRC cannot be used), but is still monotonic. See the computeSRRC method to compute the SRRC coefficients.
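The following sketch illustrates the rank transformation on a hypothetical monotonic but strongly nonlinear model, Y = \exp(X_1 + 2 X_2), with two independent Gaussian inputs. It is a hand-written illustration, not the computeSRRC implementation. For two inputs, the standardized coefficients are obtained from the normal equations written in terms of Pearson correlations, which is equivalent to rescaling the least-squares coefficients by \widehat{\sigma}_i / \widehat{\sigma}. The rank helper ignores ties, which is an assumption that holds for continuous data.

```python
import math
import random

def ranks(values):
    """Ranks from 1 to n (no tie handling: continuous data is assumed)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

def standardized_coefficients(u1, u2, v):
    """Standardized regression coefficients of v on (u1, u2), and the R2."""
    r12, r1y, r2y = pearson(u1, u2), pearson(u1, v), pearson(u2, v)
    det = 1.0 - r12**2
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    r2 = b1 * r1y + b2 * r2y  # coefficient of determination of the fit
    return [b1, b2], r2

rng = random.Random(3)
n = 2_000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical monotonic, nonlinear model
y = [math.exp(u + 2.0 * v) for u, v in zip(x1, x2)]

# SRC on the raw data, SRRC on the ranked data
src, r2_raw = standardized_coefficients(x1, x2, y)
srrc, r2_rank = standardized_coefficients(ranks(x1), ranks(x2), ranks(y))

print(f"raw data:    SRC  = {[round(s, 3) for s in src]}, R2 = {r2_raw:.3f}")
print(f"ranked data: SRRC = {[round(s, 3) for s in srrc]}, R2 = {r2_rank:.3f}")
```

On the raw data the linear fit is poor, so the SRC are unreliable. After the rank transformation the fit is much better, and the SRRC rank X_2 above X_1, as expected from the larger coefficient of X_2 inside the exponential.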