Bayesian Information Criterion (BIC)
This method can be used to rank candidate distributions with respect to data.
We denote by $\mathcal{M}_1, \dots, \mathcal{M}_K$ the parametric models we want to test (see parametric models). We suppose here that the parameters of these models have been estimated from a sample of size $n$ (see Maximum Likelihood). We denote by $L_i$ the maximized likelihood of the sample with respect to the model $\mathcal{M}_i$.
By definition of the likelihood, the higher $L_i$, the better the model describes the sample. However, relying entirely on the value of the likelihood runs the risk of systematically selecting the model with the most parameters: the greater the number of parameters, the easier it is for the distribution to adapt to the data.
The Bayesian Information Criterion (BIC) can be used to avoid this problem. It is also referred to in the literature as the Schwarz information criterion. It is an information criterion closely related to the Akaike information criterion (see Akaike Information Criterion (AIC)), but with a penalty that grows with the sample size. Note that the library divides the BIC defined in the literature by the sample size, which has no impact on the selection of the best model.
The BIC of the model $\mathcal{M}_i$ is defined in the library by:

$$ \mathrm{BIC}_i = \frac{-2 \log L_i + d_i \log n}{n} $$

where $d_i$ denotes the number of parameters of the model $\mathcal{M}_i$ that have been inferred from the sample. The smaller $\mathrm{BIC}_i$, the better the model: the selected model is

$$ \mathcal{M}_{i^*}, \qquad i^* = \underset{1 \leq i \leq K}{\operatorname{argmin}} \; \mathrm{BIC}_i. $$
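As a concrete illustration of this formula, the following sketch fits two candidate models by maximum likelihood and computes the criterion by hand. It uses SciPy rather than the library's own API, and the sample and candidate models are example choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)  # sample of size n
n = len(x)

def bic(log_likelihood, d, n):
    """Library-style BIC: (-2 log L_i + d_i log n) / n."""
    return (-2.0 * log_likelihood + d * np.log(n)) / n

# Candidate 1: normal model, d = 2 estimated parameters (loc, scale).
loc, scale = stats.norm.fit(x)
ll_norm = stats.norm.logpdf(x, loc, scale).sum()

# Candidate 2: Student t model, d = 3 estimated parameters (df, loc, scale).
df, loc_t, scale_t = stats.t.fit(x)
ll_t = stats.t.logpdf(x, df, loc_t, scale_t).sum()

scores = {"normal": bic(ll_norm, 2, n), "student_t": bic(ll_t, 3, n)}
print(scores, "-> best:", min(scores, key=scores.get))  # argmin over candidates
```

On such a Gaussian sample, the Student t model's extra parameter rarely buys enough likelihood to offset its penalty, so the normal model usually attains the smaller BIC.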
The idea is to introduce a penalization term that increases with the number of parameters to be estimated. A complex model will then have a good score only if the gain in terms of likelihood is high enough to justify the number of parameters used.
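Within the library itself, the same comparison can be run with the FittingTest helpers. The sketch below assumes the FittingTest.BIC(sample, distribution, n_estimated_parameters) signature and an arbitrary choice of candidate factories; check the API documentation of your version:

```python
import openturns as ot

sample = ot.Normal(2.0, 1.0).getSample(200)

# Fit each candidate model by maximum likelihood, then score it:
# a richer model wins only if its likelihood gain offsets the penalty.
for factory in [ot.NormalFactory(), ot.LogisticFactory(), ot.UniformFactory()]:
    fitted = factory.build(sample)
    d = fitted.getParameterDimension()  # d_i: number of estimated parameters
    print(fitted.getClassName(), ot.FittingTest.BIC(sample, fitted, d))
```

Recent versions also expose FittingTest.BestModelBIC, which performs this kind of selection directly over a list of candidates.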
The term “Bayesian Information Criterion” comes from the interpretation of the quantity $\mathrm{BIC}_i$. In a Bayesian context, the unknown “true” model may be seen as a random variable. Suppose now that the user does not have any informative prior information on which model is more relevant among $\mathcal{M}_1, \dots, \mathcal{M}_K$: all the models are thus equally likely from the point of view of the user. Then, one can show that $-\frac{n}{2} \mathrm{BIC}_i$ is an approximation, up to an additive constant, of the logarithm of the posterior probability of the model $\mathcal{M}_i$.
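To see where this comes from, the display below records the classical Laplace (large-sample) approximation argument. It is an informal sketch under a uniform prior over the $K$ models, not a rigorous proof:

```latex
% Uniform prior over the K candidate models, p(M_i) = 1/K, gives
\log p(\mathcal{M}_i \mid x) = \log p(x \mid \mathcal{M}_i) + \mathrm{const}
% Laplace approximation of the marginal likelihood around the MLE:
\log p(x \mid \mathcal{M}_i) \approx \log L_i - \frac{d_i}{2} \log n
% which, with the library's definition of BIC_i, is exactly
\log L_i - \frac{d_i}{2} \log n = -\frac{n}{2} \, \mathrm{BIC}_i .
```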
This criterion is valuable for rejecting a model that is not relevant, but it can be tricky to interpret in some cases. For example, if two models have very close BIC values, both should be considered instead of keeping only the model with the lowest BIC.
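One way to quantify how close two BIC values are is to map them to approximate posterior model weights through the interpretation above. This is a sketch under the same uniform-prior assumption, and the numeric scores are made up for the example:

```python
import numpy as np

def bic_weights(bics, n):
    """Approximate posterior model probabilities from library-style BICs.

    Uses log p(M_i | x) ~ -(n/2) * BIC_i (uniform prior over the models).
    """
    log_post = -0.5 * n * np.asarray(bics)
    log_post -= log_post.max()  # stabilize the exponentials
    w = np.exp(log_post)
    return w / w.sum()

# Hypothetical scores: models 1 and 2 are nearly tied, model 3 lags behind.
print(bic_weights([2.830, 2.832, 2.95], n=200))
```

Here the first two models keep comparable posterior weight, so discarding either on the basis of BIC alone would be hasty, while the third is effectively ruled out.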