Akaike Information Criterion (AIC)
This method deals with the modelling of the probability distribution of a
random variable $X$. It seeks to rank candidate parametric
distributions by using a sample of data
$(x_1, \dots, x_n)$.
We denote by $\mathcal{M}_1, \dots, \mathcal{M}_K$ the parametric models
envisaged by the user among the available parametric models.
We suppose here that the parameters of these models have been estimated
previously by Maximum Likelihood on the basis of the sample
$(x_1, \dots, x_n)$. We denote by $L_i$
the maximized likelihood for the model $\mathcal{M}_i$.
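As an illustration, the sketch below (assuming the OpenTURNS Python bindings; the Normal candidate and the sample are arbitrary placeholders) shows how such a maximized log-likelihood $\log(L_i)$ can be obtained:

    import openturns as ot

    # A stand-in for the observed sample (x_1, ..., x_n); here n = 50
    sample = ot.Normal(0.0, 1.0).getSample(50)
    # Maximum Likelihood estimate of the parameters of one candidate model
    model = ot.MaximumLikelihoodFactory(ot.Normal()).build(sample)
    # log(L_i) is the sum of the fitted model's log-PDF over the sample points
    log_likelihood = sum(model.computeLogPDF(sample).asPoint())
    print(log_likelihood)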
By definition of the likelihood, the higher $L_i$, the better the
model describes the sample. However, using the likelihood as a criterion
to rank the candidate probability distributions would involve a risk:
one would almost always favor complex models involving many parameters.
While such models indeed provide a large number of degrees of freedom that
can be used to fit the sample, one has to keep in mind that complex
models may be less robust than simpler models with fewer parameters.
In fact, the limited available information ($n$ data points) does
not allow one to robustly estimate too many parameters.
The Akaike Information Criterion (AIC) can be used to avoid this problem.
The principle is to rank $\mathcal{M}_1, \dots, \mathcal{M}_K$ according to the following quantity:

$$\mathrm{AIC}_i = -2 \log(L_i) + 2\, p_i$$

where $p_i$ denotes the number of parameters being adjusted for
the model $\mathcal{M}_i$. The smaller $\mathrm{AIC}_i$, the better
the model. Note that the idea is to introduce a penalization term that
increases with the number of parameters to be estimated. A complex
model will then have a good score only if the gain in terms of
likelihood is high enough to justify the number of parameters used.
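As a sketch of the ranking itself (again assuming the OpenTURNS Python bindings; the sample, the candidate models, and the helper name aic are placeholders for this example), one could compute the quantity above for each candidate and keep the smallest:

    import openturns as ot

    # Placeholder sample; in practice this is the observed data (x_1, ..., x_n)
    sample = ot.Normal(0.0, 1.0).getSample(80)
    # Candidate parametric models M_1, ..., M_K (arbitrary choices here)
    candidates = [ot.Normal(), ot.Logistic()]

    def aic(candidate, data):
        """AIC_i = -2 log(L_i) + 2 p_i for one candidate fitted by Maximum Likelihood."""
        fitted = ot.MaximumLikelihoodFactory(candidate).build(data)
        log_l = sum(fitted.computeLogPDF(data).asPoint())
        p = fitted.getParameter().getDimension()  # p_i: number of adjusted parameters
        return -2.0 * log_l + 2.0 * p

    # The best model is the one with the smallest AIC
    best = min(candidates, key=lambda m: aic(m, sample))
    print(best)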
In the context of small samples, there is a substantial risk that the AIC selects models with too many parameters; in other words, the risk of overfitting is high. To address this issue, the AICc criterion was developed: it consists in evaluating the AIC with a correction term (an extra penalty) for small samples. The formula is as follows:

$$\mathrm{AICc}_i = \mathrm{AIC}_i + \frac{2\, p_i\, (p_i + 1)}{n - p_i - 1}$$
One might notice that the extra penalty term vanishes for $n \to \infty$.
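To make this small-sample behaviour concrete, here is a tiny self-contained sketch (the helper name is invented for illustration) that evaluates the extra penalty for a model with $p_i = 2$ parameters at several sample sizes:

    def aicc_correction(p, n):
        """Extra AICc penalty 2 p (p + 1) / (n - p - 1), defined for n > p + 1."""
        return 2.0 * p * (p + 1.0) / (n - p - 1.0)

    # The correction is significant for small n and fades as n grows
    for n in (10, 30, 100, 1000):
        print(n, aicc_correction(2, n))  # 1.71..., 0.44..., 0.12..., 0.012...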
API:
Examples: