Compare frequentist and Bayesian estimation

In this example, we want to estimate the parameter \vect{\theta} of the distribution of a random vector \inputRV from which we have some data. We compare the frequentist and the Bayesian approaches to the estimation of \vect{\theta}.

Let \inputRV = (X_0, X_1) be a random vector following a bivariate normal distribution with zero mean, unit variance and independent components: \cN_2 \left(\vect{\theta}\right) where the parameter \vect{\theta} = (\mu_0, \sigma_0, \mu_1, \sigma_1, \rho) = (0, 1, 0, 1, 0). We assume that we have a sample generated from \vect{X}, denoted by (\inputReal_1, \dots, \inputReal_\sampleSize), where \sampleSize = 25.

We assume that the parameters (\mu_0, \mu_1, \rho) are known and we want to estimate the parameters (\sigma_0, \sigma_1). In the Bayesian approach, we assume that \vect{\Theta} = (0, \Sigma_0, 0, \Sigma_1, 0) is a random vector and we define a link function g : \Rset^2 \rightarrow \Rset^5 such that:

\vect{\Theta} = g(\vect{Y})

where \vect{Y} follows the prior distribution denoted by \pi_{\vect{Y}}^0.

Note that the link function g already contains the information that the two components of \inputRV are independent (as \rho = 0) and centered (as \mu_i = 0).

We consider two interesting link functions:

g_1(\vect{y}) & = (0, y_0, 0, y_1, 0) \\
g_2(\vect{y}) & = (0,0.5+y_0^2, 0, 0.5+y_1^2, 0)

each one being associated with its respective prior distribution:

\pi_{\vect{Y}}^{0,1} & = \cT(0,1,2) \times \cT(0,1,2) \\
\pi_{\vect{Y}}^{0,2} & = \cT(-1, 0, 1) \times \cT(-1, 0,1)

In the second case, the link function g_2 is not bijective on the range of \pi_{\vect{Y}}^{0,2}.
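To fix ideas, these link functions and priors could be defined as follows (a sketch with names of our own choosing; ot.JointDistribution is called ot.ComposedDistribution in older OpenTURNS versions):

import openturns as ot

# First link function: g1(y) = (0, y0, 0, y1, 0)
g1 = ot.SymbolicFunction(["y0", "y1"], ["0.0", "y0", "0.0", "y1", "0.0"])
# Second link function: g2(y) = (0, 0.5 + y0^2, 0, 0.5 + y1^2, 0)
g2 = ot.SymbolicFunction(
    ["y0", "y1"], ["0.0", "0.5 + y0^2", "0.0", "0.5 + y1^2", "0.0"]
)

# Associated priors: products of independent triangular distributions
prior1 = ot.JointDistribution([ot.Triangular(0.0, 1.0, 2.0)] * 2)
prior2 = ot.JointDistribution([ot.Triangular(-1.0, 0.0, 1.0)] * 2)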

The Bayesian approach uses the PosteriorDistribution class, which builds the posterior distribution of \vect{Y}, denoted by \pi_{\vect{Y}}^\sampleSize, from the likelihood of the conditioned model on the sample weighted by the prior distribution \pi_{\vect{Y}}^0. From the \pi_{\vect{Y}}^\sampleSize distribution, we extract the vector of modes, denoted by \vect{Y}_n^m: this point maximizes \pi_{\vect{Y}}^\sampleSize.
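More precisely, by Bayes' theorem, the posterior density is proportional to the prior density multiplied by the likelihood of the sample under the conditioned model:

\pi_{\vect{Y}}^\sampleSize(\vect{y}) \propto \pi_{\vect{Y}}^0(\vect{y}) \prod_{i=1}^{\sampleSize} f_{\vect{X}}\left(\inputReal_i; g(\vect{y})\right)

where f_{\vect{X}}(\cdot\,; \vect{\theta}) denotes the density of the conditioned model with parameter \vect{\theta}. The mode \vect{Y}_n^m is therefore the maximum a posteriori (MAP) estimate of \vect{Y}.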

It is interesting to note that when n \rightarrow +\infty, then \pi_{\vect{Y}}^\sampleSize \rightarrow \pi_{\vect{Y}}^* such that g(\pi_{\vect{Y}}^*) is a Dirac distribution centered at \vect{\theta}.

The frequentist approach estimates the parameter \vect{\theta} by maximizing the likelihood of the normal model on the sample. To model the same level of information, we consider a centered bivariate normal model with independent components. We denote by \vect{\theta}_n^{MC} the resulting estimate.
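Since the means and the correlation are fixed to zero, the maximum likelihood estimate of each standard deviation has a closed form: it is the root mean square of the observations of the corresponding component,

\hat{\sigma}_k = \sqrt{\frac{1}{\sampleSize} \sum_{i=1}^{\sampleSize} x_{i,k}^2}, \quad k \in \{0, 1\}

where x_{i,k} is the k-th component of \inputReal_i.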

import openturns as ot
import openturns.viewer as otv
import openturns.experimental as otexp

# Configure the discretization used by the DeconditionedDistribution:
# number of integration nodes per marginal and a Gauss product rule
ot.ResourceMap.SetAsUnsignedInteger(
    "DeconditionedDistribution-MarginalIntegrationNodesNumber", 32
)
ot.ResourceMap.SetAsString(
    "DeconditionedDistribution-ContinuousDiscretizationMethod", "GaussProduct"
)

Here, we define a function that computes the mode of a distribution, that is, the point that maximizes its PDF.

def computeMode(distribution):
    # Objective function: the negative log-PDF, evaluated on a whole sample at once
    def obj_py(X):
        return distribution.computeLogPDF(X) * (-1.0)

    obj = ot.PythonFunction(distribution.getDimension(), 1, func_sample=obj_py)
    # Minimize the negative log-PDF over the numerical range of the distribution,
    # starting from its mean
    pb = ot.OptimizationProblem(obj)
    pb.setBounds(distribution.getRange())
    algo = ot.Cobyla(pb)
    algo.setStartingPoint(distribution.getMean())
    algo.run()
    return algo.getResult().getOptimalPoint()
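For instance, the mode of a bivariate distribution with independent Triangular(-1, 0.5, 1) marginals is (0.5, 0.5), which differs from its mean (about (0.17, 0.17)); the helper should return a point close to it:

print(computeMode(ot.JointDistribution([ot.Triangular(-1.0, 0.5, 1.0)] * 2)))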

Define the theoretical distribution of \vect{X}, which is a bivariate normal distribution with zero mean, unit variance and independent components.

conditioned = ot.Normal(2)
conditioned.setDescription(["X0", "X1"])

We create the sample that we are going to use to estimate the parameters.

Nsample = 25
observations = conditioned.getSample(Nsample)
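As an illustration of the closed-form expression given earlier, the frequentist estimates of (\sigma_0, \sigma_1) can be computed directly from this sample (the variable name sigma_hat is our own):

import math

# Root mean square of each component: maximum likelihood estimate of sigma_k
# when the means and the correlation are fixed to zero
sigma_hat = [
    math.sqrt(sum(x[k] ** 2 for x in observations) / Nsample) for k in range(2)
]
print(sigma_hat)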