Compare the distribution of quantile estimatorsΒΆ

In this example, we consider the quantile of level \alpha of a distribution X. The objective is to estimate some confidence intervals based on order statistics that bound the quantile x_{\alpha} with a given confidence \beta \in [0,1].

We define three quantile estimators which are all defined as particular order statistics:

  • the empirical estimator of x_{\alpha},

  • the estimator of an upper bound of x_{\alpha} with confidence \beta,

  • the estimator of an lower bound of x_{\alpha} with confidence \beta.

We draw the distribution of each estimator.

In this example, we consider the quantile of level \alpha = 95\%, with a confidence level \beta = 90\%.

See Exact quantile confidence interval based on order statistics and Asymptotic quantile confidence interval based on order statistics to get theoretical details.

import openturns as ot
import openturns.experimental as otexp
import openturns.viewer as otv

We consider a scalar random variable X. Its distribution has no importance but is assumed to have a density.

X_dist = ot.Normal(0, 1)

We define the level \alpha of the quantile and the confidence level \beta.

alpha = 0.95
beta = 0.90

The exact value of x_{\alpha} is known. We compute it in order to compare the estimators to the exact value.

x_alpha_exact = X_dist.computeQuantile(alpha)[0]
print('Exact quantile = ', x_alpha_exact)
Exact quantile =  1.6448536269514726

We generate a sample of the variable.

n = 501
X_sample = X_dist.getSample(n)

Now, we assume to know X only through the generated sample.

The empirical estimator of x_{\alpha} is defined by the k_{emp} -th order statistics, denoted by X_{(k_{emp})}, where k_{emp} = (\lceil \sampleSize \alpha \rceil) -1. The numerotation \lceil x \rceil designs the smallest integer value that is greater than or equal to x (we subtract 1 because python numbering starts at 0). The value of this order statistics is computed on the sample.

k_emp = int(n * alpha)
empiricalQuantile = X_sample.computeQuantile(alpha)
print('Empirical quantile = ', empiricalQuantile)
print('Empirical order statistics = ', k_emp)
Empirical quantile =  [1.68279]
Empirical order statistics =  475

Now, we want to get the exact unilateral confidence interval that provides an upper bound of x_{\alpha} with the confidence \beta, which means that \Prob{X_{(k_{up})} \geq x_{\alpha}} \geq \beta. This can be done using QuantileConfidence. We can get the order k_{up} of the useful statistics. Care that the enumeration begins at 0.

quantConf = otexp.QuantileConfidence(alpha, beta)
k_up = quantConf.computeUnilateralRank(n)

We can directly get the order statistics X_{(k_{up})} evaluated on the sample.

upper_bound = quantConf.computeUnilateralConfidenceInterval(X_sample)
print('Upper bound of the quantile = ', upper_bound)
print('Upper bound order statistics = ', k_up)
Upper bound of the quantile =  ]-inf (-1.79769e+308), 1.73945]
Upper bound order statistics =  482

To get the exact unilateral confidence interval that provides a lower bound of x_{\alpha} with the confidence \beta, which means that \Prob{X_{(k_{low})} \leq x_{\alpha}} \geq \beta. This can also be done using QuantileConfidence. We can get the order k_{low} of the useful statistics. Care: the enumeration begins at 0 here!

k_low = quantConf.computeUnilateralRank(n, True)

We can directly get the order statistics X_{(k_{low})} evaluated on the sample.

lower_bound = quantConf.computeUnilateralConfidenceInterval(X_sample, True)
print('Lower bound of the quantile = ', lower_bound)
print('Lower bound order statistics = ', k_low)
Lower bound of the quantile =  [1.49156, (1.79769e+308) +inf[
Lower bound order statistics =  469

In order to draw the distribution of any order statistics of X, we first create the distribution of the \sampleSize order statistics of X, denoted by \vect{X}_{(0:\sampleSize-1)} = (X_{(0)},\dots,X_{(\sampleSize-1)}). This distribution as the following cumulative distribution function:

F_{\vect{X}_{(0:\sampleSize-1)}}(\vect{x}) = F_{\vect{U}_{(0:\sampleSize-1)}}(F(x_1), \dots, F(x_{\sampleSize}-1))

and its PDF (if defined) by:

f_{\vect{X}_{(0:\sampleSize-1)}}(\vect{x}) = \sampleSize!\prod_{i=0}^{\sampleSize-1} f(x_i) \,\mathbf{1}_{\cS}(\vect{x})

where F and f are respectively the cumulative distribution function and probability density function of X and \cS \subset \Rset^\sampleSize is defined by:

\cS=\left\{(x_0, \dots, x_{\sampleSize-1}) \in [0,1]^\sampleSize\,|\,0 \leq x_0 \leq \dots \leq x_{\sampleSize-1} \leq 1 \right\}.

Thus \vect{X}_{(0:\sampleSize-1)} is defined as the joint distribution which marginals are all distributed as X and which core is the order statistics distribution of the Uniform(0,1) distribution. The order statistics distribution of the Uniform(0,1) is created using the class UniformOrderStatistics which only requires to define the dimension of the order statistics distribution.

unif_orderStat = ot.UniformOrderStatistics(n)
normal_orderStat = ot.JointDistribution([X_dist] * n, unif_orderStat)

Thus, the order statistics X_{(k_{emp})}, X_{(k_{low})} and X_{(k_{up})} are the marginals k_{emp}, k_{low} and k_{up} of \vect{X}_{(0:\sampleSize-1)}.

X_emp = normal_orderStat.getMarginal(k_emp)
X_up = normal_orderStat.getMarginal(k_up)
X_low = normal_orderStat.getMarginal(k_low)

Now we can draw the pdf.

xMin = x_alpha_exact - 0.5
xMax = x_alpha_exact + 1
nPoints = 1001
g = X_emp.drawPDF(xMin, xMax, nPoints)
g.add(X_up.drawPDF(xMin, xMax, nPoints))
g.add(X_low.drawPDF(xMin, xMax, nPoints))

We add the exact quantile.

line_exactQuant = ot.Curve([[x_alpha_exact, 0.0], [x_alpha_exact, X_emp.computePDF([x_alpha_exact])]])
line_exactQuant.setLineStyle('dashed')
line_exactQuant.setLineWidth(2)
line_exactQuant.setColor('red')
g.add(line_exactQuant)

Before viewing the graph, we want to fill the area that corresponds to the events:

\left \{ X_{(k_{up})} \leq x_{\alpha} \right \} \\
\left \{ X_{(k_{low})} \geq x_{\alpha} \right \}

Each event has the probability 1- \beta.

To do that, we define a function that returns a sample created from a regular grid from xmin to xmax with npoints points.

def linearSample(xmin, xmax, npoints):
    step = (xmax - xmin) / (npoints - 1)
    rg = ot.RegularGrid(xmin, step, npoints)
    vertices = rg.getVertices()
    return vertices

Then we use it to define some Polygon that will be filled. These polygons define the area under the pdf line. Thus, the lower vertical bound is 0 and the upper vertical bound is the PDF line.

First the area for the upper bound.

nplot = 100
x_list = linearSample(xMin, x_alpha_exact, nplot)
y_list = X_up.computePDF(x_list)
vLow = [0.0] * nplot
vUp = [y_list[i, 0] for i in range(nplot)]
boundsPoly_upper = ot.Polygon.FillBetween(x_list.asPoint(), vLow, vUp)
boundsPoly_upper.setColor(g.getColors()[1])
g.add(boundsPoly_upper)

Then the area for the lower bound.

nplot = 100
x_list = linearSample(x_alpha_exact, xMax, nplot)
y_list = X_low.computePDF(x_list)
vLow = [0.0] * nplot
vUp = [y_list[i, 0] for i in range(nplot)]
boundsPoly_lower = ot.Polygon.FillBetween(x_list.asPoint(), vLow, vUp)
boundsPoly_lower.setColor(g.getColors()[2])
g.add(boundsPoly_lower)
g.setLegends([r'$X_{emp}$', r'$X_{up} (\beta = $' + str(beta) + ')', r'$X_{low} (\beta = $' + str(beta) + ')', 'exact quantile'])
g.setLegendPosition('topright')
g.setTitle('Quantile estimation and confidence intervals')
g.setXTitle('x')
view = otv.View(g)
Quantile estimation and confidence intervals

Display all the figures.

otv.View.ShowAll()

Now we can conclude that:

  • the empirical estimator is centered on the true quantile,

  • the X_{(k_{up})} estimator has a probability \beta to be above the true quantile,

  • the X_{(k_{low})} estimator has a probability \beta to be under the true quantile.