Note
Go to the end to download the full example code.
Compare the distribution of quantile estimatorsΒΆ
In this example, we consider the quantile of level of a distribution
. The objective is to estimate some confidence intervals based on order statistics
that bound the quantile
with a given confidence
.
We define three quantile estimators which are all defined as particular order statistics:
the empirical estimator of
,
the estimator of an upper bound of
with confidence
,
the estimator of an lower bound of
with confidence
.
We draw the distribution of each estimator.
In this example, we consider the quantile of level ,
with a confidence level
.
See Exact quantile confidence interval based on order statistics and Asymptotic quantile confidence interval based on order statistics to get theoretical details.
import openturns as ot
import openturns.experimental as otexp
import openturns.viewer as otv
We consider a scalar random variable . Its distribution has no importance but is assumed to have
a density.
X_dist = ot.Normal(0, 1)
We define the level of the quantile and the confidence level
.
alpha = 0.95
beta = 0.90
The exact value of is known. We compute it in order to compare the
estimators to the exact value.
x_alpha_exact = X_dist.computeQuantile(alpha)[0]
print('Exact quantile = ', x_alpha_exact)
Exact quantile = 1.6448536269514726
We generate a sample of the variable.
n = 501
X_sample = X_dist.getSample(n)
Now, we assume to know only through the generated sample.
The empirical estimator of is defined by the
-th order statistics, denoted
by
,
where
. The numerotation
designs the smallest integer value that is greater
than or equal to
(we subtract 1 because python numbering starts at 0).
The value of this order statistics is computed on the sample.
k_emp = int(n * alpha)
empiricalQuantile = X_sample.computeQuantile(alpha)
print('Empirical quantile = ', empiricalQuantile)
print('Empirical order statistics = ', k_emp)
Empirical quantile = [1.68279]
Empirical order statistics = 475
Now, we want to get the exact unilateral confidence interval that provides an upper bound of with the confidence
, which means that
.
This can be done using
QuantileConfidence.
We can get the order of the useful statistics. Care that the enumeration begins at 0.
quantConf = otexp.QuantileConfidence(alpha, beta)
k_up = quantConf.computeUnilateralRank(n)
We can directly get the order statistics evaluated on the sample.
upper_bound = quantConf.computeUnilateralConfidenceInterval(X_sample)
print('Upper bound of the quantile = ', upper_bound)
print('Upper bound order statistics = ', k_up)
Upper bound of the quantile = ]-inf (-1.79769e+308), 1.73945]
Upper bound order statistics = 482
To get the exact unilateral confidence interval that provides a lower bound of with the confidence
, which means that
.
This can also be done using
QuantileConfidence.
We can get the order of the useful statistics. Care: the enumeration begins at 0 here!
k_low = quantConf.computeUnilateralRank(n, True)
We can directly get the order statistics evaluated on the sample.
lower_bound = quantConf.computeUnilateralConfidenceInterval(X_sample, True)
print('Lower bound of the quantile = ', lower_bound)
print('Lower bound order statistics = ', k_low)
Lower bound of the quantile = [1.49156, (1.79769e+308) +inf[
Lower bound order statistics = 469
In order to draw the distribution of any order statistics of , we first create the distribution
of the
order statistics of
, denoted by
.
This distribution as the following cumulative distribution function:
and its PDF (if defined) by:
where and
are respectively the cumulative distribution function and
probability density function of
and
is defined by:
Thus is defined as the joint distribution which marginals are all distributed
as
and which core is the order statistics distribution of the Uniform(0,1) distribution.
The order statistics distribution of the Uniform(0,1) is created using the class
UniformOrderStatistics which only requires to define the dimension of the
order statistics distribution.
unif_orderStat = ot.UniformOrderStatistics(n)
normal_orderStat = ot.JointDistribution([X_dist] * n, unif_orderStat)
Thus, the order statistics ,
and
are the marginals
,
and
of
.
X_emp = normal_orderStat.getMarginal(k_emp)
X_up = normal_orderStat.getMarginal(k_up)
X_low = normal_orderStat.getMarginal(k_low)
Now we can draw the pdf.
xMin = x_alpha_exact - 0.5
xMax = x_alpha_exact + 1
nPoints = 1001
g = X_emp.drawPDF(xMin, xMax, nPoints)
g.add(X_up.drawPDF(xMin, xMax, nPoints))
g.add(X_low.drawPDF(xMin, xMax, nPoints))
We add the exact quantile.
line_exactQuant = ot.Curve([[x_alpha_exact, 0.0], [x_alpha_exact, X_emp.computePDF([x_alpha_exact])]])
line_exactQuant.setLineStyle('dashed')
line_exactQuant.setLineWidth(2)
line_exactQuant.setColor('red')
g.add(line_exactQuant)
Before viewing the graph, we want to fill the area that corresponds to the events:
Each event has the probability .
To do that, we define a function that returns a sample created from a regular grid from xmin to xmax with npoints points.
def linearSample(xmin, xmax, npoints):
step = (xmax - xmin) / (npoints - 1)
rg = ot.RegularGrid(xmin, step, npoints)
vertices = rg.getVertices()
return vertices
Then we use it to define some Polygon that will be filled. These polygons
define the area under the pdf line. Thus, the lower vertical bound is 0 and the upper vertical bound is the PDF line.
First the area for the upper bound.
nplot = 100
x_list = linearSample(xMin, x_alpha_exact, nplot)
y_list = X_up.computePDF(x_list)
vLow = [0.0] * nplot
vUp = [y_list[i, 0] for i in range(nplot)]
boundsPoly_upper = ot.Polygon.FillBetween(x_list.asPoint(), vLow, vUp)
boundsPoly_upper.setColor(g.getColors()[1])
g.add(boundsPoly_upper)
Then the area for the lower bound.
nplot = 100
x_list = linearSample(x_alpha_exact, xMax, nplot)
y_list = X_low.computePDF(x_list)
vLow = [0.0] * nplot
vUp = [y_list[i, 0] for i in range(nplot)]
boundsPoly_lower = ot.Polygon.FillBetween(x_list.asPoint(), vLow, vUp)
boundsPoly_lower.setColor(g.getColors()[2])
g.add(boundsPoly_lower)
g.setLegends([r'$X_{emp}$', r'$X_{up} (\beta = $' + str(beta) + ')', r'$X_{low} (\beta = $' + str(beta) + ')', 'exact quantile'])
g.setLegendPosition('topright')
g.setTitle('Quantile estimation and confidence intervals')
g.setXTitle('x')
view = otv.View(g)
Display all the figures.
otv.View.ShowAll()
Now we can conclude that:
the empirical estimator is centered on the true quantile,
the
estimator has a probability
to be above the true quantile,
the
estimator has a probability
to be under the true quantile.
OpenTURNS