.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_data_analysis/sample_analysis/plot_quantile_confidence_exact_empirical.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_data_analysis_sample_analysis_plot_quantile_confidence_exact_empirical.py:


Compare the distribution of quantile estimators
===============================================

.. GENERATED FROM PYTHON SOURCE LINES 7-24

In this example, we consider the quantile of level :math:`\alpha` of a distribution :math:`X`.
The objective is to estimate confidence intervals based on order statistics that bound the
quantile :math:`x_{\alpha}` with a given confidence :math:`\beta \in [0,1]`.

We define three quantile estimators, each of which is a particular order statistic:

- the empirical estimator of :math:`x_{\alpha}`,
- the estimator of an upper bound of :math:`x_{\alpha}` with confidence :math:`\beta`,
- the estimator of a lower bound of :math:`x_{\alpha}` with confidence :math:`\beta`.

We draw the distribution of each estimator.

In this example, we consider the quantile of level :math:`\alpha = 95\%`,
with a confidence level :math:`\beta = 90\%`.

See :ref:`quantile_confidence_estimation` and :ref:`quantile_asymptotic_confidence_estimation`
to get theoretical details.

.. GENERATED FROM PYTHON SOURCE LINES 24-29

.. code-block:: Python

    import openturns as ot
    import openturns.experimental as otexp
    import openturns.viewer as otv


.. GENERATED FROM PYTHON SOURCE LINES 30-32

We consider a scalar random variable :math:`X`. Its distribution does not matter,
but it is assumed to have a density.

.. GENERATED FROM PYTHON SOURCE LINES 32-34

.. code-block:: Python

    X_dist = ot.Normal(0, 1)


.. GENERATED FROM PYTHON SOURCE LINES 35-36

We define the level :math:`\alpha` of the quantile and the confidence level :math:`\beta`.

.. GENERATED FROM PYTHON SOURCE LINES 36-39

.. code-block:: Python

    alpha = 0.95
    beta = 0.90


.. GENERATED FROM PYTHON SOURCE LINES 40-42

The exact value of :math:`x_{\alpha}` is known. We compute it in order to compare
the estimators to the exact value.

.. GENERATED FROM PYTHON SOURCE LINES 42-45

.. code-block:: Python

    x_alpha_exact = X_dist.computeQuantile(alpha)[0]
    print('Exact quantile = ', x_alpha_exact)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Exact quantile = 1.6448536269514726


.. GENERATED FROM PYTHON SOURCE LINES 46-47

We generate a sample of the variable.

.. GENERATED FROM PYTHON SOURCE LINES 47-50

.. code-block:: Python

    n = 501
    X_sample = X_dist.getSample(n)


.. GENERATED FROM PYTHON SOURCE LINES 51-60

From now on, we assume that :math:`X` is known only through the generated sample.

The empirical estimator of :math:`x_{\alpha}` is the :math:`k_{emp}`-th order statistic,
denoted by :math:`X_{(k_{emp})}`, where :math:`k_{emp} = \lceil \sampleSize \alpha \rceil - 1`.
The notation :math:`\lceil x \rceil` denotes the smallest integer greater than or equal to
:math:`x` (we subtract 1 because Python indices start at 0).
The value of this order statistic is computed on the sample.

.. GENERATED FROM PYTHON SOURCE LINES 60-65

.. code-block:: Python

    k_emp = int(n * alpha)
    empiricalQuantile = X_sample.computeQuantile(alpha)
    print('Empirical quantile = ', empiricalQuantile)
    print('Empirical order statistics = ', k_emp)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Empirical quantile = [1.68279]
    Empirical order statistics = 475
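
The rank used in the code, ``int(n * alpha)``, coincides with
:math:`\lceil \sampleSize \alpha \rceil - 1` as long as :math:`\sampleSize \alpha` is not an
integer. The short sketch below is not part of the original script: it transcribes the definition
literally and reads the corresponding order statistic from the sorted sample (the names
``k_emp_literal`` and ``sorted_sample`` are purely illustrative), which should be close to the
empirical quantile printed above.

.. code-block:: Python

    import math

    # Literal transcription of the definition k_emp = ceil(n * alpha) - 1;
    # it equals int(n * alpha) whenever n * alpha is not an integer.
    k_emp_literal = math.ceil(n * alpha) - 1
    print('k_emp from the definition = ', k_emp_literal)

    # Read the corresponding order statistic directly from the sorted sample;
    # it should match, or be very close to, the empirical quantile above.
    sorted_sample = X_sample.sort()
    print('Order statistic of rank k_emp = ', sorted_sample[k_emp_literal, 0])
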
.. GENERATED FROM PYTHON SOURCE LINES 66-70

Now, we want the exact unilateral confidence interval that provides an upper bound of
:math:`x_{\alpha}` with confidence :math:`\beta`, which means that
:math:`\Prob{X_{(k_{up})} \geq x_{\alpha}} \geq \beta`.
This can be done using :class:`~openturns.experimental.QuantileConfidence`.
We can get the rank :math:`k_{up}` of the corresponding order statistic.
Note that the enumeration begins at 0.

.. GENERATED FROM PYTHON SOURCE LINES 70-73

.. code-block:: Python

    quantConf = otexp.QuantileConfidence(alpha, beta)
    k_up = quantConf.computeUnilateralRank(n)


.. GENERATED FROM PYTHON SOURCE LINES 74-75

We can directly get the order statistic :math:`X_{(k_{up})}` evaluated on the sample.

.. GENERATED FROM PYTHON SOURCE LINES 75-79

.. code-block:: Python

    upper_bound = quantConf.computeUnilateralConfidenceInterval(X_sample)
    print('Upper bound of the quantile = ', upper_bound)
    print('Upper bound order statistics = ', k_up)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Upper bound of the quantile = ]-inf (-1.79769e+308), 1.73945]
    Upper bound order statistics = 482


.. GENERATED FROM PYTHON SOURCE LINES 80-84

Similarly, we want the exact unilateral confidence interval that provides a lower bound of
:math:`x_{\alpha}` with confidence :math:`\beta`, which means that
:math:`\Prob{X_{(k_{low})} \leq x_{\alpha}} \geq \beta`.
This can also be done using :class:`~openturns.experimental.QuantileConfidence`.
We can get the rank :math:`k_{low}` of the corresponding order statistic.
Again, the enumeration begins at 0.

.. GENERATED FROM PYTHON SOURCE LINES 84-86

.. code-block:: Python

    k_low = quantConf.computeUnilateralRank(n, True)


.. GENERATED FROM PYTHON SOURCE LINES 87-88

We can directly get the order statistic :math:`X_{(k_{low})}` evaluated on the sample.

.. GENERATED FROM PYTHON SOURCE LINES 88-92

.. code-block:: Python

    lower_bound = quantConf.computeUnilateralConfidenceInterval(X_sample, True)
    print('Lower bound of the quantile = ', lower_bound)
    print('Lower bound order statistics = ', k_low)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Lower bound of the quantile = [1.49156, (1.79769e+308) +inf[
    Lower bound order statistics = 469
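
As a quick cross-check, not part of the original script, these ranks can be related to the standard
binomial characterization of order statistics: for a continuous distribution, the number of sample
values below :math:`x_{\alpha}` follows a Binomial(:math:`\sampleSize`, :math:`\alpha`)
distribution, so :math:`\Prob{X_{(k)} \geq x_{\alpha}}` is the binomial CDF at :math:`k` and
:math:`\Prob{X_{(k)} \leq x_{\alpha}}` is its complement. The sketch below only verifies that
both probabilities exceed :math:`\beta` for the ranks computed above.

.. code-block:: Python

    # The number of sample values below x_alpha follows a Binomial(n, alpha)
    # distribution, which characterizes the coverage of each order statistic.
    binomial = ot.Binomial(n, alpha)
    # P(X_(k_up) >= x_alpha) = P(Binomial(n, alpha) <= k_up), expected >= beta
    print('P(X_(k_up) >= x_alpha) = ', binomial.computeCDF(k_up))
    # P(X_(k_low) <= x_alpha) = 1 - P(Binomial(n, alpha) <= k_low), expected >= beta
    print('P(X_(k_low) <= x_alpha) = ', 1.0 - binomial.computeCDF(k_low))
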
.. GENERATED FROM PYTHON SOURCE LINES 93-120

In order to draw the distribution of any order statistic of :math:`X`, we first create the
distribution of the :math:`\sampleSize` order statistics of :math:`X`, denoted by
:math:`\vect{X}_{(0:\sampleSize-1)} = (X_{(0)},\dots,X_{(\sampleSize-1)})`.
This distribution has the following cumulative distribution function:

.. math::

    F_{\vect{X}_{(0:\sampleSize-1)}}(\vect{x}) = F_{\vect{U}_{(0:\sampleSize-1)}}(F(x_0), \dots, F(x_{\sampleSize-1}))

and its PDF (when defined) is:

.. math::

    f_{\vect{X}_{(0:\sampleSize-1)}}(\vect{x}) = \sampleSize!\prod_{i=0}^{\sampleSize-1} f(x_i) \,\mathbf{1}_{\cS}(\vect{x})

where :math:`F` and :math:`f` are respectively the cumulative distribution function and the
probability density function of :math:`X`, and :math:`\cS \subset \Rset^\sampleSize` is
defined by:

.. math::

    \cS=\left\{(x_0, \dots, x_{\sampleSize-1}) \in \Rset^\sampleSize \,|\, x_0 \leq \dots \leq x_{\sampleSize-1} \right\}.

Thus, :math:`\vect{X}_{(0:\sampleSize-1)}` is the joint distribution whose marginals are all
distributed as :math:`X` and whose core is the order statistics distribution of the
Uniform(0,1) distribution.

The order statistics distribution of the Uniform(0,1) distribution is created with the class
:class:`~openturns.UniformOrderStatistics`, which only requires the dimension of the order
statistics distribution.

.. GENERATED FROM PYTHON SOURCE LINES 120-123

.. code-block:: Python

    unif_orderStat = ot.UniformOrderStatistics(n)
    normal_orderStat = ot.JointDistribution([X_dist] * n, unif_orderStat)


.. GENERATED FROM PYTHON SOURCE LINES 124-126

Thus, the order statistics :math:`X_{(k_{emp})}`, :math:`X_{(k_{low})}` and :math:`X_{(k_{up})}`
are the marginals of indices :math:`k_{emp}`, :math:`k_{low}` and :math:`k_{up}` of
:math:`\vect{X}_{(0:\sampleSize-1)}`.

.. GENERATED FROM PYTHON SOURCE LINES 126-130

.. code-block:: Python

    X_emp = normal_orderStat.getMarginal(k_emp)
    X_up = normal_orderStat.getMarginal(k_up)
    X_low = normal_orderStat.getMarginal(k_low)


.. GENERATED FROM PYTHON SOURCE LINES 131-132

Now we can draw the PDF of each estimator.

.. GENERATED FROM PYTHON SOURCE LINES 132-139

.. code-block:: Python

    xMin = x_alpha_exact - 0.5
    xMax = x_alpha_exact + 1
    nPoints = 1001
    g = X_emp.drawPDF(xMin, xMax, nPoints)
    g.add(X_up.drawPDF(xMin, xMax, nPoints))
    g.add(X_low.drawPDF(xMin, xMax, nPoints))


.. GENERATED FROM PYTHON SOURCE LINES 140-141

We add the exact quantile.

.. GENERATED FROM PYTHON SOURCE LINES 141-147

.. code-block:: Python

    line_exactQuant = ot.Curve([[x_alpha_exact, 0.0], [x_alpha_exact, X_emp.computePDF([x_alpha_exact])]])
    line_exactQuant.setLineStyle('dashed')
    line_exactQuant.setLineWidth(2)
    line_exactQuant.setColor('red')
    g.add(line_exactQuant)


.. GENERATED FROM PYTHON SOURCE LINES 148-158

Before viewing the graph, we want to fill the areas that correspond to the events:

.. math::

    \left \{ X_{(k_{up})} \leq x_{\alpha} \right \} \\
    \left \{ X_{(k_{low})} \geq x_{\alpha} \right \}

Each event has probability :math:`1 - \beta`. To do that, we define a function that returns a
sample built from a regular grid of *npoints* points ranging from *xmin* to *xmax*.

.. GENERATED FROM PYTHON SOURCE LINES 158-167

.. code-block:: Python

    def linearSample(xmin, xmax, npoints):
        step = (xmax - xmin) / (npoints - 1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices


.. GENERATED FROM PYTHON SOURCE LINES 168-172

Then we use it to define some :class:`~openturns.Polygon` objects that will be filled.
These polygons delimit the area under the PDF curve: the lower vertical bound is 0 and the
upper vertical bound is the PDF curve. First, the area for the upper bound.

.. GENERATED FROM PYTHON SOURCE LINES 172-182

.. code-block:: Python

    nplot = 100
    x_list = linearSample(xMin, x_alpha_exact, nplot)
    y_list = X_up.computePDF(x_list)
    vLow = [0.0] * nplot
    vUp = [y_list[i, 0] for i in range(nplot)]
    boundsPoly_upper = ot.Polygon.FillBetween(x_list.asPoint(), vLow, vUp)
    boundsPoly_upper.setColor(g.getColors()[1])
    g.add(boundsPoly_upper)


.. GENERATED FROM PYTHON SOURCE LINES 183-184

Then the area for the lower bound.

.. GENERATED FROM PYTHON SOURCE LINES 184-193

.. code-block:: Python

    nplot = 100
    x_list = linearSample(x_alpha_exact, xMax, nplot)
    y_list = X_low.computePDF(x_list)
    vLow = [0.0] * nplot
    vUp = [y_list[i, 0] for i in range(nplot)]
    boundsPoly_lower = ot.Polygon.FillBetween(x_list.asPoint(), vLow, vUp)
    boundsPoly_lower.setColor(g.getColors()[2])
    g.add(boundsPoly_lower)


.. GENERATED FROM PYTHON SOURCE LINES 194-197

.. code-block:: Python

    g.setLegends([r'$X_{emp}$', r'$X_{up} (\beta = $' + str(beta) + ')', r'$X_{low} (\beta = $' + str(beta) + ')', 'exact quantile'])
    g.setLegendPosition('topright')
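
Before displaying the figure, we can also check numerically the property that the shaded areas
illustrate. This check is not part of the original script and simply evaluates the CDF of the
extracted marginals through ``computeCDF``: the coverage probabilities of the two bounds should
both be at least :math:`\beta`.

.. code-block:: Python

    # Coverage probabilities computed from the marginal distributions of the
    # order statistics; both values should be at least beta = 0.90.
    print('P(X_(k_up) >= x_alpha) = ', 1.0 - X_up.computeCDF(x_alpha_exact))
    print('P(X_(k_low) <= x_alpha) = ', X_low.computeCDF(x_alpha_exact))
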
.. GENERATED FROM PYTHON SOURCE LINES 198-202

.. code-block:: Python

    g.setTitle('Quantile estimation and confidence intervals')
    g.setXTitle('x')
    view = otv.View(g)


.. image-sg:: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_quantile_confidence_exact_empirical_001.svg
   :alt: Quantile estimation and confidence intervals
   :srcset: /auto_data_analysis/sample_analysis/images/sphx_glr_plot_quantile_confidence_exact_empirical_001.svg
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 203-204

Display all the figures.

.. GENERATED FROM PYTHON SOURCE LINES 204-207

.. code-block:: Python

    otv.View.ShowAll()


.. GENERATED FROM PYTHON SOURCE LINES 208-214

Now we can conclude that:

* the empirical estimator is centered on the true quantile,
* the :math:`X_{(k_{up})}` estimator has a probability of at least :math:`\beta` of being above the true quantile,
* the :math:`X_{(k_{low})}` estimator has a probability of at least :math:`\beta` of being below the true quantile.


.. _sphx_glr_download_auto_data_analysis_sample_analysis_plot_quantile_confidence_exact_empirical.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_quantile_confidence_exact_empirical.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_quantile_confidence_exact_empirical.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_quantile_confidence_exact_empirical.zip `