.. _quantile_confidence_estimation:

Estimation of a quantile bound
------------------------------

We consider a random variable :math:`X` of dimension 1 and the unknown quantile :math:`x_{\alpha}`
of level :math:`\alpha` of its distribution (:math:`\alpha \in [0, 1]`).
We seek to evaluate an upper bound of :math:`x_{\alpha}` with a confidence greater or equal to
:math:`\beta`, using order statistics.

Let :math:`(X_1, \dots, X_\sampleSize)` be some independent copies of :math:`X`.
Let :math:`X_{(k)}` be the :math:`k` -th order statistics of :math:`(X_1, \dots, X_\sampleSize)` which means that
:math:`X_{(k)}` is the :math:`k` -th minimum of :math:`(X_1, \dots, X_\sampleSize)` for :math:`1 \leq k \leq \sampleSize`. For
example, :math:`X_{(1)} = \min (X_1, \dots, X_\sampleSize)` is the minimum
and :math:`X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize)` is the maximum. We have:

.. math::

    X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}

The probability density and cumulative distribution functions of the order
statistics :math:`X_{(k)}` are:

.. math::
    :label: DistOrderStat

    F_{X_{(k)}}(x) & = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i}\left(F(x)
    \right)^i \left(1-F(x)
    \right)^{\sampleSize-i} \\
    p_{X_{(k)}}(x) & = (\sampleSize-k+1)\binom{\sampleSize}{k-1}\left(F(x)\right)^{k-1}
    \left(1-F(x)
    \right)^{\sampleSize-k} p(x)

We notice that :math:`F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1)` where
:math:`F_{(\sampleSize,F(x))}` is the cumulated
distribution function of the Binomial distribution :math:`\cB(\sampleSize,F(x))` and
:math:`\overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k)` is the
complementary cumulated distribution fonction (also named survival function in dimension
1).
Therefore:

.. math::

    F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i}
    = \overline{F}_{(\sampleSize,\alpha)}(k-1)

Rank for an upper bound of the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let :math:`(x_1, \dots, x_\sampleSize)` be an i.i.d. sample of size :math:`\sampleSize` of
the random variable :math:`X`.
Given a quantile level :math:`\alpha \in [0,1]`, a confidence level
:math:`\beta \in [0,1]`, and a sample size :math:`\sampleSize`, we seek the smallest
rank :math:`k \in \llbracket 1, \sampleSize \rrbracket` such that:

.. math::
    :label: EqOrderStatB

    \Prob{x_{\alpha} \leq X_{(k)}} \geq \beta \qquad

As equation :eq:`EqOrderStatB` implies:

.. math::
    :label: EqOrderStat2B

    1-F_{X_{(k)}}(x_{\alpha})\geq \beta

This implies:

.. math::

    F_{\sampleSize, \alpha}(k-1)\geq \beta

The smallest rank :math:`k_{sol}` such that the previous equation is satisfied is:

.. math::

    k_{sol} & = \min \{ k \in \llbracket 1, n \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\geq \beta \}\\
            & = 1 +  \min \{ k \in \llbracket 1, n\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\geq \beta \}

An upper bound of :math:`x_{\alpha}` is estimated by the value of :math:`X_{(k_{sol})}`
on the sample :math:`(x_1, \dots, x_\sampleSize)`.

Here is a recap of the existence of solutions for this case:

+------------------------+------------------+-------------------------------------+---------------------------------+
| :math:`K_{sol}`        | :math:`\beta=0`  | :math:`0 < \beta < 1`               | :math:`\beta=1`                 |
+========================+==================+=====================================+=================================+
| :math:`\alpha=0`       | 1                | 1                                   | 1                               |
+------------------------+------------------+-------------------------------------+---------------------------------+
| :math:`0 < \alpha < 1` | 1                | see :eq:`EqOrderStatBgen`           | :math:`\emptyset`               |
+------------------------+------------------+-------------------------------------+---------------------------------+
| :math:`\alpha=1`       | 1                | :math:`\emptyset`                   | :math:`\emptyset`               |
+------------------------+------------------+-------------------------------------+---------------------------------+

With:

.. math::
    :label: EqOrderStatBgen

    1+F_{n,\alpha}^{-1}(\beta) \text{if} 1-\alpha^n \geq \beta \text{else} \emptyset

Rank for a lower bound of the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly for the lower bound we seek the greatest rank :math:`k \in \llbracket 1, \sampleSize \rrbracket` such that:

.. math::
    :label: EqOrderStatA

    \Prob{X_{(k)} \leq x_{\alpha}} \geq \beta \qquad

Here is a recap of the existence of solutions for this case:

+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
| :math:`K_{sol}`        | :math:`\beta=0`                      | :math:`0 < \beta < 1`                   | :math:`\beta=1`                 |
+========================+======================================+=========================================+=================================+
| :math:`\alpha=0`       | n                                    | :math:`\emptyset`                       | :math:`\emptyset`               |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
| :math:`0 < \alpha < 1` | n                                    | see :eq:`EqOrderStatAgen`               | :math:`\emptyset`               |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
| :math:`\alpha=1`       | n                                    | n                                       | n                               |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+

With

.. math::
    :label: EqOrderStatAgen

    \emptyset \text{if} (1-\alpha)^n > 1 - \beta \\
    \text{otherwise if there exists} k_0 | 1-\beta = F_{(\sampleSize,\alpha}(k_0 - 1) \text{then} k_{sol} = 1+F_{n,\alpha}^{-1}(1-\beta)
    \text{and if not} k_{sol} = F_{n,\alpha}^{-1}(1-\beta)

Ranks for bilateral bounds of the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly for the lower bound we seek the ranks :math:`k_1, k_2 \in \llbracket 1, \sampleSize \rrbracket^2` such that:

.. math::
    :label: EqOrderStatC

    \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}} \geq \beta \qquad

with :math:`k_2 - k_1` the smallest.

Here is a recap of the existence of solutions for this case:

+------------------------+-------------------------------------------------+---------------------------------+-------------------------+
| :math:`K_{sol}`        | :math:`\beta=0`                                 | :math:`0 < \beta < 1`           | :math:`\beta=1`         |
+========================+=================================================+=================================+=========================+
| :math:`\alpha=0`       | :math:`\Bigl\lfloor \frac{n}{2} \Bigr\rfloor`   | :math:`\emptyset`               | :math:`\emptyset`       |
+------------------------+-------------------------------------------------+---------------------------------+-------------------------+
| :math:`0 < \alpha < 1` | 1                                               | :math:`\emptyset` or 1          | :math:`\emptyset`       |
+------------------------+-------------------------------------------------+---------------------------------+-------------------------+
| :math:`\alpha=1`       | :math:`\Bigl\lfloor \frac{n}{2} \Bigr\rfloor`   | :math:`\emptyset`               | :math:`\emptyset`       |
+------------------------+-------------------------------------------------+---------------------------------+-------------------------+

Minimum sample size for an upper bound of the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given :math:`\alpha`, :math:`\beta`, and order :math:`k`, we seek for the smallest sample size
:math:`\sampleSize`
such that the equation :eq:`EqOrderStatB` is satisfied. In order to do so, we solve the
equation :eq:`EqOrderStat2B` with respect to the sample size :math:`\sampleSize`.

Once the smallest size :math:`\sampleSize` has been estimated, a sample of size :math:`\sampleSize` can be
generated from :math:`X` and an upper bound of :math:`x_{\alpha}` is estimated using
:math:`x_{(k)}` i.e. the :math:`k`-th observation
in the ordered sample :math:`(x_{(1)}, \dots, x_{(\sampleSize)})`.

Here is a recap of the existence of solutions for this case:

+--------------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
|                                | :math:`\beta=0`                      | :math:`0 < \beta < 1`                   | :math:`\beta=1`                 |
+================================+======================================+=========================================+=================================+
| :math:`0 \leq \alpha \leq 1`   | :math:`k \text{if} 1-\alpha^k \geq \beta \text{else} \emptyset`                                                  |
+--------------------------------+------------------------------------------------------------------------------------------------------------------+

Minimum sample size for a lower bound of the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly for the lower bound, we seek for the smallest sample size :math:`\sampleSize`
such that the equation :eq:`EqOrderStatA` is satisfied.

Here is a recap of the existence of solutions for this case:

+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
|                        | :math:`\beta=0`                      | :math:`0 < \beta < 1`                   | :math:`\beta=1`                 |
+========================+======================================+=========================================+=================================+
| :math:`\alpha=0`       | :math:`k`                            | :math:`\emptyset`                       | :math:`\emptyset`               |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
| :math:`0 < \alpha < 1` | :math:`\argmin \{n \geq k | f_{k,\alpha} \leq 1-\beta \}`                      | :math:`\emptyset`               |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
| :math:`\alpha=1`       | :math:`k`                            | :math:`k`                               | :math:`k`                       |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+


Minimum sample size for bilateral bounds of the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly for the bilateral bounds, we seek for the smallest sample size :math:`\sampleSize`
such that the equation :eq:`EqOrderStatC` is satisfied.

Here is a recap of the existence of solutions for this case:

+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
|                        | :math:`\beta=0`                      | :math:`0 < \beta < 1`                   | :math:`\beta=1`                 |
+========================+======================================+=========================================+=================================+
| :math:`\alpha=1`       | :math:`k_2`                          | :math:`\emptyset`                       | :math:`\emptyset`               |
+------------------------+--------------------------------------+-----------------------------------------+---------------------------------+
| :math:`0 < \alpha < 1` | :math:`k_2` if :math:`1-\alpha^{k_2} - F_{k_2,\alpha}(k_1-1) \geq \beta` else :math:`\emptyset`                  |
+------------------------+--------------------------------------------------------------------------------+---------------------------------+
| :math:`\alpha=0`       | :math:`\emptyset` if :math:`k_1 \neq 0` and :math:`\beta > 0` else :math:`k_2`                                   |
+------------------------+------------------------------------------------------------------------------------------------------------------+


.. topic:: API:

    - See :class:`~openturns.experimental.QuantileConfidence`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_quantile_confidence_estimation`
    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_quantile_confidence_chemical_process`

.. topic:: References:

    - [meeker2017]_
    - Wilks, S. S. (1941). Determination of sample sizes for setting tolerance limits. The Annals of Mathematical Statistics, 12(1), 91-96
    - Robert C.P., Casella G. (2004). Monte-Carlo Statistical Methods, Springer, ISBN 0-387-21239-6, 2nd ed.
    - Rubinstein R.Y. (1981). Simulation and The Monte-Carlo methods, John Wiley & Sons