Exact quantile confidence interval based on order statistics¶

We consider a random variable $X$ of dimension 1 and its quantile $x_{\alpha}$ of level $\alpha$ ( $\alpha \in [0, 1]$ ). We seek to evaluate an upper bound of $x_{\alpha}$ with a confidence greater or equal to $\beta$ , using order statistics.

Let $(X_1, \dots, X_\sampleSize)$ be some independent copies of $X$ . Let $X_{(k)}$ be the $k$ -th order statistics of $(X_1, \dots, X_\sampleSize)$ which means that $X_{(k)}$ is the $k$ -th minimum of $(X_1, \dots, X_\sampleSize)$ for $1 \leq k \leq \sampleSize$ . For example, $X_{(1)} = \min (X_1, \dots, X_\sampleSize)$ is the minimum and $X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize)$ is the maximum. We have:

$X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}$

The probability density and cumulative distribution functions of the order statistics $X_{(k)}$ are:

(1)¶ $F_{X_{(k)}}(x) & = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i}\left(F(x) \right)^i \left(1-F(x) \right)^{\sampleSize-i} \\ p_{X_{(k)}}(x) & = (\sampleSize-k+1)\binom{\sampleSize}{k-1}\left(F(x)\right)^{k-1} \left(1-F(x) \right)^{\sampleSize-k} p(x)$

We notice that $F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1)$ where $F_{(\sampleSize,F(x))}$ is the cumulative distribution function of the Binomial distribution $\cB(\sampleSize,F(x))$ and $\overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k)$ is the complementary cumulative distribution fonction (also named survival function in dimension 1). Therefore:

$F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i} = \overline{F}_{(\sampleSize,\alpha)}(k-1)$

Rank for an upper bound of the quantile¶

Let $(x_1, \dots, x_\sampleSize)$ be an i.i.d. sample of size $\sampleSize$ of the random variable $X$ . Given a quantile level $\alpha \in [0,1]$ , a confidence level $\beta \in [0,1]$ , and a sample size $\sampleSize$ , we seek the smallest rank $k \in \llbracket 1, \sampleSize \rrbracket$ such that:

(2)¶ $\Prob{x_{\alpha} \leq X_{(k)}} \geq \beta$

As equation (2) can be written as:

(3)¶ $1-F_{X_{(k)}}(x_{\alpha})\geq \beta$

or even as:

$F_{\sampleSize, \alpha}(k-1)\geq \beta$

Then, the smallest rank $k_{sol}$ such that the previous equation is satisfied is:

$k_{sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\geq \beta \}\\ & = 1 + \min \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\geq \beta \}$

An upper bound of $x_{\alpha}$ is estimated by the value of $X_{(k_{sol})}$ on the sample $(x_1, \dots, x_\sampleSize)$ .

Here is a recap of the existence of solutions for this case:

$k_{sol}$	$\beta=0$	$0 < \beta < 1$	$\beta=1$
$\alpha=0$	1	1	1
$0 < \alpha < 1$	1	see (4)	$\emptyset$
$\alpha=1$	1	$\emptyset$	$\emptyset$

where:

(4)¶ $k_{sol} & = 1 + F_{\sampleSize,\alpha}^{-1}(\beta) \quad \text{if} \quad 1-\alpha^\sampleSize \geq \beta \\ & = \emptyset \quad \text{else}$

Rank for a lower bound of the quantile¶

Given the same data as previoulsy, we seek the greatest rank $k \in \llbracket 1, \sampleSize \rrbracket$ such that:

(5)¶ $\Prob{X_{(k)} \leq x_{\alpha}} \geq \beta$

which can be written as:

(6)¶ $F_{X_{(k)}}(x_{\alpha})\geq \beta$

and finally as:

$F_{\sampleSize, \alpha}(k - 1)\leq 1 - \beta$

Then, the greatest rank $k_{sol}$ such that the previous equation is satisfied is:

$k_{sol} & = \max \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\leq \beta \}\\ & = 1+\max \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\leq \beta \}$

Here is a recap of the existence of solutions for this case:

$k_{sol}$	$\beta=0$	$0 < \beta < 1$	$\beta=1$
$\alpha=0$	$\sampleSize$	$\emptyset$	$\emptyset$
$0 < \alpha < 1$	$\sampleSize$	see (7)	$\emptyset$
$\alpha=1$	$\sampleSize$	$\sampleSize$	$\sampleSize$

where:

(7)¶ $k_{sol} & = \emptyset \quad \text{if} \quad (1-\alpha)^\sampleSize > 1 - \beta \\ & = 1 + F_{\sampleSize,\alpha}^{-1}(1-\beta) \quad \text{otherwise and if} \quad \exists k_0 \, | \, 1-\beta = F_{(\sampleSize,\alpha}(k_0 - 1) \\ & = F_{\sampleSize,\alpha}^{-1}(1-\beta) \quad \text{else}$

Ranks for bilateral bounds of the quantile¶

Given the same data as previoulsy, we can seek the ranks $k_1, k_2 \in \llbracket 1, \sampleSize \rrbracket^2$ as solution of different problems.

The problem can be:

(8)¶ $\begin{array}{ll} (k_1, k_2) = & \argmin \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}}\\ & \mbox{s.t.} \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}} \geq \beta \end{array}$

or:

(9)¶ $\begin{array}{ll} (k_1, k_2) = & \argmin (k_2-k_1)\\ & \mbox{s.t.} \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}} \geq \beta \end{array}$

or:

(10)¶ $\Prob{X_{(k_1)} \leq x_{\alpha} } \geq 1 - \dfrac{1-\beta}{2}\\ \Prob{x_{\alpha} \leq X_{(k_2)}} \geq 1 - \dfrac{1-\beta}{2}$

or with $(k_1, k_2) = (k,\sampleSize-k+1)$ and $1 \leq k \leq \sampleSize$ the greatest rank such that:

(11)¶ $\Prob{X_{(k)} \leq x_{\alpha} \leq X_{(\sampleSize-k+1)}} \geq \beta.$

The solutions of (8) and (9) are determined numerically, using an optimization algorithm.

The solutions of (10) are respectively defined by:

$\overline{F}_{(\sampleSize,\alpha)}(k_1 - 1) \leq \dfrac{1 - \beta}{2} \\ F_{(\sampleSize,\alpha)}(k_2 - 1) \geq 1-\dfrac{1 - \beta}{2}$

which leads to the respective solutions:

$k_{1, sol} & = \max \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\leq \dfrac{1 - \beta}{2} \}\\ & = 1 + \max \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\leq \dfrac{1-\beta}{2} \}$

and

$k_{2, sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k - 1)\geq 1- \dfrac{1 - \beta}{2} \}\\ & = 1 + \min \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\geq 1 - \dfrac{1 - \beta}{2} \}$

Then, the previous tables written for the lower and upper bounds can be used to find $k_{1, sol}$ and $k_{2, sol}$ respectively with $\beta \rightarrow \dfrac{1-\beta}{2}$ or $\beta \rightarrow 1-\dfrac{1-\beta}{2}$ .

The solutions of (11) are gathered here:

$k_{sol}$	$\beta=0$	$0 < \beta < 1$	$\beta=1$
$\alpha=0$	$\Bigl\lfloor \frac{n}{2} \Bigr\rfloor$	$\emptyset$	$\emptyset$
$0 < \alpha < 1$	1	$\emptyset$ or 1	$\emptyset$
$\alpha=1$	$\Bigl\lfloor \frac{\sampleSize}{2} \Bigr\rfloor$ \| $\emptyset$ \| $\emptyset$

Minimum sample size for an upper bound of the quantile¶

Given $\alpha$ , $\beta$ , and the rank $1 \leq k \leq \sampleSize$ , we seek the smallest sample size $\sampleSize$ such that:

(12)¶ $\Prob{x_{\alpha} \leq X_{(\sampleSize-k+1)}} \geq \beta$

As equation (12) can be written as:

(13)¶ $1-F_{X_{(\sampleSize-k+1)}}(x_{\alpha})\geq \beta$

or even as:

$F_{\sampleSize, \alpha}(\sampleSize-k)\geq \beta$

Note that the problem is defined differently than in equation (2). In order to do so, we solve equation (13) with respect to the sample size $\sampleSize$ . We use an optimization algorithm to determined $n_{sol}$ in the interval $\llbracket k, +\infty \llbracket$ . We can reduce the research interval to the interval $\llbracket k, n_2 \rrbracket$ where $n_2$ is a size that verifies equation (13). It can be determined using the approximation of the binomial distribution by the normal distribution with the same mean and variance.

Once the smallest size $\sampleSize$ has been estimated, a sample of size $\sampleSize$ can be generated from $X$ and an upper bound of $x_{\alpha}$ is estimated using $x_{(n-k+1)}$ i.e. the $k$ -th observation in the decreasing ordered sample $(x_{(\sampleSize)}, \dots, x_{(1)})$ .

Minimum sample size for a lower bound of the quantile¶

Given the same data as previoulsy, we seek the smallest sample size $\sampleSize$ such that equation (5) is satisfied.

Here is a recap of the existence of solutions for this case:

$n_{sol}$	$\beta=0$	$0 < \beta < 1$	$\beta=1$
$\alpha=0$	$k$	$\emptyset$	$\emptyset$
$0 < \alpha < 1$	$\argmin \{\sampleSize \geq k \| F_{\sampleSize,\alpha}(k-1) \leq 1-\beta \}$		$\emptyset$
$\alpha=1$	$k$	$k$	$k$

Minimum sample size for bilateral bounds of the quantile¶

Given two order statistics $(k_1, k_2)$ with $1 \leq k_1 < k_2 \leq \sampleSize$ , we seek the smallest sample size $\sampleSize$ such that:

(14)¶ $\Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(n-k_2+1)}} \geq \beta$

As equation (14) can be written as:

(15)¶ $F_{X_{(\sampleSize-k_2+1)}}(x_{\alpha}) - F_{X_{(k_1)}}(x_{\alpha}) \geq \beta$

or even as:

$F_{\sampleSize, \alpha}(\sampleSize-k_2) - F_{\sampleSize, \alpha}(k_1-1)\geq \beta$

Note that the problem is defined differently than in equation (9). In order to do so, we solve equation (15) with respect to the sample size $\sampleSize$ . We use an optimization algorithm to determined $n_{sol}$ in the interval $\llbracket k, +\infty \llbracket$ . We can reduce the research interval to the interval $\llbracket k, n_2 \rrbracket$ where $n_2$ is a size that verifies equation (13). It can be determined using the approximation of the binomial distribution by the normal distribution with the same mean and variance.

Once the smallest size $\sampleSize$ has been estimated, a sample of size $\sampleSize$ can be generated from $X$ and an lower and upper bound of $x_{\alpha}$ is estimated using $x_{(k_1)}$ and $x_{(n-k_2+1)}$ i.e. the $k_1$ -th observation in the ordered sample $(x_{(1)}, \dots, x_{(\sampleSize)})$ and the $\sampleSize-k_2$ -th observation in the decreasing ordered sample $(x_{(\sampleSize)}, \dots, x_{(1)})$ .

In the particular case where $(k_1, k_2) = (1,1)$ , we seek the smallest sample size $\sampleSize$ such that:

$\Prob{ \min (X_1, \dots, X_\sampleSize) \leq x_{\alpha} \leq \max (X_1, \dots, X_\sampleSize)} \geq \beta.$

Then, equantion (15) can be written as:

$1-\alpha^\sampleSize - (1-\alpha)^\sampleSize \geq \beta.$

The optimal $\sampleSize$ is determined using an optimization algorithm which research is reduced to the interval:

$\left \lfloor \dfrac{\log (1-\beta)}{\log \gamma} \right \rfloor \leq n \leq \left \lfloor \dfrac{\log \left(\dfrac{1-\beta}{2}\right)}{\log \gamma} \right \rfloor$

where $\gamma = \max(\alpha, 1-\alpha)$ .

OpenTURNS

An Open source initiative for the Treatment of Uncertainties, Risks'N Statistics

Table of Contents

Previous topic

Next topic

This Page

Exact quantile confidence interval based on order statistics¶

Rank for an upper bound of the quantile¶

Rank for a lower bound of the quantile¶

Ranks for bilateral bounds of the quantile¶

Minimum sample size for an upper bound of the quantile¶

Minimum sample size for a lower bound of the quantile¶

Minimum sample size for bilateral bounds of the quantile¶