Exact quantile confidence interval based on order statisticsΒΆ

We consider a random variable X of dimension 1 and its quantile x_{\alpha} of level \alpha (\alpha \in [0, 1]). We seek to evaluate an upper bound of x_{\alpha} with a confidence greater or equal to \beta, using order statistics.

Let (X_1, \dots, X_\sampleSize) be some independent copies of X. Let X_{(k)} be the k -th order statistics of (X_1, \dots, X_\sampleSize) which means that X_{(k)} is the k -th minimum of (X_1, \dots, X_\sampleSize) for 1 \leq k \leq \sampleSize. For example, X_{(1)} = \min (X_1, \dots, X_\sampleSize) is the minimum and X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize) is the maximum. We have:

X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}

The probability density and cumulative distribution functions of the order statistics X_{(k)} are:

(1)ΒΆF_{X_{(k)}}(x) & = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i}\left(F(x)
\right)^i \left(1-F(x)
\right)^{\sampleSize-i} \\
p_{X_{(k)}}(x) & = (\sampleSize-k+1)\binom{\sampleSize}{k-1}\left(F(x)\right)^{k-1}
\left(1-F(x)
\right)^{\sampleSize-k} p(x)

We notice that F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1) where F_{(\sampleSize,F(x))} is the cumulative distribution function of the Binomial distribution \cB(\sampleSize,F(x)) and \overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k) is the complementary cumulative distribution fonction (also named survival function in dimension 1). Therefore:

F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i}
= \overline{F}_{(\sampleSize,\alpha)}(k-1)

Rank for an upper bound of the quantileΒΆ

Let (x_1, \dots, x_\sampleSize) be an i.i.d. sample of size \sampleSize of the random variable X. Given a quantile level \alpha \in [0,1], a confidence level \beta \in [0,1], and a sample size \sampleSize, we seek the smallest rank k \in \llbracket 1, \sampleSize \rrbracket such that:

(2)ΒΆ\Prob{x_{\alpha} \leq X_{(k)}} \geq \beta

As equation (2) can be written as:

(3)ΒΆ1-F_{X_{(k)}}(x_{\alpha})\geq \beta

or even as:

F_{\sampleSize, \alpha}(k-1)\geq \beta

Then, the smallest rank k_{sol} such that the previous equation is satisfied is:

k_{sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\geq \beta \}\\
        & = 1 + \min \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\geq \beta \}

An upper bound of x_{\alpha} is estimated by the value of X_{(k_{sol})} on the sample (x_1, \dots, x_\sampleSize).

Here is a recap of the existence of solutions for this case:

k_{sol}

\beta=0

0 < \beta < 1

\beta=1

\alpha=0

1

1

1

0 < \alpha < 1

1

see (4)

\emptyset

\alpha=1

1

\emptyset

\emptyset

where:

(4)ΒΆk_{sol} & = 1 + F_{\sampleSize,\alpha}^{-1}(\beta)   \quad \text{if} \quad  1-\alpha^\sampleSize \geq \beta \\
        & = \emptyset   \quad \text{else}

Rank for a lower bound of the quantileΒΆ

Given the same data as previoulsy, we seek the greatest rank k \in \llbracket 1, \sampleSize \rrbracket such that:

(5)ΒΆ\Prob{X_{(k)} \leq x_{\alpha}} \geq \beta

which can be written as:

(6)ΒΆF_{X_{(k)}}(x_{\alpha})\geq \beta

and finally as:

F_{\sampleSize, \alpha}(k - 1)\leq 1 - \beta

Then, the greatest rank k_{sol} such that the previous equation is satisfied is:

k_{sol} & = \max \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\leq \beta \}\\
        & = 1+\max \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\leq \beta \}

Here is a recap of the existence of solutions for this case:

k_{sol}

\beta=0

0 < \beta < 1

\beta=1

\alpha=0

\sampleSize

\emptyset

\emptyset

0 < \alpha < 1

\sampleSize

see (7)

\emptyset

\alpha=1

\sampleSize

\sampleSize

\sampleSize

where:

(7)ΒΆk_{sol} & = \emptyset \quad \text{if} \quad  (1-\alpha)^\sampleSize > 1 - \beta \\
        & = 1 + F_{\sampleSize,\alpha}^{-1}(1-\beta)  \quad  \text{otherwise and if} \quad  \exists k_0 \, | \, 1-\beta = F_{(\sampleSize,\alpha}(k_0 - 1) \\
        & = F_{\sampleSize,\alpha}^{-1}(1-\beta)  \quad  \text{else}

Ranks for bilateral bounds of the quantileΒΆ

Given the same data as previoulsy, we can seek the ranks k_1, k_2 \in \llbracket 1, \sampleSize \rrbracket^2 as solution of different problems.

The problem can be:

(8)ΒΆ\begin{array}{ll}
(k_1, k_2) = & \argmin \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}}\\
             & \mbox{s.t.} \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}} \geq \beta
\end{array}

or:

(9)ΒΆ\begin{array}{ll}
(k_1, k_2) = & \argmin (k_2-k_1)\\
             & \mbox{s.t.} \Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}} \geq \beta
\end{array}

or:

(10)ΒΆ\Prob{X_{(k_1)} \leq x_{\alpha} } \geq 1 - \dfrac{1-\beta}{2}\\
\Prob{x_{\alpha} \leq X_{(k_2)}} \geq 1 - \dfrac{1-\beta}{2}

or with (k_1, k_2) = (k,\sampleSize-k+1) and 1 \leq k \leq \sampleSize the greatest rank such that:

(11)ΒΆ\Prob{X_{(k)} \leq x_{\alpha} \leq X_{(\sampleSize-k+1)}} \geq \beta.

The solutions of (8) and (9) are determined numerically, using an optimization algorithm.

The solutions of (10) are respectively defined by:

\overline{F}_{(\sampleSize,\alpha)}(k_1 - 1) \leq \dfrac{1 - \beta}{2} \\
F_{(\sampleSize,\alpha)}(k_2 - 1) \geq 1-\dfrac{1 - \beta}{2}

which leads to the respective solutions:

k_{1, sol} & = \max \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k-1)\leq \dfrac{1 - \beta}{2} \}\\
        & = 1 + \max \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\leq \dfrac{1-\beta}{2} \}

and

k_{2, sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{\sampleSize, \alpha}(k - 1)\geq 1- \dfrac{1 - \beta}{2} \}\\
        & = 1 + \min \{ k \in \llbracket 1, \sampleSize\rrbracket \, | \, F_{\sampleSize, \alpha}(k)\geq 1 - \dfrac{1 - \beta}{2} \}

Then, the previous tables written for the lower and upper bounds can be used to find k_{1, sol} and k_{2, sol} respectively with \beta \rightarrow \dfrac{1-\beta}{2} or \beta \rightarrow 1-\dfrac{1-\beta}{2}.

The solutions of (11) are gathered here:

k_{sol}

\beta=0

0 < \beta < 1

\beta=1

\alpha=0

\Bigl\lfloor \frac{n}{2} \Bigr\rfloor

\emptyset

\emptyset

0 < \alpha < 1

1

\emptyset or 1

\emptyset

\alpha=1

\Bigl\lfloor \frac{\sampleSize}{2} \Bigr\rfloor | \emptyset | \emptyset

Minimum sample size for an upper bound of the quantileΒΆ

Given \alpha, \beta, and the rank 1 \leq k \leq \sampleSize, we seek the smallest sample size \sampleSize such that:

(12)ΒΆ\Prob{x_{\alpha} \leq X_{(\sampleSize-k+1)}} \geq \beta

As equation (12) can be written as:

(13)ΒΆ1-F_{X_{(\sampleSize-k+1)}}(x_{\alpha})\geq \beta

or even as:

F_{\sampleSize, \alpha}(\sampleSize-k)\geq \beta

Note that the problem is defined differently than in equation (2). In order to do so, we solve equation (13) with respect to the sample size \sampleSize. We use an optimization algorithm to determined n_{sol} in the interval \llbracket k, +\infty \llbracket. We can reduce the research interval to the interval \llbracket k, n_2 \rrbracket where n_2 is a size that verifies equation (13). It can be determined using the approximation of the binomial distribution by the normal distribution with the same mean and variance.

Once the smallest size \sampleSize has been estimated, a sample of size \sampleSize can be generated from X and an upper bound of x_{\alpha} is estimated using x_{(n-k+1)} i.e. the k-th observation in the decreasing ordered sample (x_{(\sampleSize)}, \dots, x_{(1)}).

Minimum sample size for a lower bound of the quantileΒΆ

Given the same data as previoulsy, we seek the smallest sample size \sampleSize such that equation (5) is satisfied.

Here is a recap of the existence of solutions for this case:

n_{sol}

\beta=0

0 < \beta < 1

\beta=1

\alpha=0

k

\emptyset

\emptyset

0 < \alpha < 1

\argmin \{\sampleSize \geq k | F_{\sampleSize,\alpha}(k-1) \leq 1-\beta \}

\emptyset

\alpha=1

k

k

k

Minimum sample size for bilateral bounds of the quantileΒΆ

Given two order statistics (k_1, k_2) with 1 \leq k_1 < k_2 \leq \sampleSize, we seek the smallest sample size \sampleSize such that:

(14)ΒΆ\Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(n-k_2+1)}} \geq \beta

As equation (14) can be written as:

(15)ΒΆF_{X_{(\sampleSize-k_2+1)}}(x_{\alpha}) - F_{X_{(k_1)}}(x_{\alpha}) \geq \beta

or even as:

F_{\sampleSize, \alpha}(\sampleSize-k_2) - F_{\sampleSize, \alpha}(k_1-1)\geq \beta

Note that the problem is defined differently than in equation (9). In order to do so, we solve equation (15) with respect to the sample size \sampleSize. We use an optimization algorithm to determined n_{sol} in the interval \llbracket k, +\infty \llbracket. We can reduce the research interval to the interval \llbracket k, n_2 \rrbracket where n_2 is a size that verifies equation (13). It can be determined using the approximation of the binomial distribution by the normal distribution with the same mean and variance.

Once the smallest size \sampleSize has been estimated, a sample of size \sampleSize can be generated from X and an lower and upper bound of x_{\alpha} is estimated using x_{(k_1)} and x_{(n-k_2+1)} i.e. the k_1-th observation in the ordered sample (x_{(1)}, \dots, x_{(\sampleSize)}) and the \sampleSize-k_2-th observation in the decreasing ordered sample (x_{(\sampleSize)}, \dots, x_{(1)}).

In the particular case where (k_1, k_2) = (1,1), we seek the smallest sample size \sampleSize such that:

\Prob{ \min (X_1, \dots, X_\sampleSize) \leq x_{\alpha} \leq  \max (X_1, \dots, X_\sampleSize)} \geq \beta.

Then, equantion (15) can be written as:

1-\alpha^\sampleSize - (1-\alpha)^\sampleSize \geq \beta.

The optimal \sampleSize is determined using an optimization algorithm which research is reduced to the interval:

\left \lfloor \dfrac{\log (1-\beta)}{\log \gamma} \right \rfloor \leq n \leq \left \lfloor \dfrac{\log \left(\dfrac{1-\beta}{2}\right)}{\log \gamma} \right \rfloor

where \gamma = \max(\alpha, 1-\alpha).