Estimation of a quantile bound

We consider a one-dimensional random variable X and the unknown quantile x_{\alpha} of level \alpha of its distribution (\alpha \in [0, 1]). Using order statistics, we seek an upper bound of x_{\alpha} that holds with a confidence greater than or equal to \beta.

Let (X_1, \dots, X_\sampleSize) be independent copies of X. Let X_{(k)} be the k-th order statistic of (X_1, \dots, X_\sampleSize), that is, the k-th smallest value among (X_1, \dots, X_\sampleSize), for 1 \leq k \leq \sampleSize. For example, X_{(1)} = \min (X_1, \dots, X_\sampleSize) is the minimum and X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize) is the maximum. We have:

X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}

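Let F and p denote the cumulative distribution function and the probability density function of X. The cumulative distribution function and the probability density function of the order statistic X_{(k)} are: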

F_{X_{(k)}}(x) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \left(F(x)\right)^i \left(1-F(x)\right)^{\sampleSize-i} \qquad (1)
p_{X_{(k)}}(x) = (\sampleSize-k+1) \binom{\sampleSize}{k-1} \left(F(x)\right)^{k-1} \left(1-F(x)\right)^{\sampleSize-k} p(x)

We notice that F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1), where F_{(\sampleSize,F(x))} is the cumulative distribution function of the binomial distribution \cB(\sampleSize,F(x)) and \overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k) is its complementary cumulative distribution function (also called the survival function in dimension 1). Assuming that F is continuous, so that F(x_{\alpha}) = \alpha, we get:

F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i}
= \overline{F}_{(\sampleSize,\alpha)}(k-1)
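The identity above can be checked numerically. The following Python sketch assumes, for illustration only, that X is uniform on [0, 1], so that F(x) = x and x_{\alpha} = \alpha. It compares the binomial sum with a Monte Carlo estimate of \Prob{X_{(k)} \leq x_{\alpha}}; the function names and the values n = 10, k = 8, \alpha = 0.9 are arbitrary choices of this example.

```python
import math
import random


def order_statistic_cdf_at_quantile(n: int, k: int, alpha: float) -> float:
    """Binomial sum: P(X_(k) <= x_alpha) = sum_{i=k}^{n} C(n, i) alpha^i (1 - alpha)^(n - i)."""
    return sum(math.comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k, n + 1))


def monte_carlo_order_statistic_cdf(n: int, k: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X_(k) <= x_alpha) when X is uniform on [0, 1] (so x_alpha = alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ordered = sorted(rng.random() for _ in range(n))
        # ordered[k - 1] is the k-th order statistic X_(k) (Python lists are 0-indexed)
        if ordered[k - 1] <= alpha:
            hits += 1
    return hits / trials


exact = order_statistic_cdf_at_quantile(n=10, k=8, alpha=0.9)
estimate = monte_carlo_order_statistic_cdf(n=10, k=8, alpha=0.9, trials=20_000)
print(f"binomial sum: {exact:.4f}, Monte Carlo: {estimate:.4f}")
```

With 20,000 replications the two values should agree to about two decimal places; the uniform distribution is used only because its quantile is known in closed form.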

Rank for an upper bound of the quantile

Let (x_1, \dots, x_\sampleSize) be an i.i.d. sample of size \sampleSize of the random variable X. Given a quantile level \alpha \in [0,1], a confidence level \beta \in [0,1], and a sample size \sampleSize, we seek the smallest rank k \in \llbracket 1, \sampleSize \rrbracket such that:

\Prob{x_{\alpha} \leq X_{(k)}} \geq \beta \qquad (2)

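Assuming X is continuous, equation (2) is equivalent to:

1-F_{X_{(k)}}(x_{\alpha}) \geq \beta \qquad (3)

Since F_{X_{(k)}}(x_{\alpha}) = \overline{F}_{(\sampleSize,\alpha)}(k-1) = 1 - F_{(\sampleSize,\alpha)}(k-1), this becomes:

F_{(\sampleSize,\alpha)}(k-1) \geq \beta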

The smallest rank k_{sol} such that the previous inequality is satisfied is:

k_{sol} = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{(\sampleSize,\alpha)}(k-1) \geq \beta \}
        = 1 + \min \{ j \in \llbracket 0, \sampleSize-1 \rrbracket \, | \, F_{(\sampleSize,\alpha)}(j) \geq \beta \}

An upper bound of x_{\alpha} is estimated by the value of X_{(k_{sol})} on the sample (x_1, \dots, x_\sampleSize).
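The definition of k_{sol} translates into a direct search over the ranks. The sketch below is plain Python and does not use any library API; the helper names binomial_cdf and upper_bound_rank are hypothetical. With \alpha = \beta = 0.95 it reproduces the classical result that 59 observations are needed for the sample maximum to be an upper bound of the 95% quantile at the 95% confidence level.

```python
import math
from typing import Optional


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def upper_bound_rank(n: int, alpha: float, beta: float) -> Optional[int]:
    """Smallest rank k in [1, n] with F_{(n,alpha)}(k - 1) >= beta, or None if no rank qualifies."""
    for k in range(1, n + 1):
        if binomial_cdf(k - 1, n, alpha) >= beta:
            return k
    return None


print(upper_bound_rank(59, 0.95, 0.95))  # 59, i.e. the sample maximum
print(upper_bound_rank(58, 0.95, 0.95))  # None, since 1 - 0.95**58 < 0.95
```

The search is quadratic in n, which is harmless for moderate sample sizes; floating-point rounding may matter when \beta is extremely close to some value F_{(n,\alpha)}(k-1).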

Here is a recap of the existence of solutions for this case:

k_{sol}        | \beta = 0 | 0 < \beta < 1 | \beta = 1
\alpha = 0     | 1         | 1             | 1
0 < \alpha < 1 | 1         | see (4)       | \emptyset
\alpha = 1     | 1         | \emptyset     | \emptyset

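With:

k_{sol} = 1 + F_{(\sampleSize,\alpha)}^{-1}(\beta) \text{ if } 1-\alpha^{\sampleSize} \geq \beta, \text{ else } \emptyset \qquad (4)

where F_{(\sampleSize,\alpha)}^{-1}(q) = \min \{ j \in \llbracket 0, \sampleSize \rrbracket \, | \, F_{(\sampleSize,\alpha)}(j) \geq q \} is the quantile function of \cB(\sampleSize,\alpha). The condition 1-\alpha^{\sampleSize} \geq \beta expresses F_{(\sampleSize,\alpha)}(\sampleSize-1) \geq \beta, i.e. that at least one rank k \leq \sampleSize is admissible.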
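Formula (4) can be written with an explicit binomial quantile function F^{-1}_{(n,\alpha)}(q) = \min \{ j \,|\, F_{(n,\alpha)}(j) \geq q \}. The following sketch computes that quantile by a linear scan; the function names are hypothetical, and a library binomial quantile function could be substituted for binomial_quantile. The formula is intended for 0 < \alpha < 1 and 0 < \beta < 1, the case marked "see (4)" in the table.

```python
import math
from typing import Optional


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def binomial_quantile(q: float, n: int, p: float) -> int:
    """F^{-1}_{(n,p)}(q) = min{j in [0, n] : F_{(n,p)}(j) >= q}."""
    for j in range(n + 1):
        if binomial_cdf(j, n, p) >= q:
            return j
    return n  # only reached if rounding makes F_{(n,p)}(n) slightly smaller than q


def upper_bound_rank_formula(n: int, alpha: float, beta: float) -> Optional[int]:
    """Formula (4), meant for 0 < alpha < 1 and 0 < beta < 1."""
    # 1 - alpha**n = F_{(n,alpha)}(n - 1): an admissible rank exists only if it reaches beta
    if 1 - alpha**n >= beta:
        return 1 + binomial_quantile(beta, n, alpha)
    return None


print(upper_bound_rank_formula(10, 0.5, 0.9))  # 8
```

For n = 10 and \alpha = 0.5, F_{(10,0.5)}(6) = 848/1024 < 0.9 \leq F_{(10,0.5)}(7) = 968/1024, so the quantile is 7 and the rank is 8.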

Rank for a lower bound of the quantile

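Similarly, for the lower bound we seek the greatest rank k \in \llbracket 1, \sampleSize \rrbracket such that:

\Prob{X_{(k)} \leq x_{\alpha}} \geq \beta \qquad (5)

Since F_{X_{(k)}}(x_{\alpha}) = \overline{F}_{(\sampleSize,\alpha)}(k-1), equation (5) is equivalent to F_{(\sampleSize,\alpha)}(k-1) \leq 1-\beta.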

Here is a recap of the existence of solutions for this case:

k_{sol}        | \beta = 0 | 0 < \beta < 1 | \beta = 1
\alpha = 0     | n         | \emptyset     | \emptyset
0 < \alpha < 1 | n         | see (6)       | \emptyset
\alpha = 1     | n         | n             | n

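With:

k_{sol} = \begin{cases}
\emptyset & \text{if } (1-\alpha)^{\sampleSize} > 1-\beta \\
1 + F_{(\sampleSize,\alpha)}^{-1}(1-\beta) & \text{otherwise, if there exists } k_0 \text{ such that } F_{(\sampleSize,\alpha)}(k_0-1) = 1-\beta \\
F_{(\sampleSize,\alpha)}^{-1}(1-\beta) & \text{otherwise}
\end{cases} \qquad (6)

where F_{(\sampleSize,\alpha)}^{-1} is the quantile function of \cB(\sampleSize,\alpha).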
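Equation (5) reduces to F_{(n,\alpha)}(k-1) \leq 1-\beta, and since F_{(n,\alpha)} is nondecreasing, the greatest admissible rank can be found by scanning k downwards from n. The sketch below does this directly rather than transcribing the case analysis of (6), which avoids the exact floating-point equality test F_{(n,\alpha)}(k_0-1) = 1-\beta; the function names are hypothetical.

```python
import math
from typing import Optional


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def lower_bound_rank(n: int, alpha: float, beta: float) -> Optional[int]:
    """Greatest rank k in [1, n] with F_{(n,alpha)}(k - 1) <= 1 - beta, or None if no rank qualifies."""
    for k in range(n, 0, -1):
        if binomial_cdf(k - 1, n, alpha) <= 1 - beta:
            return k
    return None


print(lower_bound_rank(10, 0.5, 0.9))  # 3
print(lower_bound_rank(59, 0.05, 0.95))  # 1, i.e. the sample minimum
```

By symmetry of \cB(n, 0.5), the lower rank 3 for n = 10 mirrors the upper rank 8 = n + 1 - 3 obtained with the same \beta.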

Ranks for bilateral bounds of the quantile

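For bilateral bounds, we seek ranks (k_1, k_2) \in \llbracket 1, \sampleSize \rrbracket^2 such that:

\Prob{X_{(k_1)} \leq x_{\alpha} \leq X_{(k_2)}} \geq \beta \qquad (7)

with k_2 - k_1 as small as possible. For k_1 < k_2, the probability in (7) equals F_{(\sampleSize,\alpha)}(k_2-1) - F_{(\sampleSize,\alpha)}(k_1-1).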
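For k_1 < k_2, the event \{X_{(k_2)} < x_{\alpha}\} is contained in \{X_{(k_1)} \leq x_{\alpha}\}, so the probability in (7) is F_{X_{(k_1)}}(x_{\alpha}) - F_{X_{(k_2)}}(x_{\alpha}) = F_{(n,\alpha)}(k_2-1) - F_{(n,\alpha)}(k_1-1) for a continuous X. The sketch below uses this to search for the pair with the smallest gap k_2 - k_1. The function names are hypothetical, and the tie-breaking rule (among pairs with the same gap, keep the one with the largest probability) is an assumption of this sketch, not something stated in the text.

```python
import math
from typing import Optional, Tuple


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def bilateral_coverage(n: int, k1: int, k2: int, alpha: float) -> float:
    """P(X_(k1) <= x_alpha <= X_(k2)) = F_{(n,alpha)}(k2 - 1) - F_{(n,alpha)}(k1 - 1), for k1 < k2."""
    return binomial_cdf(k2 - 1, n, alpha) - binomial_cdf(k1 - 1, n, alpha)


def bilateral_ranks(n: int, alpha: float, beta: float) -> Optional[Tuple[int, int]]:
    """Pair (k1, k2) with k1 < k2, coverage >= beta and the smallest gap k2 - k1.

    Among pairs sharing the smallest gap, the pair with the largest coverage is
    returned (assumed tie-breaking rule).
    """
    for gap in range(1, n):
        pairs = [(k1, k1 + gap) for k1 in range(1, n - gap + 1)]
        best = max(pairs, key=lambda pair: bilateral_coverage(n, pair[0], pair[1], alpha))
        if bilateral_coverage(n, best[0], best[1], alpha) >= beta:
            return best
    return None


print(bilateral_ranks(10, 0.5, 0.9))  # a pair with k2 - k1 = 6
```

For n = 10 and \alpha = 0.5, the best gap of 5 covers only 912/1024 \approx 0.89, while a gap of 6 reaches 957/1024 \approx 0.93.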

Here is a recap of the existence of solutions for this case:

K_{sol}        | \beta = 0                             | 0 < \beta < 1   | \beta = 1
\alpha = 0     | \Bigl\lfloor \frac{n}{2} \Bigr\rfloor | \emptyset       | \emptyset
0 < \alpha < 1 | 1                                     | \emptyset or 1  | \emptyset
\alpha = 1     | \Bigl\lfloor \frac{n}{2} \Bigr\rfloor | \emptyset       | \emptyset

Minimum sample size for an upper bound of the quantile

Given \alpha, \beta, and the rank k, we seek the smallest sample size \sampleSize such that equation (2) is satisfied. To do so, we solve equation (3) with respect to \sampleSize.

Once the smallest size \sampleSize has been computed, a sample of size \sampleSize can be generated from X, and an upper bound of x_{\alpha} is estimated by x_{(k)}, i.e. the k-th value of the ordered sample (x_{(1)}, \dots, x_{(\sampleSize)}).

Here is a recap of the existence of solutions for this case:

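For every 0 \leq \alpha \leq 1, the smallest sample size is given by the same expression in the three cases \beta = 0, 0 < \beta < 1 and \beta = 1:

\sampleSize = k \text{ if } 1-\alpha^k \geq \beta \text{ else } \emptyset

Indeed, F_{(\sampleSize,\alpha)}(k-1) does not increase with \sampleSize and \sampleSize \geq k, so only \sampleSize = k needs to be checked, and F_{(k,\alpha)}(k-1) = 1-\alpha^k.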
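The rule above only needs the value at n = k, because F_{(n,\alpha)}(k-1) = \Prob{B \leq k-1} with B \sim \cB(n,\alpha) cannot increase when trials are added. The sketch below encodes the rule and illustrates the monotonicity numerically for k = 3 and \alpha = 0.5, values chosen arbitrarily; the function names are hypothetical.

```python
import math
from typing import Optional


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def min_sample_size_upper(k: int, alpha: float, beta: float) -> Optional[int]:
    """Smallest n >= k with F_{(n,alpha)}(k - 1) >= beta.

    F_{(n,alpha)}(k - 1) does not increase with n, so only n = k has to be
    checked, and F_{(k,alpha)}(k - 1) = 1 - alpha**k.
    """
    return k if 1 - alpha**k >= beta else None


# Numerical illustration of the monotonicity in n, for k = 3 and alpha = 0.5
cdf_values = [binomial_cdf(2, n, 0.5) for n in range(3, 20)]
print(all(a >= b for a, b in zip(cdf_values, cdf_values[1:])))  # True
print(min_sample_size_upper(59, 0.95, 0.95))  # 59
```

Because the rank k is counted from the smallest value, a large k (here the maximum of 59 observations) is needed for an upper bound of a high quantile.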

Minimum sample size for a lower bound of the quantile

Similarly, for the lower bound, we seek the smallest sample size \sampleSize such that equation (5) is satisfied, i.e. such that F_{(\sampleSize,\alpha)}(k-1) \leq 1-\beta.

Here is a recap of the existence of solutions for this case:

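               | \beta = 0 | 0 < \beta < 1 | \beta = 1
\alpha = 0     | k         | \emptyset     | \emptyset
0 < \alpha < 1 | \min \{ n \geq k \,|\, F_{(n,\alpha)}(k-1) \leq 1-\beta \} | \min \{ n \geq k \,|\, F_{(n,\alpha)}(k-1) \leq 1-\beta \} | \emptyset
\alpha = 1     | k         | k             | k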
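The entry for 0 < \alpha < 1 and \beta < 1 is a search over n: F_{(n,\alpha)}(k-1) decreases towards 0 as n grows, so increasing n from k eventually satisfies F_{(n,\alpha)}(k-1) \leq 1-\beta. The sketch below follows the table case by case; the function names are hypothetical. For k = 1 and k = 2 with \alpha = 0.05 and \beta = 0.95 it returns 59 and 93, the usual first- and second-order sample sizes for a 95%/95% lower bound.

```python
import math
from typing import Optional


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def min_sample_size_lower(k: int, alpha: float, beta: float) -> Optional[int]:
    """Smallest n >= k with F_{(n,alpha)}(k - 1) <= 1 - beta, following the table above."""
    if alpha == 1.0:
        return k  # F_{(n,1)}(k - 1) = 0 for every n >= k
    if alpha == 0.0:
        return k if beta == 0.0 else None  # F_{(n,0)}(k - 1) = 1 for every n
    if beta >= 1.0:
        return None  # F_{(n,alpha)}(k - 1) > 0 when 0 < alpha < 1
    n = k
    # F_{(n,alpha)}(k - 1) decreases to 0 as n grows, so this loop terminates
    while binomial_cdf(k - 1, n, alpha) > 1 - beta:
        n += 1
    return n


print(min_sample_size_lower(1, 0.05, 0.95))  # 59
print(min_sample_size_lower(2, 0.05, 0.95))  # 93
```

The linear search can be slow when \alpha is very small, and converting math.comb results to floats may overflow for very large ranks; a log-space or library binomial CDF would be more robust in those regimes.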

Minimum sample size for bilateral bounds of the quantile

Similarly, for the bilateral bounds, we seek the smallest sample size \sampleSize such that equation (7) is satisfied.

Here is a recap of the existence of solutions for this case:

               | \beta = 0 | 0 < \beta < 1 | \beta = 1
\alpha = 1     | k_2       | \emptyset     | \emptyset
0 < \alpha < 1 | k_2 if 1-\alpha^{k_2} - F_{(k_2,\alpha)}(k_1-1) \geq \beta else \emptyset (same entry for all \beta)
\alpha = 0     | \emptyset if k_1 \neq 0 and \beta > 0 else k_2 (same entry for all \beta)
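For 0 < \alpha < 1, the tabulated entry evaluates the probability in (7) at n = k_2, where F_{(k_2,\alpha)}(k_2-1) = 1-\alpha^{k_2}. The sketch below transcribes that entry and checks it against the direct binomial sum \Prob{k_1 \leq B \leq k_2-1} with B \sim \cB(n,\alpha), which equals the probability in (7) for k_1 < k_2 and a continuous X. The function names are hypothetical, and the sketch only reproduces the tabulated rule: it does not examine sample sizes other than k_2.

```python
import math
from typing import Optional


def binomial_cdf(j: int, n: int, p: float) -> float:
    """F_{(n,p)}(j) = P(B <= j) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def coverage_at_n(n: int, k1: int, k2: int, alpha: float) -> float:
    """P(k1 <= B <= k2 - 1) for B ~ Binomial(n, alpha), i.e. the probability in (7)."""
    return sum(math.comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k1, k2))


def min_sample_size_bilateral(k1: int, k2: int, alpha: float, beta: float) -> Optional[int]:
    """Table entry for 0 < alpha < 1: k2 if 1 - alpha**k2 - F_{(k2,alpha)}(k1 - 1) >= beta, else None."""
    coverage = 1 - alpha**k2 - binomial_cdf(k1 - 1, k2, alpha)
    return k2 if coverage >= beta else None


# The closed form used in the table agrees with the direct sum at n = k2
print(abs(coverage_at_n(5, 2, 5, 0.3) - (1 - 0.3**5 - binomial_cdf(1, 5, 0.3))) < 1e-12)  # True
print(min_sample_size_bilateral(1, 2, 0.5, 0.4))  # 2
```

With k_1 = 1, k_2 = 2 and \alpha = 0.5, the probability at n = 2 is 1 - 0.25 - 0.25 = 0.5, so the rule returns 2 for \beta = 0.4 and \emptyset for \beta = 0.6.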