Estimation of a quantile upper bound by Wilks’ method

We consider a random variable X of dimension 1 and the unknown quantile x_{\alpha} of level \alpha of its distribution (\alpha \in [0, 1]). We seek an upper bound of x_{\alpha} with a confidence greater than or equal to \beta, using a given order statistic.

Let (X_1, \dots, X_\sampleSize) be independent copies of X. Let X_{(k)} be the k-th order statistic of (X_1, \dots, X_\sampleSize), which means that X_{(k)} is the k-th smallest value of (X_1, \dots, X_\sampleSize) for 1 \leq k \leq \sampleSize. For example, X_{(1)} = \min (X_1, \dots, X_\sampleSize) is the minimum and X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize) is the maximum. We have:

X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}
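As a small illustration of this ordering, the order statistics of an observed sample can be read off the sorted sample. The sketch below uses plain Python with made-up values; the helper name `order_statistic` is hypothetical and is not part of any library.

```python
def order_statistic(sample, k):
    """Return the k-th order statistic (1-based rank k) of a sample."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Sorting in increasing order puts x_(k) at the 0-based index k - 1.
    return sorted(sample)[k - 1]


# Made-up observations, for illustration only.
sample = [2.5, -1.0, 4.2, 0.3, 3.1]
smallest = order_statistic(sample, 1)           # x_(1), the minimum
largest = order_statistic(sample, len(sample))  # x_(n), the maximum
```

The only subtlety is the shift between the 1-based rank k used in the text and the 0-based index of the sorted list.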

Smallest rank for an upper bound to the quantile

Let (x_1, \dots, x_\sampleSize) be an i.i.d. sample of size \sampleSize of the random variable X. Given a quantile level \alpha \in [0,1], a confidence level \beta \in [0,1], and a sample size \sampleSize, we seek the smallest rank k \in \llbracket 1, \sampleSize \rrbracket such that:

(1) \Prob{x_{\alpha} \leq X_{(k)}} \geq \beta

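Let F and p denote the cumulative distribution function and the probability density function of X. The cumulative distribution function and the probability density function of the order statistic X_{(k)} are: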

(2) F_{X_{(k)}}(x) & = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i}\left(F(x)\right)^i \left(1-F(x)\right)^{\sampleSize-i} \\
p_{X_{(k)}}(x) & = (\sampleSize-k+1)\binom{\sampleSize}{k-1}\left(F(x)\right)^{k-1}\left(1-F(x)\right)^{\sampleSize-k} p(x)

We notice that F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1) where F_{(\sampleSize,F(x))} is the cumulative distribution function of the Binomial distribution \cB(\sampleSize,F(x)) and \overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k) is the complementary cumulative distribution function (also named survival function in dimension 1). Therefore:

F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i}
= \overline{F}_{(\sampleSize,\alpha)}(k-1)
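This identity can be checked numerically. The sketch below is one possible illustration, assuming X follows the uniform distribution on [0, 1] (so that x_{\alpha} = \alpha); the function names, the seed, and the values of \sampleSize, k and \alpha are arbitrary choices made here. It compares the binomial survival function with a Monte Carlo estimate of \Prob{X_{(k)} \leq x_{\alpha}}.

```python
import math
import random


def binomial_survival(n, p, k):
    """P(B > k) for B ~ Binomial(n, p), i.e. the complementary CDF at k."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1, n + 1)
    )


def empirical_cdf_of_order_statistic(n, k, alpha, repetitions, seed=0):
    """Monte Carlo estimate of P(X_(k) <= x_alpha) for X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = sorted(rng.random() for _ in range(n))
        # For the uniform distribution on [0, 1], x_alpha = alpha.
        if sample[k - 1] <= alpha:
            hits += 1
    return hits / repetitions


# Arbitrary illustrative values.
n, k, alpha = 10, 9, 0.8
exact = binomial_survival(n, alpha, k - 1)
estimate = empirical_cdf_of_order_statistic(n, k, alpha, repetitions=20000)
```

The two values should agree up to the Monte Carlo sampling error, which decreases as the number of repetitions grows.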

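Since the distribution of X is continuous, \Prob{x_{\alpha} \leq X_{(k)}} = 1 - F_{X_{(k)}}(x_{\alpha}), so equation (1) is equivalent to:

(3) 1-F_{X_{(k)}}(x_{\alpha})\geq \beta

Since 1 - \overline{F}_{(\sampleSize,\alpha)}(k-1) = F_{(\sampleSize,\alpha)}(k-1), this is equivalent to:

F_{(\sampleSize,\alpha)}(k-1)\geq \beta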

The smallest rank k_{sol} such that the previous equation is satisfied is:

k_{sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{(\sampleSize, \alpha)}(k-1)\geq \beta \}\\
        & = 1 +  \min \{ k \in \llbracket 0, \sampleSize - 1 \rrbracket \, | \, F_{(\sampleSize, \alpha)}(k)\geq \beta \}

An upper bound of x_{\alpha} is estimated by the value of X_{(k_{sol})} on the sample (x_1, \dots, x_\sampleSize).
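The search for k_{sol} can be sketched as a direct scan over the ranks, accumulating the Binomial cumulative distribution function term by term. This is a minimal sketch in plain Python, not the implementation of any particular library; the function name `smallest_rank` and the choice to return `None` when no rank qualifies are assumptions made here.

```python
import math


def smallest_rank(alpha, beta, n):
    """Smallest k in {1, ..., n} such that F_{(n, alpha)}(k - 1) >= beta.

    Returns None when no rank satisfies the condition.
    """
    cdf = 0.0
    for k in range(1, n + 1):
        # Add P(B = k - 1) for B ~ Binomial(n, alpha), so that cdf = F(k - 1).
        cdf += math.comb(n, k - 1) * alpha ** (k - 1) * (1.0 - alpha) ** (n - k + 1)
        if cdf >= beta:
            return k
    return None


# Illustrative values: quantile level 0.95, confidence level 0.95.
k_sol = smallest_rank(0.95, 0.95, 100)
```

The set of admissible ranks is empty when even the largest rank fails, i.e. when F_{(\sampleSize,\alpha)}(\sampleSize - 1) = 1 - \alpha^{\sampleSize} < \beta; in that case the sample is too small for the requested levels. When a rank is found, the upper bound is the corresponding value of the sorted sample, `sorted(sample)[k_sol - 1]`.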

Minimum sample size for an upper bound to the quantile

Given \alpha, \beta, and a rank of the form k = \sampleSize - i, where the index i \geq 0 is fixed and counts positions down from the largest value, we seek the smallest sample size \sampleSize such that equation (1) is satisfied. To do so, we solve equation (3) with respect to the sample size \sampleSize.

Once the smallest size \sampleSize has been determined, a sample of size \sampleSize can be generated from X and an upper bound of x_{\alpha} is estimated by x_{(\sampleSize-i)}, i.e. the (\sampleSize - i)-th observation in the ordered sample (x_{(1)}, \dots, x_{(\sampleSize)}).
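The minimum sample size can be sketched as an incremental search, assuming the rank is k = \sampleSize - i for a fixed index i \geq 0. In that case equation (3) reads 1 - \Prob{B \geq \sampleSize - i} \geq \beta with B following \cB(\sampleSize, \alpha), and the left-hand side increases with \sampleSize. The names `coverage` and `minimum_sample_size` and the cap `n_max` are hypothetical choices made for this illustration.

```python
import math


def coverage(alpha, n, i):
    """P(x_alpha <= X_(n - i)) = 1 - P(B >= n - i), with B ~ Binomial(n, alpha)."""
    # B >= n - i means that at most i of the n observations exceed x_alpha.
    upper_tail = sum(
        math.comb(n, j) * (1.0 - alpha) ** j * alpha ** (n - j) for j in range(i + 1)
    )
    return 1.0 - upper_tail


def minimum_sample_size(alpha, beta, i=0, n_max=100000):
    """Smallest n such that X_(n - i) bounds x_alpha with confidence >= beta."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    if i < 0:
        raise ValueError("i must be non-negative")
    # The rank n - i must be at least 1, so the search starts at n = i + 1.
    for n in range(i + 1, n_max + 1):
        if coverage(alpha, n, i) >= beta:
            return n
    raise ValueError("no sample size up to n_max satisfies the condition")


# Illustrative values: quantile level 0.95, confidence level 0.95.
n_for_maximum = minimum_sample_size(0.95, 0.95, i=0)
n_for_second_largest = minimum_sample_size(0.95, 0.95, i=1)
```

For i = 0 the condition reduces to 1 - \alpha^{\sampleSize} \geq \beta, so the smallest size is the smallest integer \sampleSize \geq \ln(1-\beta) / \ln(\alpha), which gives 59 for \alpha = \beta = 0.95. Using the second largest observation (i = 1) at the same levels requires 93 observations.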