Estimating a quantile by Wilks’ method¶

Let us denote $\underline{Y} = h\left( \vect{X},\vect{d} \right) = \left( Y^1,\ldots,Y^{n_Y} \right)$ , where $\vect{X}= \left( X^1,\ldots,X^{n_X} \right)$ is a random vector, and $\vect{d}$ a deterministic vector. We seek here to evaluate, using the probability distribution of the random vector $\vect{X}$ , the $\alpha$ -quantile $q_{Y^i}(\alpha)$ of $Y^i$ , where $\alpha \in (0, 1)$ :

$\begin{aligned} \Prob{ Y^i \leq q_{Y^i}(\alpha)} = \alpha \end{aligned}$

If we have a sample $\left\{ \vect{x}_1,\ldots,\vect{x}_N \right\}$ of $N$ independent samples of the random vector $\vect{X}$ , $q_{Y^i}(\alpha)$ can be estimated as follows:

the sample $\left\{ \vect{x}_1,\ldots,\vect{x}_N \right\}$ of vector $\vect{X}$ is first transformed to a sample $\left\{ y^i_1,\ldots,y^i_N \right\}$ of the variable $Y^i$ , using $\underline{y} = h(\vect{x}_i,\vect{d})$ ,
the sample $\left\{ y^i_1,\ldots,y^i_N \right\}$ is then placed in ascending order, which gives the sample $\left\{ y^{(1)},\ldots,y^{(N)} \right\}$ ,
this empirical estimation of the quantile is then calculated by the formula:

$\begin{aligned} \widehat{q}_{y^i}(\alpha) = y^{([N\alpha]+1)} \end{aligned}$

where $[N\alpha]$ denotes the integral part of $N\alpha$ .

For example, if $N=100$ and $\alpha = 0.95$ , $\widehat{q}_Z(0.95)$ is equal to $y^{(96)}$ , which is the $5^\textrm{th}$ largest value of the sample $\left\{ y^i_1,\ldots,y^i_N \right\}$ . We note that this estimation has no meaning unless $1/N \leq \alpha \leq 1-1/N$ . For example, if $N=100$ , one can only consider values of a to be between 1% and 99%.

It is also possible to calculate an upper limit for the quantile with a confidence level $\beta$ chosen by the user; one can then be sure with a $\beta$ level of confidence that the real value of $q_{Y^i}(\alpha))$ is less than or equal to $\widehat{q}_{Y^i}(\alpha)_{\sup}$ :

$\begin{aligned} \Prob{q_{Y^i}(\alpha) \leq \widehat{q}_{Y^i}(\alpha)_{\sup}} = \beta \end{aligned}$

The most robust method for calculating this upper limit consists of taking $\widehat{q}_{Y^i}(\alpha)_{\sup} = y^{(j(\alpha,\beta,N))}$ where $j(\alpha,\beta,N)$ is an integer between 2 and $N$ found by solving the equation:

$\begin{aligned} \sum_{k=1}^{j(\alpha,\beta,N) - 1} C^k_N \alpha^k \left( 1-\alpha \right)^{N-k} = \beta \end{aligned}$

A solution to this does not necessarily exist, i.e. there may be no integer value for $j(\alpha,\beta,N)$ satisfying this equality; one can in this case choose the smallest integer $j$ such that:

$\begin{aligned} \sum_{k=1}^{j(\alpha,\beta,N) - 1} C^k_N \alpha^k \left( 1-\alpha \right)^{N-k} > \beta \end{aligned}$

which ensures that $\Prob{q_{Y^i}(\alpha) \leq \widehat{q}_{Y^i}(\alpha)_{\sup}} > \beta$ ; in other words, the level of confidence of the quantile estimation is greater than that initially required.

This formula of the confidence interval can be used in two ways:

either directly to determine $j(\alpha,\beta,N)$ for the values $\alpha,\beta,N$ chosen by the user,
or in reverse to determine the number $N$ of simulations to be carried out for the values $\alpha,\beta$ and $j(\alpha,\beta,N)$ chosen by the user; this is known as Wilks’ formula.

For example for $\alpha = \beta = 95\%$ , we take $j=59$ with $N = 59$ simulations (that is the maximum value out of 59 samples) or else $j = 92$ with $N = 93$ simulations (that is the second largest result out of the 93 selections). For values of $N$ between $59$ and $92$ , the upper limit is the maximum value of the sample. The following tabular presents the whole results for $N \leq 1000$ , still for $\alpha = \beta = 95\%$ .

$N$	Rank of the upper bound of the quantile	Rank of the empirical quantile
59	59	57
93	92	89
124	122	118
153	150	146
181	177	172
208	203	198
234	228	223
260	253	248
286	278	272
311	302	296
336	326	320
361	350	343
386	374	367
410	397	390
434	420	413
458	443	436
482	466	458
506	489	481
530	512	504
554	535	527
577	557	549
601	580	571
624	602	593
647	624	615
671	647	638
694	669	660
717	691	682
740	713	704
763	735	725
786	757	747
809	779	769
832	801	791
855	823	813
877	844	834
900	866	856
923	888	877
945	909	898
968	931	920
991	953	942

$\widehat{q}_{Y^i}(\alpha)$ is often called the “empirical $\alpha$ -quantile” for the variable ${Y^i}$ .

OpenTURNS

An Open source initiative for the Treatment of Uncertainties, Risks'N Statistics

Previous topic

Next topic

This Page

Estimating a quantile by Wilks’ method¶

$N$	Rank of the upper bound of the quantile	Rank of the empirical quantile
59	59	57
93	92	89
124	122	118
153	150	146
181	177	172
208	203	198
234	228	223
260	253	248
286	278	272
311	302	296
336	326	320
361	350	343
386	374	367
410	397	390
434	420	413
458	443	436
482	466	458
506	489	481
530	512	504
554	535	527
577	557	549
601	580	571
624	602	593
647	624	615
671	647	638
694	669	660
717	691	682
740	713	704
763	735	725
786	757	747
809	779	769
832	801	791
855	823	813
877	844	834
900	866	856
923	888	877
945	909	898
968	931	920
991	953	942

$N$	Rank of the upper bound of the quantile	Rank of the empirical quantile
59	59	57
93	92	89
124	122	118
153	150	146
181	177	172
208	203	198
234	228	223
260	253	248
286	278	272
311	302	296
336	326	320
361	350	343
386	374	367
410	397	390
434	420	413
458	443	436
482	466	458
506	489	481
530	512	504
554	535	527
577	557	549
601	580	571
624	602	593
647	624	615
671	647	638
694	669	660
717	691	682
740	713	704
763	735	725
786	757	747
809	779	769
832	801	791
855	823	813
877	844	834
900	866	856
923	888	877
945	909	898
968	931	920
991	953	942

$N$	Rank of the upper bound of the quantile	Rank of the empirical quantile
59	59	57
93	92	89
124	122	118
153	150	146
181	177	172
208	203	198
234	228	223
260	253	248
286	278	272
311	302	296
336	326	320
361	350	343
386	374	367
410	397	390
434	420	413
458	443	436
482	466	458
506	489	481
530	512	504
554	535	527
577	557	549
601	580	571
624	602	593
647	624	615
671	647	638
694	669	660
717	691	682
740	713	704
763	735	725
786	757	747
809	779	769
832	801	791
855	823	813
877	844	834
900	866	856
923	888	877
945	909	898
968	931	920
991	953	942