Estimating a quantile by Wilks’ method

Let us denote \underline{Y} = h\left( \vect{X},\vect{d} \right) = \left( Y^1,\ldots,Y^{n_Y} \right), where \vect{X}= \left( X^1,\ldots,X^{n_X} \right) is a random vector, and \vect{d} a deterministic vector. We seek here to evaluate, using the probability distribution of the random vector \vect{X}, the \alpha-quantile q_{Y^i}(\alpha) of Y^i, where \alpha \in (0, 1):

\begin{aligned}
    \Prob{ Y^i \leq q_{Y^i}(\alpha)} = \alpha
  \end{aligned}

If we have a sample \left\{ \vect{x}_1,\ldots,\vect{x}_N \right\} of N independent samples of the random vector \vect{X}, q_{Y^i}(\alpha) can be estimated as follows:

  • the sample \left\{ \vect{x}_1,\ldots,\vect{x}_N \right\} of vector \vect{X} is first transformed to a sample \left\{ y^i_1,\ldots,y^i_N \right\} of the variable Y^i, using \underline{y} = h(\vect{x}_i,\vect{d}),

  • the sample \left\{ y^i_1,\ldots,y^i_N \right\} is then placed in ascending order, which gives the sample \left\{ y^{(1)},\ldots,y^{(N)} \right\},

  • this empirical estimation of the quantile is then calculated by the formula:

    \begin{aligned}
      \widehat{q}_{y^i}(\alpha) = y^{([N\alpha]+1)}
    \end{aligned}

where [N\alpha] denotes the integral part of N\alpha.

For example, if N=100 and \alpha = 0.95, \widehat{q}_Z(0.95) is equal to y^{(96)}, which is the 5^\textrm{th} largest value of the sample \left\{ y^i_1,\ldots,y^i_N \right\}. We note that this estimation has no meaning unless 1/N \leq \alpha \leq 1-1/N. For example, if N=100, one can only consider values of a to be between 1% and 99%.

It is also possible to calculate an upper limit for the quantile with a confidence level \beta chosen by the user; one can then be sure with a \beta level of confidence that the real value of q_{Y^i}(\alpha)) is less than or equal to \widehat{q}_{Y^i}(\alpha)_{\sup}:

\begin{aligned}
    \Prob{q_{Y^i}(\alpha) \leq \widehat{q}_{Y^i}(\alpha)_{\sup}} = \beta
  \end{aligned}

The most robust method for calculating this upper limit consists of taking \widehat{q}_{Y^i}(\alpha)_{\sup} = y^{(j(\alpha,\beta,N))} where j(\alpha,\beta,N) is an integer between 2 and N found by solving the equation:

\begin{aligned}
    \sum_{k=1}^{j(\alpha,\beta,N) - 1} C^k_N \alpha^k \left( 1-\alpha \right)^{N-k} = \beta
  \end{aligned}

A solution to this does not necessarily exist, i.e. there may be no integer value for j(\alpha,\beta,N) satisfying this equality; one can in this case choose the smallest integer j such that:

\begin{aligned}
    \sum_{k=1}^{j(\alpha,\beta,N) - 1} C^k_N \alpha^k \left( 1-\alpha \right)^{N-k} > \beta
  \end{aligned}

which ensures that \Prob{q_{Y^i}(\alpha) \leq \widehat{q}_{Y^i}(\alpha)_{\sup}} > \beta; in other words, the level of confidence of the quantile estimation is greater than that initially required.

This formula of the confidence interval can be used in two ways:

  • either directly to determine j(\alpha,\beta,N) for the values \alpha,\beta,N chosen by the user,

  • or in reverse to determine the number N of simulations to be carried out for the values \alpha,\beta and j(\alpha,\beta,N) chosen by the user; this is known as Wilks’ formula.

For example for \alpha = \beta = 95\%, we take j=59 with N = 59 simulations (that is the maximum value out of 59 samples) or else j = 92 with N = 93 simulations (that is the second largest result out of the 93 selections). For values of N between 59 and 92, the upper limit is the maximum value of the sample. The following tabular presents the whole results for N \leq 1000, still for \alpha = \beta = 95\%.

N

Rank of the upper bound of the quantile

Rank of the empirical quantile

59

59

57

93

92

89

124

122

118

153

150

146

181

177

172

208

203

198

234

228

223

260

253

248

286

278

272

311

302

296

336

326

320

361

350

343

386

374

367

410

397

390

434

420

413

458

443

436

482

466

458

506

489

481

530

512

504

554

535

527

577

557

549

601

580

571

624

602

593

647

624

615

671

647

638

694

669

660

717

691

682

740

713

704

763

735

725

786

757

747

809

779

769

832

801

791

855

823

813

877

844

834

900

866

856

923

888

877

945

909

898

968

931

920

991

953

942

\widehat{q}_{Y^i}(\alpha) is often called the “empirical \alpha-quantile” for the variable {Y^i}.

API:

References:

  • Wilks, S.S. (1962). “Mathematical Statistics”, New York-London

  • Robert C.P., Casella G. (2004). Monte-Carlo Statistical Methods, Springer, ISBN 0-387-21239-6, 2nd ed.

  • Rubinstein R.Y. (1981). Simulation and The Monte-Carlo methods, John Wiley & Sons