Exact quantile confidence interval based on order statistics
We consider a random variable $X$ of dimension 1 and its quantile $x_\alpha$ of level $\alpha$ ($\alpha \in [0, 1]$). We seek to evaluate an upper bound of $x_\alpha$ with a confidence greater than or equal to $\beta$, using order statistics.
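Let $(X_1, \dots, X_n)$ be independent copies of $X$. Let $X_{(k)}$ be the $k$-th order statistic of $(X_1, \dots, X_n)$, which means that $X_{(k)}$ is the $k$-th smallest value of $(X_1, \dots, X_n)$ for $1 \leq k \leq n$. For example, $X_{(1)} = \min(X_1, \dots, X_n)$ is the minimum and $X_{(n)} = \max(X_1, \dots, X_n)$ is the maximum. We have:

$$X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$$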
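Denoting by $f$ and $F$ the probability density and cumulative distribution functions of $X$, the probability density and cumulative distribution functions of the order statistics are:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!} F(x)^{k-1} \left(1 - F(x)\right)^{n-k} f(x), \qquad F_{X_{(k)}}(x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^{i} \left(1 - F(x)\right)^{n-i} \qquad (1)$$

We notice that $F_{X_{(k)}}(x) = \bar{F}_{n, F(x)}(k-1)$, where $F_{n, p}$ is the cumulative distribution function of the Binomial distribution $\mathcal{B}(n, p)$ and $\bar{F}_{n, p} = 1 - F_{n, p}$ is the complementary cumulative distribution function (also named survival function in dimension 1). Therefore, as $F(x_\alpha) = \alpha$:

$$F_{X_{(k)}}(x_\alpha) = \bar{F}_{n, \alpha}(k-1) = \sum_{i=k}^{n} \binom{n}{i} \alpha^{i} (1-\alpha)^{n-i}$$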
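The identity $F_{X_{(k)}}(x_\alpha) = \bar{F}_{n,\alpha}(k-1)$ can be checked numerically. The following sketch uses only the Python standard library and assumes $X \sim \mathcal{U}(0, 1)$, for which $x_\alpha = \alpha$. The helper names `binom_sf` and `empirical_cdf_order_stat` are hypothetical and do not come from any library.

```python
import math
import random


def binom_sf(j, n, p):
    """Survival function 1 - F_{n,p}(j) = P(B > j) of the Binomial distribution B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(j + 1, n + 1))


def empirical_cdf_order_stat(k, n, alpha, trials=20000, seed=0):
    """Monte Carlo estimate of P(X_(k) <= x_alpha) for X ~ Uniform(0, 1), where x_alpha = alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        # sample[k - 1] is the k-th order statistic X_(k) (ranks are 1-based in the text)
        if sample[k - 1] <= alpha:
            hits += 1
    return hits / trials


n, k, alpha = 10, 7, 0.6
exact = binom_sf(k - 1, n, alpha)  # bar F_{n, alpha}(k - 1)
estimate = empirical_cdf_order_stat(k, n, alpha)
print(f"exact = {exact:.4f}, Monte Carlo = {estimate:.4f}")
```

With 20000 trials the Monte Carlo estimate typically agrees with the binomial survival value to about two decimal places.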
Rank for an upper bound of the quantile
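Let $(x_1, \dots, x_n)$ be an i.i.d. sample of size $n$ of the random variable $X$. Given a quantile level $\alpha$, a confidence level $\beta$, and a sample size $n$, we seek the smallest rank $k \in \{1, \dots, n\}$ such that:

$$\mathbb{P}\left(x_\alpha \leq X_{(k)}\right) \geq \beta \qquad (2)$$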
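As $\mathbb{P}\left(x_\alpha \leq X_{(k)}\right) = 1 - F_{X_{(k)}}(x_\alpha)$, equation (2) can be written as:

$$1 - F_{X_{(k)}}(x_\alpha) \geq \beta \qquad (3)$$

or even as:

$$F_{n, \alpha}(k-1) \geq \beta$$

Then, the smallest rank $k_{sol}$ such that the previous equation is satisfied is:

$$k_{sol} = F_{n, \alpha}^{-1}(\beta) + 1$$

where $F_{n, \alpha}^{-1}(\beta) = \min\{j \in \{0, \dots, n\} : F_{n, \alpha}(j) \geq \beta\}$ is the quantile function of $\mathcal{B}(n, \alpha)$. This rank is valid only if $k_{sol} \leq n$. An upper bound of $x_\alpha$ is estimated by the value of $X_{(k_{sol})}$ on the sample $(x_1, \dots, x_n)$.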
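Here is a recap of the existence of solutions for this case:

| | $\beta = 0$ | $\beta \in \, ]0, 1[$ | $\beta = 1$ |
|---|---|---|---|
| $\alpha = 0$ | 1 | 1 | 1 |
| $\alpha \in \, ]0, 1[$ | 1 | see (4) | no solution |
| $\alpha = 1$ | 1 | no solution | no solution |

where a solution exists if and only if:

$$\alpha^{n} \leq 1 - \beta \iff n \geq \frac{\log(1-\beta)}{\log(\alpha)} \qquad (4)$$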
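The smallest rank can be computed by scanning the Binomial cumulative distribution function. This is a minimal sketch in plain Python, assuming 1-based ranks as in the text; `binom_cdf` and `upper_rank` are hypothetical helper names, and the linear scan is chosen for readability rather than speed.

```python
import math


def binom_cdf(j, n, p):
    """CDF F_{n,p}(j) = P(B <= j) of the Binomial distribution B(n, p)."""
    if j < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(j, n) + 1))


def upper_rank(n, alpha, beta):
    """Smallest 1-based rank k such that F_{n,alpha}(k - 1) >= beta, i.e. k = F^{-1}(beta) + 1.

    Returns None when no rank in 1..n works (a "no solution" cell, or (4) not met).
    """
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, alpha) >= beta:
            return k
    return None


print(upper_rank(59, 0.95, 0.95))  # 59: the maximum of 59 observations
print(upper_rank(58, 0.95, 0.95))  # None: 0.95**58 > 0.05, so (4) fails
```

The two calls illustrate condition (4): with $\alpha = \beta = 0.95$, 59 observations are needed before the sample maximum becomes an upper bound of $x_{0.95}$ at the requested confidence.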
Rank for a lower bound of the quantile
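Given the same data as previously, we seek the greatest rank $k \in \{1, \dots, n\}$ such that:

$$\mathbb{P}\left(X_{(k)} \leq x_\alpha\right) \geq \beta \qquad (5)$$

which can be written as:

$$F_{X_{(k)}}(x_\alpha) = \bar{F}_{n, \alpha}(k-1) \geq \beta \qquad (6)$$

and finally as:

$$F_{n, \alpha}(k-1) \leq 1 - \beta$$

Then, the greatest rank $k_{sol}$ such that the previous equation is satisfied is:

$$k_{sol} = 1 + \max\left\{j \in \{0, \dots, n-1\} : F_{n, \alpha}(j) \leq 1 - \beta\right\}$$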
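Here is a recap of the existence of solutions for this case:

| | $\beta = 0$ | $\beta \in \, ]0, 1[$ | $\beta = 1$ |
|---|---|---|---|
| $\alpha = 0$ | $n$ | no solution | no solution |
| $\alpha \in \, ]0, 1[$ | $n$ | see (7) | no solution |
| $\alpha = 1$ | $n$ | $n$ | $n$ |

where a solution exists if and only if:

$$(1-\alpha)^{n} \leq 1 - \beta \iff n \geq \frac{\log(1-\beta)}{\log(1-\alpha)} \qquad (7)$$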
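The greatest rank can be sketched the same way. This is a minimal plain-Python version assuming 1-based ranks; `binom_cdf` and `lower_rank` are hypothetical names. The loop stops at the first failing rank because $F_{n,\alpha}$ is nondecreasing.

```python
import math


def binom_cdf(j, n, p):
    """CDF F_{n,p}(j) = P(B <= j) of the Binomial distribution B(n, p)."""
    if j < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(j, n) + 1))


def lower_rank(n, alpha, beta):
    """Greatest 1-based rank k such that F_{n,alpha}(k - 1) <= 1 - beta.

    Returns None when even k = 1 fails, i.e. when (1 - alpha)**n > 1 - beta.
    """
    best = None
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, alpha) <= 1.0 - beta:
            best = k
        else:
            break  # F_{n,alpha} is nondecreasing, so every larger rank fails as well
    return best


print(lower_rank(59, 0.05, 0.95))  # 1: the minimum of 59 observations
print(lower_rank(58, 0.05, 0.95))  # None: condition (7) fails
```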
Ranks for bilateral bounds of the quantile
Given the same data as previously, we can seek the ranks $(k_1, k_2)$ as solutions of different problems.
The problem can be:
(8)
or:
(9)
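or, with $k_1$ the greatest rank and $k_2$ the smallest rank such that:

$$\mathbb{P}\left(x_\alpha < X_{(k_1)}\right) \leq \frac{1-\beta}{2} \quad \text{and} \quad \mathbb{P}\left(X_{(k_2)} < x_\alpha\right) \leq \frac{1-\beta}{2} \qquad (10)$$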
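or, with $k_2 = n - k_1 + 1$ and $k_1$ the greatest rank such that:

$$\mathbb{P}\left(X_{(k_1)} \leq x_\alpha \leq X_{(n-k_1+1)}\right) \geq \beta \qquad (11)$$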
The solutions of (8) and (9) are determined numerically, using an optimization algorithm.
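The solutions of (10) are respectively defined by:

$$F_{n, \alpha}(k_1 - 1) \leq \frac{1-\beta}{2} \quad \text{and} \quad F_{n, \alpha}(k_2 - 1) \geq \frac{1+\beta}{2}$$

which leads to the respective solutions:

$$k_1 = 1 + \max\left\{j : F_{n, \alpha}(j) \leq \frac{1-\beta}{2}\right\}$$

and

$$k_2 = F_{n, \alpha}^{-1}\left(\frac{1+\beta}{2}\right) + 1$$

Then, the previous tables written for the lower and upper bounds can be used to find $k_1$ and $k_2$ respectively, with $\beta$ replaced by $\frac{1+\beta}{2}$ (or equivalently $1 - \beta$ replaced by $\frac{1-\beta}{2}$).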
The solutions of (11) are gathered here:
1 |
|
||||
|
|||||
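For the variant with symmetric ranks $k_2 = n - k_1 + 1$, the coverage $F_{n,\alpha}(k_2 - 1) - F_{n,\alpha}(k_1 - 1)$ decreases when $k_1$ increases, so the greatest admissible $k_1$ can be found by a forward scan. This plain-Python sketch additionally assumes $k_1 \leq k_2$, i.e. $k_1 \leq \lfloor (n+1)/2 \rfloor$. The name `symmetric_ranks` is hypothetical.

```python
import math


def binom_cdf(j, n, p):
    """CDF F_{n,p}(j) = P(B <= j) of the Binomial distribution B(n, p)."""
    if j < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(j, n) + 1))


def symmetric_ranks(n, alpha, beta):
    """Greatest k1 <= (n + 1) // 2 with P(X_(k1) <= x_alpha <= X_(n-k1+1)) >= beta.

    Returns (k1, n - k1 + 1), or None if even k1 = 1 fails.
    """
    best = None
    for k1 in range(1, (n + 1) // 2 + 1):
        k2 = n - k1 + 1
        # P(X_(k1) <= x_alpha <= X_(k2)) = F_{n,alpha}(k2 - 1) - F_{n,alpha}(k1 - 1)
        coverage = binom_cdf(k2 - 1, n, alpha) - binom_cdf(k1 - 1, n, alpha)
        if coverage < beta:
            break  # the coverage decreases when k1 increases
        best = (k1, k2)
    return best


print(symmetric_ranks(100, 0.5, 0.9))
print(symmetric_ranks(5, 0.5, 0.95))  # None: 1 - 2 * 0.5**5 < 0.95
```

With $k_1 = 1$, the coverage reduces to $1 - \alpha^{n} - (1-\alpha)^{n}$. So a solution exists only if this quantity reaches $\beta$.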
Minimum sample size for an upper bound of the quantile
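Given $\alpha$, $\beta$, and the rank $k$, we seek the smallest sample size $n$ such that:

$$\mathbb{P}\left(x_\alpha \leq X_{(n-k+1)}\right) \geq \beta \qquad (12)$$

As $\mathbb{P}\left(x_\alpha \leq X_{(n-k+1)}\right) = 1 - F_{X_{(n-k+1)}}(x_\alpha)$, equation (12) can be written as:

$$1 - F_{X_{(n-k+1)}}(x_\alpha) \geq \beta \qquad (13)$$

or even as:

$$F_{n, \alpha}(n - k) \geq \beta$$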
Note that the problem is defined differently than in equation (2): the rank $k$ is now counted from the largest observation, and the unknown is the sample size $n$ instead of the rank. We therefore solve equation (13) with respect to $n$, using an optimization algorithm to determine $n$ in the interval $[k, +\infty[$. The search interval can be reduced to $[k, n_{max}]$, where $n_{max}$ is a size that satisfies equation (13). It can be determined using the approximation of the binomial distribution by the normal distribution with the same mean and variance.
Once the smallest size $n$ has been estimated, a sample $(x_1, \dots, x_n)$ of size $n$ can be generated from $X$, and an upper bound of $x_\alpha$ is estimated using $x_{(n-k+1)}$, i.e. the $k$-th observation in the decreasing ordered sample $(x_{(n)}, \dots, x_{(1)})$.
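The search for the minimum size can be sketched as a bisection on integers, which is valid because $F_{n,\alpha}(n-k)$ is nondecreasing in $n$. The bracket $n_{max}$ starts from a normal approximation of the number of observations above $x_\alpha$, which follows $\mathcal{B}(n, 1-\alpha)$ with mean $n(1-\alpha)$ and variance $n\alpha(1-\alpha)$. The bracket is then doubled until (13) actually holds, since the approximation alone gives no guarantee. This is plain Python restricted to $\alpha, \beta \in \, ]0, 1[$, and every function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def binom_cdf(j, n, p):
    """CDF F_{n,p}(j) = P(B <= j) of the Binomial distribution B(n, p)."""
    if j < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(j, n) + 1))


def satisfies_upper(n, k, alpha, beta):
    """Equation (13): F_{n,alpha}(n - k) >= beta, with the rank k counted from the top."""
    return binom_cdf(n - k, n, alpha) >= beta


def normal_guess(k, alpha, beta):
    """Size solving (1 - alpha) n - z_beta sqrt(n alpha (1 - alpha)) = k (normal approximation)."""
    z = NormalDist().inv_cdf(beta)
    c = math.sqrt(alpha * (1.0 - alpha))
    # Positive root in s = sqrt(n) of (1 - alpha) s^2 - z c s - k = 0
    s = (z * c + math.sqrt((z * c) ** 2 + 4.0 * (1.0 - alpha) * k)) / (2.0 * (1.0 - alpha))
    return max(k, math.ceil(s * s))


def min_size_upper(k, alpha, beta):
    """Smallest n >= k satisfying (13), for alpha and beta in ]0, 1[."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("sketch restricted to alpha and beta in ]0, 1[")
    n_max = normal_guess(k, alpha, beta)
    while not satisfies_upper(n_max, k, alpha, beta):
        n_max *= 2  # the normal guess may be too small: enlarge until (13) holds
    lo, hi = k, n_max
    while lo < hi:  # integer bisection on [k, n_max]
        mid = (lo + hi) // 2
        if satisfies_upper(mid, k, alpha, beta):
            hi = mid
        else:
            lo = mid + 1
    return lo


k = 1
n = min_size_upper(k, 0.95, 0.95)
rng = random.Random(1)
sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
upper_bound = sample[n - k]  # x_(n-k+1): the k-th observation in decreasing order
print(n, round(upper_bound, 3))
```

For $\alpha = \beta = 0.95$ this reproduces the classical sizes 59 for $k = 1$ (the sample maximum) and 93 for $k = 2$.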
Minimum sample size for a lower bound of the quantile
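Given the same data as previously, we seek the smallest sample size $n \geq k$ such that equation (5) is satisfied, i.e. $\bar{F}_{n, \alpha}(k-1) \geq \beta$.
Here is a recap of the existence of solutions for this case:

| | $\beta = 0$ | $\beta \in \, ]0, 1[$ | $\beta = 1$ |
|---|---|---|---|
| $\alpha = 0$ | $k$ | no solution | no solution |
| $\alpha \in \, ]0, 1[$ | $k$ | determined numerically | no solution |
| $\alpha = 1$ | $k$ | $k$ | $k$ |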
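For $\alpha, \beta \in \, ]0, 1[$, the smallest size can be found by bisection, because $\bar{F}_{n,\alpha}(k-1) = \mathbb{P}(\mathcal{B}(n, \alpha) \geq k)$ is nondecreasing in $n$. This plain-Python sketch brackets the solution by doubling the size, which is a simpler substitute for the normal approximation. The function names are hypothetical.

```python
import math


def binom_cdf(j, n, p):
    """CDF F_{n,p}(j) = P(B <= j) of the Binomial distribution B(n, p)."""
    if j < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(j, n) + 1))


def satisfies_lower(n, k, alpha, beta):
    """Equation (5): P(X_(k) <= x_alpha) = 1 - F_{n,alpha}(k - 1) >= beta."""
    return 1.0 - binom_cdf(k - 1, n, alpha) >= beta


def min_size_lower(k, alpha, beta):
    """Smallest n >= k satisfying (5), for alpha and beta in ]0, 1[."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("sketch restricted to alpha and beta in ]0, 1[")
    hi = k
    while not satisfies_lower(hi, k, alpha, beta):
        hi *= 2  # doubling bracket instead of a normal approximation
    lo = k
    while lo < hi:  # integer bisection on [k, hi]
        mid = (lo + hi) // 2
        if satisfies_lower(mid, k, alpha, beta):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_size_lower(1, 0.05, 0.95))  # the sample minimum as a lower bound of x_0.05
print(min_size_lower(2, 0.05, 0.95))
```

By the symmetry $\alpha \leftrightarrow 1 - \alpha$, the results match the upper-bound sizes for $\alpha = 0.95$.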
Minimum sample size for bilateral bounds of the quantile
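Given two ranks $k_1 \geq 1$ and $k_2 \geq 1$ defining the order statistics $X_{(k_1)}$ and $X_{(n-k_2+1)}$, we seek the smallest sample size $n$ such that:

$$\mathbb{P}\left(X_{(k_1)} \leq x_\alpha \leq X_{(n-k_2+1)}\right) \geq \beta \qquad (14)$$

As $\mathbb{P}\left(X_{(k_1)} \leq x_\alpha \leq X_{(n-k_2+1)}\right) = F_{X_{(k_1)}}(x_\alpha) - F_{X_{(n-k_2+1)}}(x_\alpha)$, equation (14) can be written as:

$$F_{X_{(k_1)}}(x_\alpha) - F_{X_{(n-k_2+1)}}(x_\alpha) \geq \beta \qquad (15)$$

or even as:

$$F_{n, \alpha}(n - k_2) - F_{n, \alpha}(k_1 - 1) \geq \beta$$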
Note that the problem is defined differently than in equation (9). We solve equation (15) with respect to the sample size $n$, using an optimization algorithm to determine $n$ in the interval $[k_1 + k_2, +\infty[$ (for smaller sizes, $n - k_2 < k_1$ and the left-hand side of (15) is not positive). The search interval can be reduced to $[k_1 + k_2, n_{max}]$, where $n_{max}$ is a size that satisfies equation (15). It can be determined using the approximation of the binomial distribution by the normal distribution with the same mean and variance.
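Once the smallest size $n$ has been estimated, a sample of size $n$ can be generated from $X$, and a lower and an upper bound of $x_\alpha$ are estimated using $x_{(k_1)}$ and $x_{(n-k_2+1)}$, i.e. the $k_1$-th observation in the ordered sample $(x_{(1)}, \dots, x_{(n)})$ and the $k_2$-th observation in the decreasing ordered sample $(x_{(n)}, \dots, x_{(1)})$.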
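The bilateral event $\{k_1 \leq \mathcal{B}(n, \alpha) \leq n - k_2\}$ can only become more likely when an observation is added, so the left-hand side of (15) is nondecreasing in $n$ and a bisection applies. This plain-Python sketch uses a doubling bracket in place of the normal approximation mentioned above and is restricted to $\alpha, \beta \in \, ]0, 1[$. All function names are hypothetical.

```python
import math


def binom_cdf(j, n, p):
    """CDF F_{n,p}(j) = P(B <= j) of the Binomial distribution B(n, p)."""
    if j < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(j, n) + 1))


def satisfies_bilateral(n, k1, k2, alpha, beta):
    """Equation (15): F_{n,alpha}(n - k2) - F_{n,alpha}(k1 - 1) >= beta."""
    return binom_cdf(n - k2, n, alpha) - binom_cdf(k1 - 1, n, alpha) >= beta


def min_size_bilateral(k1, k2, alpha, beta):
    """Smallest n >= k1 + k2 satisfying (15), for alpha and beta in ]0, 1[."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("sketch restricted to alpha and beta in ]0, 1[")
    lo = k1 + k2  # below this size, n - k2 < k1 and the left-hand side is not positive
    hi = lo
    while not satisfies_bilateral(hi, k1, k2, alpha, beta):
        hi *= 2  # doubling bracket
    while lo < hi:  # integer bisection on [k1 + k2, hi]
        mid = (lo + hi) // 2
        if satisfies_bilateral(mid, k1, k2, alpha, beta):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_size_bilateral(1, 1, 0.5, 0.95))   # [min, max] as an interval for the median
print(min_size_bilateral(1, 2, 0.5, 0.9))    # [x_(1), x_(n-1)]
print(min_size_bilateral(1, 1, 0.95, 0.95))
```

Given the returned size and a sorted sample `s` (0-based list), the bounds are `s[k1 - 1]` and `s[n - k2]`.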
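In the particular case where $k_1 = k_2 = k$, we seek the smallest sample size $n$ such that:

$$\mathbb{P}\left(X_{(k)} \leq x_\alpha \leq X_{(n-k+1)}\right) \geq \beta$$

Then, equation (15) can be written as:

$$F_{n, \alpha}(n - k) - F_{n, \alpha}(k - 1) \geq \beta$$

The optimal $n$ is determined using an optimization algorithm whose search is reduced to the interval:

$$[2k, n_{max}]$$

where $n_{max}$ is a size that satisfies the previous equation.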
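For $k = 1$, the bounds are the sample minimum and maximum, and the condition has a closed form. The following worked example uses $\alpha = 1/2$ and $\beta = 0.95$, values chosen only for illustration:

```latex
\begin{aligned}
F_{n,\alpha}(n-1) - F_{n,\alpha}(0) &= 1 - \alpha^{n} - (1-\alpha)^{n} \geq \beta \\
\alpha = \tfrac{1}{2},\ \beta = 0.95: \quad 1 - 2^{1-n} \geq 0.95
  &\iff 2^{1-n} \leq 0.05 \iff n \geq 1 + \log_2 20 \approx 5.32
\end{aligned}
```

Hence $n = 6$, which lies in the search interval starting at $2k = 2$. The confidence actually reached by $[x_{(1)}, x_{(6)}]$ is $1 - 2^{-5} = 0.96875$, whereas $n = 5$ only gives $0.9375$.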
OpenTURNS