Compute reference probabilities with Monte-Carlo¶

In this example, we perform a reliability benchmark based on a large Monte-Carlo sample. In order to limit the elapsed time, we consider a limited elapsed time for each problem. In order to get the best possible accuracy within this time limit, we set the coefficient of variation to zero.

import otbenchmark as otb
import pandas as pd
import numpy as np
from tqdm import tqdm
import time

problemslist = otb.ReliabilityBenchmarkProblemList()
numberOfProblems = len(problemslist)
numberOfProblems

coefficientOfVariation = 0.0
maximumOuterSampling = 10 ** 4  # 10 ** 6 for real
blockSize = 10 ** 0  # 10 ** 4 for real simulations
blockSize

confidenceLevel = 0.95
maximumDurationSeconds = 5 * 60.0

totalDurationMinutes = numberOfProblems * maximumDurationSeconds / 60.0
totalDurationMinutes

130.0

model_names = [problemslist[i].getName() for i in range(numberOfProblems)]
metrics = ["PF", "N. function calls", "PMin", "PMax", "C.O.V.", "Digits", "Time (s)"]
resultArray = np.zeros((numberOfProblems, len(metrics)))
for i in tqdm(range(numberOfProblems)):
    startTime = time.time()
    problem = problemslist[i]
    name = problem.getName()
    event = problem.getEvent()
    g = event.getFunction()
    factory = otb.ProbabilitySimulationAlgorithmFactory()
    algo = factory.buildMonteCarlo(problem)
    algo.setMaximumOuterSampling(maximumOuterSampling)
    algo.setBlockSize(blockSize)
    algo.setMaximumCoefficientOfVariation(coefficientOfVariation)
    algo.setMaximumTimeDuration(maximumDurationSeconds)
    initialNumberOfCall = g.getEvaluationCallsNumber()
    algo.run()
    result = algo.getResult()
    numberOfFunctionEvaluations = g.getEvaluationCallsNumber() - initialNumberOfCall
    computedProbability = result.getProbabilityEstimate()
    confidenceLength = result.getConfidenceLength(confidenceLevel)
    pmin = computedProbability - 0.5 * confidenceLength
    pmax = computedProbability + 0.5 * confidenceLength
    cov = result.getCoefficientOfVariation()
    if cov > 0.0:
        expectedDigits = -np.log10(cov) - 1.0
    else:
        expectedDigits = 0.0
    stopTime = time.time()
    elapsedTime = stopTime - startTime
    resultArray[i][0] = computedProbability
    resultArray[i][1] = numberOfFunctionEvaluations
    resultArray[i][2] = pmin
    resultArray[i][3] = pmax
    resultArray[i][4] = cov
    resultArray[i][5] = expectedDigits
    resultArray[i][6] = elapsedTime

  0%|          | 0/26 [00:00<?, ?it/s]
  8%|▊         | 2/26 [00:00<00:01, 13.86it/s]
 19%|█▉        | 5/26 [00:00<00:01, 19.61it/s]
 31%|███       | 8/26 [00:00<00:00, 20.77it/s]
 42%|████▏     | 11/26 [00:00<00:00, 20.16it/s]
 54%|█████▍    | 14/26 [00:00<00:00, 17.11it/s]
 65%|██████▌   | 17/26 [00:00<00:00, 16.19it/s]
 73%|███████▎  | 19/26 [00:01<00:00, 16.70it/s]
 81%|████████  | 21/26 [00:02<00:00,  5.59it/s]
 88%|████████▊ | 23/26 [00:02<00:00,  6.72it/s]
 96%|█████████▌| 25/26 [00:02<00:00,  8.23it/s]
100%|██████████| 26/26 [00:02<00:00, 10.79it/s]

df = pd.DataFrame(resultArray, index=model_names, columns=metrics)
# df.to_csv("reliability_compute_reference_proba.csv")
df

	PF	N. function calls	PMin	PMax	C.O.V.	Digits	Time (s)
RP8	0.0011	10000.0	0.000450	0.001750	0.301345	-0.479065	0.066655
RP14	0.0010	10000.0	0.000381	0.001619	0.316070	-0.499783	0.077592
RP22	0.0041	10000.0	0.002848	0.005352	0.155853	-0.192716	0.039505
RP24	0.0021	10000.0	0.001203	0.002997	0.217989	-0.338434	0.039663
RP25	0.0000	10000.0	0.000000	0.000000	-1.000000	0.000000	0.043977
RP28	0.0000	10000.0	0.000000	0.000000	-1.000000	0.000000	0.045419
RP31	0.0033	10000.0	0.002176	0.004424	0.173790	-0.240025	0.040889
RP33	0.0024	10000.0	0.001441	0.003359	0.203879	-0.309373	0.049150
RP35	0.0027	10000.0	0.001683	0.003717	0.192190	-0.283731	0.040191
RP38	0.0086	10000.0	0.006790	0.010410	0.107368	-0.030875	0.075518
RP53	0.0331	10000.0	0.029594	0.036606	0.054048	0.267223	0.039082
RP55	0.5557	10000.0	0.545961	0.565439	0.008942	1.048582	0.038395
RP54	0.0006	10000.0	0.000120	0.001080	0.408126	-0.610794	0.137346
RP57	0.0288	10000.0	0.025522	0.032078	0.058071	0.236042	0.044433
RP75	0.0085	10000.0	0.006701	0.010299	0.108003	-0.033437	0.043540
RP89	0.0063	10000.0	0.004749	0.007851	0.125591	-0.098957	0.043973
RP107	0.0000	10000.0	0.000000	0.000000	-1.000000	0.000000	0.116150
RP110	0.0000	10000.0	0.000000	0.000000	-1.000000	0.000000	0.055077
RP111	0.0000	10000.0	0.000000	0.000000	-1.000000	0.000000	0.053075
RP63	0.0006	10000.0	0.000120	0.001080	0.408126	-0.610794	0.947401
RP91	0.0007	10000.0	0.000182	0.001218	0.377832	-0.577299	0.081028
RP60	0.0494	10000.0	0.045153	0.053647	0.043867	0.357865	0.072951
RP77	0.0000	10000.0	0.000000	0.000000	-1.000000	0.000000	0.063930
Four-branch serial system	0.0024	10000.0	0.001441	0.003359	0.203879	-0.309373	0.051752
R-S	0.0773	10000.0	0.072066	0.082534	0.034549	0.461559	0.050134
Axial stressed beam	0.0312	10000.0	0.027792	0.034608	0.055724	0.253960	0.049662

The problems with higher failture probabilities are obviously solved with more accuracy with the Monte-Carlo method. For example, the RP55 problem which has the highest probability equal to 0.560 has more than 3 significant digits. On the opposite side, the problems with probabilities close to zero are much more difficult to solve. The RP28 with a probability close to $10^{-7}$ has no significant digit.

These previous results are consistent with the distribution of the Monte-Carlo estimator. The properties of the binomial distribution imply that its variance is:

$\sigma_{p_f}^2 = \frac{p_f (1-p_f)}{n}$

where $n$ is the sample size and $p_f$ is the failure probability. The coefficient of variation is:

$CV = \frac{\sigma_{p_f}}{p_f}.$

Since we do not know the exact value of $p_f$ , we use is approximation $\tilde{p_f}$ instead (this turns rigorous equations into approximate ones, but does not change the outcome). This implies:

$CV = \sqrt{\frac{1 - p_f}{p_f n}}.$

When $p_f\rightarrow 0$ , we have $p_f \rightarrow 0$ which implies:

$CV \rightarrow \sqrt{\frac{1}{p_f n}}.$

Inverting the previous equation, we get the sample size given the coefficient of variation:

$n \approx \frac{1}{p_f CV^2}.$

This leads to the rule of thumb that, in order to estimate the probability $p_f = 10^{-m}$ , where $m$ is an integer, we need a sample size equal to:

$n \approx \frac{1}{10^{-m} 10^{-2}} = 10^{m+2}.$

For example, estimating the probability of the RP28 problem with just one single digit leads to a sample size equal to $n=10^9$ , since the exact $p_f \approx 10^{-7}$ .

Total running time of the script: (0 minutes 2.725 seconds)

otbenchmark

Module otbenchmark

Table of Contents

Previous topic

Next topic

This Page

Compute reference probabilities with Monte-Carlo¶