Compute reference probabilities with Monte-Carlo

In this example, we perform a reliability benchmark based on a large Monte-Carlo sample. To limit the total elapsed time, we impose a time limit on each problem. To get the best possible accuracy within this time limit, we set the maximum coefficient of variation to zero, so that the algorithm stops only when the time or sample-size budget is exhausted.

import otbenchmark as otb
import pandas as pd
import numpy as np
from tqdm import tqdm
import time
problemslist = otb.ReliabilityBenchmarkProblemList()
numberOfProblems = len(problemslist)
numberOfProblems
26
coefficientOfVariation = 0.0
maximumOuterSampling = 10 ** 4  # 10 ** 6 for real
blockSize = 10 ** 0  # 10 ** 4 for real simulations
blockSize
1
confidenceLevel = 0.95
maximumDurationSeconds = 5 * 60.0
totalDurationMinutes = numberOfProblems * maximumDurationSeconds / 60.0
totalDurationMinutes
130.0
model_names = [problemslist[i].getName() for i in range(numberOfProblems)]
metrics = ["PF", "N. function calls", "PMin", "PMax", "C.O.V.", "Digits", "Time (s)"]
resultArray = np.zeros((numberOfProblems, len(metrics)))
for i in tqdm(range(numberOfProblems)):
    startTime = time.time()
    problem = problemslist[i]
    name = problem.getName()
    event = problem.getEvent()
    g = event.getFunction()
    factory = otb.ProbabilitySimulationAlgorithmFactory()
    algo = factory.buildMonteCarlo(problem)
    algo.setMaximumOuterSampling(maximumOuterSampling)
    algo.setBlockSize(blockSize)
    algo.setMaximumCoefficientOfVariation(coefficientOfVariation)
    algo.setMaximumTimeDuration(maximumDurationSeconds)
    initialNumberOfCall = g.getEvaluationCallsNumber()
    algo.run()
    result = algo.getResult()
    numberOfFunctionEvaluations = g.getEvaluationCallsNumber() - initialNumberOfCall
    computedProbability = result.getProbabilityEstimate()
    confidenceLength = result.getConfidenceLength(confidenceLevel)
    pmin = computedProbability - 0.5 * confidenceLength
    pmax = computedProbability + 0.5 * confidenceLength
    cov = result.getCoefficientOfVariation()
    if cov > 0.0:
        expectedDigits = -np.log10(cov) - 1.0
    else:
        expectedDigits = 0.0
    stopTime = time.time()
    elapsedTime = stopTime - startTime
    resultArray[i][0] = computedProbability
    resultArray[i][1] = numberOfFunctionEvaluations
    resultArray[i][2] = pmin
    resultArray[i][3] = pmax
    resultArray[i][4] = cov
    resultArray[i][5] = expectedDigits
    resultArray[i][6] = elapsedTime
100%|██████████| 26/26 [00:03<00:00,  7.28it/s]
df = pd.DataFrame(resultArray, index=model_names, columns=metrics)
# df.to_csv("reliability_compute_reference_proba.csv")
df
PF N. function calls PMin PMax C.O.V. Digits Time (s)
RP8 0.0004 10000.0 0.000008 0.000792 0.499900 -0.698883 0.535655
RP14 0.0006 10000.0 0.000120 0.001080 0.408126 -0.610794 0.142935
RP22 0.0043 10000.0 0.003018 0.005582 0.152170 -0.182330 0.039490
RP24 0.0020 10000.0 0.001124 0.002876 0.223383 -0.349050 0.038912
RP25 0.0000 10000.0 0.000000 0.000000 -1.000000 0.000000 0.041950
RP28 0.0000 10000.0 0.000000 0.000000 -1.000000 0.000000 0.041377
RP31 0.0028 10000.0 0.001764 0.003836 0.188717 -0.275812 0.037411
RP33 0.0030 10000.0 0.001928 0.004072 0.182300 -0.260787 0.044791
RP35 0.0031 10000.0 0.002010 0.004190 0.179327 -0.253645 0.042297
RP38 0.0082 10000.0 0.006432 0.009968 0.109978 -0.041305 0.084869
RP53 0.0280 10000.0 0.024767 0.031233 0.058919 0.229746 0.044617
RP55 0.5666 10000.0 0.556888 0.576312 0.008746 1.058194 0.044120
RP54 0.0008 10000.0 0.000246 0.001354 0.353412 -0.548281 0.165035
RP57 0.0308 10000.0 0.027414 0.034186 0.056096 0.251069 0.052746
RP75 0.0101 10000.0 0.008140 0.012060 0.099000 0.004365 0.052816
RP89 0.0062 10000.0 0.004662 0.007738 0.126606 -0.102454 0.053184
RP107 0.0000 10000.0 0.000000 0.000000 -1.000000 0.000000 0.130311
RP110 0.0000 10000.0 0.000000 0.000000 -1.000000 0.000000 0.058108
RP111 0.0000 10000.0 0.000000 0.000000 -1.000000 0.000000 0.057080
RP63 0.0005 10000.0 0.000062 0.000938 0.447102 -0.650406 1.039642
RP91 0.0014 10000.0 0.000667 0.002133 0.267074 -0.426632 0.078949
RP60 0.0425 10000.0 0.038546 0.046454 0.047465 0.323625 0.547330
RP77 0.0000 10000.0 0.000000 0.000000 -1.000000 0.000000 0.058332
Four-branch serial system 0.0024 10000.0 0.001441 0.003359 0.203879 -0.309373 0.046992
R-S 0.0774 10000.0 0.072162 0.082638 0.034525 0.461864 0.045479
Axial stressed beam 0.0297 10000.0 0.026373 0.033027 0.057158 0.242925 0.045458


The problems with higher failure probabilities are, as expected, solved more accurately by the Monte-Carlo method. For example, the RP55 problem, which has the highest estimated probability (approximately 0.567), also has the largest number of significant digits. Conversely, the problems with probabilities close to zero are much more difficult to solve. The RP28 problem, whose exact probability is close to 10^{-7}, has no significant digit.
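The "Digits" column in the table above can be recomputed directly from the coefficient of variation, using the same formula as in the script. A quick check with the RP55 value from the table:

```python
import numpy as np

# "Digits" metric used in this benchmark: -log10(CV) - 1
cov_rp55 = 0.008746  # C.O.V. of the RP55 estimate, taken from the table above
digits_rp55 = -np.log10(cov_rp55) - 1.0
print(digits_rp55)  # approximately 1.058, matching the Digits column
```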

These results are consistent with the distribution of the Monte-Carlo estimator. Since the number of failures in a sample of size n follows a binomial distribution, the variance of the estimator is:

\sigma_{p_f}^2 = \frac{p_f (1-p_f)}{n}

where n is the sample size and p_f is the failure probability. The coefficient of variation is:

CV = \frac{\sigma_{p_f}}{p_f}.

Since we do not know the exact value of p_f, we use its approximation \tilde{p}_f instead (this turns rigorous equations into approximate ones, but does not change the outcome). This implies:

CV = \sqrt{\frac{1 - p_f}{p_f n}}.
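As a quick sanity check of this formula (using illustrative values of p_f and n, not tied to any benchmark problem), we can compare it against the empirical spread of repeated Monte-Carlo estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
p_f = 0.05  # illustrative failure probability
n = 10**4   # illustrative sample size

# Theoretical coefficient of variation of the Monte-Carlo estimator
cv_theory = np.sqrt((1.0 - p_f) / (p_f * n))

# Empirical check: repeat the Monte-Carlo estimation many times;
# each estimate is a binomial count of failures divided by n
estimates = rng.binomial(n, p_f, size=2000) / n
cv_empirical = estimates.std() / estimates.mean()

print(cv_theory, cv_empirical)  # the two values agree closely
```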

When p_f \rightarrow 0, we have 1 - p_f \rightarrow 1, which implies:

CV \rightarrow \sqrt{\frac{1}{p_f n}}.

Inverting the previous equation, we get the sample size given the coefficient of variation:

n \approx \frac{1}{p_f CV^2}.

This leads to the rule of thumb that, in order to estimate the probability p_f = 10^{-m}, where m is an integer, with a target coefficient of variation CV = 10^{-1}, we need a sample size equal to:

n \approx \frac{1}{10^{-m} 10^{-2}} = 10^{m+2}.
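This rule of thumb can be wrapped into a small helper function (`required_sample_size` is a hypothetical name introduced here for illustration, not part of otbenchmark):

```python
import numpy as np

def required_sample_size(p_f, cv=0.1):
    """Sample size needed so that the Monte-Carlo estimator of a small
    failure probability p_f reaches the target coefficient of variation cv,
    using the rare-event approximation n ~ 1 / (p_f * cv**2)."""
    return int(np.ceil(1.0 / (p_f * cv ** 2)))

print(required_sample_size(1e-2))  # 10**4: one significant digit for p_f = 10^-2
print(required_sample_size(1e-7))  # 10**9: one significant digit for p_f ~ 10^-7
```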

For example, estimating the probability of the RP28 problem with a single significant digit requires a sample size of roughly n = 10^9, since the exact p_f \approx 10^{-7}.

Total running time of the script: (0 minutes 3.779 seconds)