.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_meta_modeling/polynomial_chaos_metamodel/plot_chaos_cleaning_strategy.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_meta_modeling_polynomial_chaos_metamodel_plot_chaos_cleaning_strategy.py: Create a sparse chaos by integration ==================================== .. GENERATED FROM PYTHON SOURCE LINES 7-39 The goal of this example is to show how to create a sparse polynomial chaos expansion (PCE) when we estimate its coefficients by integration. We show how to use the :class:`~openturns.CleaningStrategy` class. Polynomial chaos expansion -------------------------- Let :math:`g : \mathcal{D} \rightarrow \mathbb{R}` be a function where :math:`\mathcal{D} \subseteq \mathbb{R}^p` is the domain of :math:`g`. Let :math:`\vect{X}` be a random vector whose probability density function is denoted by :math:`f`. We assume that :math:`g(\vect{X})` has a finite second-order moment. Let :math:`T` be an iso-probabilistic transformation such that :math:`\vect{Z} = T(\vect{X})` follows a distribution uniquely defined by all its moments. Let :math:`h` be the function defined by: .. math:: h = g \circ T^{-1}. The polynomial chaos decomposition of :math:`h` with respect to the measure of :math:`\vect{Z}` is (see [blatman2009]_ page 73): .. math:: h(\vect{z}) = \sum_{\vect{\alpha} \in \mathbb{N}^p} a_{\vect{\alpha}} \psi_{\vect{\alpha}}(\vect{z}) where :math:`\vect{\alpha} = (\alpha_1, ..., \alpha_p) \in \mathbb{N}^p` is a multi-index, :math:`a_{\vect{\alpha}} \in \mathbb{R}` is a coefficient and :math:`\psi_{\vect{\alpha}} : \mathcal{\bar{X}} \rightarrow \mathbb{R}` is a multivariate polynomial. .. GENERATED FROM PYTHON SOURCE LINES 42-84 Truncated expansion ------------------- In practice, we cannot consider an infinite series and must truncate the decomposition at a given order. Only a selection of coefficients is kept. This leads to a subset of all possible multi-indices. In the remainder of this text, we call this selection the *multi-index set*. Several multi-index sets can be considered. A simple method is to truncate the polynomial up to a given maximum total degree :math:`d \in \mathbb{N}`. Let :math:`\mathcal{A}^{d}` be the multi-index set defined by .. math:: \mathcal{A}^{d} = \left\{ \vect{\alpha} \in \mathbb{N}^p \; | \; \|\vect{\alpha}\|_1 \leq d\right\} where .. math:: \|\vect{\alpha}\|_1 = \alpha_1 + ... + \alpha_p is the 1-norm of the multi-index :math:`\vect{\alpha}`. Therefore, the truncated polynomial chaos expansion is: .. math:: \widetilde{h}(\vect{z}) = \sum_{\vect{\alpha} \in \mathcal{A}^{d}} a_{\vect{\alpha}} \psi_{\vect{\alpha}}(\vect{z}). In order to ensure a low error, we may choose a large value of the total degree :math:`d`. This, however, leads to a large number of coefficients :math:`a_{\vect{\alpha}}`, :math:`\vect{\alpha} \in \mathcal{A}^{d}`, to estimate. More precisely, the number of coefficients to estimate is (see [blatman2009]_ page 73): .. math:: \textrm{card}\left(\mathcal{A}^{d}\right) = {p + d \choose d} = \frac{(p + d)!}{p! d!} where :math:`p!` is the factorial of :math:`p`.
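The formula above makes it easy to anticipate the cost of a full basis. The following short sketch (plain Python using only the standard library; it is not part of the original script and the helper name `full_basis_size` is ours) evaluates :math:`\textrm{card}\left(\mathcal{A}^{d}\right)` for a few values of :math:`p` and :math:`d` and shows how quickly it grows.

.. code-block:: Python

    import math


    def full_basis_size(p, d):
        # card(A^d) = binomial(p + d, d): the number of multi-indices of
        # dimension p with total degree at most d.
        return math.comb(p + d, d)


    for p in (3, 5, 10):
        for d in (2, 5, 10):
            print("p =", p, "d =", d, ": card =", full_basis_size(p, d))

In particular, :math:`p = 3` and :math:`d = 5` give 56 coefficients, which is the size of the full basis used later in this example.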
.. GENERATED FROM PYTHON SOURCE LINES 87-120 Low-rank polynomial chaos expansion ----------------------------------- For any :math:`\vect{\alpha} \in \mathbb{N}^p`, let :math:`\|\vect{\alpha}\|_0` be the rank of the multi-index, that is, the number of non-zero components: .. math:: \|\vect{\alpha}\|_0 = \sum_{i = 1}^p \vect{1}_{\alpha_i > 0} where :math:`\vect{1}` is the indicator function. The multi-index set of maximum total degree :math:`d \in \mathbb{N}` and maximum rank :math:`j \in \mathbb{N}` is ([blatman2009]_ page 74): .. math:: \mathcal{A}^{d,j} = \left\{ \vect{\alpha} \in \mathbb{N}^p \; | \; \|\vect{\alpha}\|_1 \leq d, \; \; \|\vect{\alpha}\|_0 \leq j\right\}. Therefore, the rank-`j` polynomial chaos expansion is: .. math:: \widetilde{h}(\vect{z}) = \sum_{\vect{\alpha} \in \mathcal{A}^{d,j}} a_{\vect{\alpha}} \psi_{\vect{\alpha}}(\vect{z}). The rank is now a hyperparameter of the model: [blatman2009]_ suggests using :math:`j = 2, 3, 4`. An example of low-rank PCE for the G-Sobol' function is given in [blatman2009]_ page 75. *Note.* It is currently not possible to create a low-rank PCE. .. GENERATED FROM PYTHON SOURCE LINES 123-176 Model selection --------------- If :math:`\textrm{card}\left(\mathcal{A}^{d}\right)` is large, many coefficients may be poorly estimated, which may reduce the quality of the metamodel. We may want to select the subset of the coefficients which best predicts the output. In other words, we may compute a subset: .. math:: \mathcal{A} \subseteq \mathcal{A}^{d} such that ([blatman2009]_ page 86): .. math:: \widetilde{h}(\vect{z}) = \sum_{\vect{\alpha} \in \mathcal{A}} a_{\vect{\alpha}} \psi_{\vect{\alpha}}(\vect{z}) An enumeration rule is a function which maps an integer index :math:`k` to the corresponding multi-index :math:`\vect{\alpha}`. More precisely, let :math:`r : \mathbb{N} \rightarrow \mathbb{N}^p` be the function such that: .. math:: r(k) = \vect{\alpha} for any :math:`k \geq 0`. Let :math:`K \in \mathbb{N}` be a parameter representing the number of coefficients considered in the selection. Given an enumeration rule for the multi-indices :math:`\vect{\alpha}`, at most :math:`K` multi-indices will be considered. Let :math:`\mathcal{A}_K` be the corresponding multi-index set: .. math:: \mathcal{A}_K = \left\{ \vect{\alpha} \; | \; r^{-1}(\vect{\alpha}) = k \leq K \right\}. Let :math:`\epsilon > 0` be a parameter representing the minimum relative value of a significant coefficient :math:`a_{\vect{\alpha}}`. The :class:`~openturns.CleaningStrategy` class uses the following criterion to select the coefficients: .. math:: \mathcal{A}_\epsilon = \left\{ \vect{\alpha} \in \mathcal{A}_K \; | \; |a_{\vect{\alpha}}| \geq \epsilon \max_{ \vect{\beta} \in \mathcal{A}_K } |a_{\vect{\beta}}| \right\} where :math:`\epsilon` is the significance factor, which is :math:`\epsilon = 10^{-4}` by default. This rule keeps only the coefficients which are significantly different from zero. .. GENERATED FROM PYTHON SOURCE LINES 179-209 Sparsity index -------------- The sparsity index of a multi-index set :math:`\mathcal{A}` is the ratio of its cardinality to the cardinality of the full multi-index set with the same maximum total degree. For a given multi-index set :math:`\mathcal{A}`, let :math:`d` be the maximum 1-norm of the multi-indices in the set: .. math:: d := \textrm{max}_{\vect{\alpha} \in \mathcal{A}} \|\vect{\alpha}\|_1. The index of sparsity of :math:`\mathcal{A}` is ([blatman2009]_ eq. 4.42 page 86): .. 
math:: \textrm{IS}(\mathcal{A}) = \frac{\textrm{card}(\mathcal{A})}{\textrm{card}\left(\mathcal{A}^d\right)}. *Note.* The index of sparsity as defined by [blatman2009]_ is close to zero when the model is very sparse. The following complementary indicator is close to 1 when the model is very sparse: .. math:: \textrm{IS}_{\textrm{c}}(\mathcal{A}) = 1 - \frac{\textrm{card}(\mathcal{A})}{\textrm{card}\left(\mathcal{A}^d\right)}. .. GENERATED FROM PYTHON SOURCE LINES 212-218 .. code-block:: Python import openturns as ot import openturns.viewer as otv from openturns.usecases import ishigami_function import itertools .. GENERATED FROM PYTHON SOURCE LINES 219-222 The following function takes a polynomial chaos result as input and prints a given maximum number of coefficients of this polynomial. It can take into account a threshold, so that we can avoid to print coefficients which are very close to zero. .. GENERATED FROM PYTHON SOURCE LINES 226-262 .. code-block:: Python def printCoefficientsTable( polynomialChaosResult, maximum_number_of_printed_coefficients=10, threshold=0.0 ): """ Print the coefficients of the polynomial chaos. Parameters ---------- polynomialChaosResult : ot.FunctionalChaosResult The polynomial chaos expansion. maximum_number_of_printed_coefficients : int The maximum number of printed coefficients. threshold : float, strictly positive If a coefficient has an absolute value strictly greater than the threshold, it is printed. """ basis = polynomialChaosResult.getOrthogonalBasis() coefficients = polynomialChaosResult.getCoefficients() enumerate_function = basis.getEnumerateFunction() indices = polynomialChaosResult.getIndices() nbcoeffs = indices.getSize() print("Total number of coefficients : ", nbcoeffs) print("# Indice, Multi-indice, Degree : Value") print_index = 0 for k in range(nbcoeffs): multiindex = enumerate_function(indices[k]) degree = sum(multiindex) c = coefficients[k][0] if abs(c) > threshold: print("#%d, %s (%s) : %s" % (k, multiindex, degree, c)) print_index += 1 if print_index > maximum_number_of_printed_coefficients: break return .. GENERATED FROM PYTHON SOURCE LINES 263-266 The next function computes the polynomial chaos :math:`R^2` score using simple validation on a test sample generated by Monte-Carlo sampling. The actual computation is performed by the :class:`~openturns.MetaModelValidation` class. .. GENERATED FROM PYTHON SOURCE LINES 270-303 .. code-block:: Python def compute_polynomial_chaos_R2( polynomialchaos_result, g_function, input_distribution, n_valid=1000 ): """ Compute the R2 score of the polynomial chaos. Parameters ---------- polynomialChaosResult : ot.FunctionalChaosResult The polynomial chaos expansion. g_function : ot.Function The function. input_distribution : ot.Distribution The input distribution. n_valid : int The number of simulations to compute the R2 score. Returns ------- r2Score : float The R2 score """ ot.RandomGenerator.SetSeed(1976) metamodel = polynomialchaos_result.getMetaModel() inputTest = input_distribution.getSample(n_valid) outputTest = g_function(inputTest) metamodelPredictions = metamodel(inputTest) val = ot.MetaModelValidation(outputTest, metamodelPredictions) r2Score = val.computeR2Score()[0] return r2Score .. GENERATED FROM PYTHON SOURCE LINES 304-307 The following function creates a validation plot using the :meth:`~openturns.MetaModelValidation.drawValidation` method of the :class:`~openturns.MetaModelValidation` class. .. GENERATED FROM PYTHON SOURCE LINES 311-348 .. 
code-block:: Python def draw_polynomial_chaos_validation( polynomialchaos_result, g_function, input_distribution, n_valid=1000 ): """ Validate the polynomial chaos. Create the validation plot. Parameters ---------- polynomialChaosResult : ot.FunctionalChaosResult The polynomial chaos expansion. g_function : ot.Function The function. input_distribution : ot.Distribution The input distribution. n_valid : int The number of simulations to compute the R2 score. Returns ------- view : ot.View The plot. """ metamodel = polynomialchaos_result.getMetaModel() inputTest = input_distribution.getSample(n_valid) outputTest = g_function(inputTest) metamodelPredictions = metamodel(inputTest) val = ot.MetaModelValidation(outputTest, metamodelPredictions) r2Score = val.computeR2Score()[0] graph = val.drawValidation() graph.setTitle("R2=%.2f%%" % (r2Score * 100)) view = otv.View(graph, figure_kw={"figsize": (5.0, 4.0)}) return view .. GENERATED FROM PYTHON SOURCE LINES 349-355 We consider the Ishigami model. Its three inputs are i.i.d. random variables that follow the uniform distribution on the :math:`[-\pi, \pi]` interval. This is an interesting example for our purpose because it is highly non-linear, so that a high polynomial degree will be required in order to produce a polynomial chaos expansion with :math:`R^2` score sufficiently close to 1. .. GENERATED FROM PYTHON SOURCE LINES 358-364 .. code-block:: Python im = ishigami_function.IshigamiModel() im.inputDistribution.setDescription(["`X_0`", "`X_1`", "`X_2`"]) im.model.setOutputDescription(["`Y`"]) print(im.inputDistribution) .. rst-class:: sphx-glr-script-out .. code-block:: none JointDistribution(Uniform(a = -3.14159, b = 3.14159), Uniform(a = -3.14159, b = 3.14159), Uniform(a = -3.14159, b = 3.14159), IndependentCopula(dimension = 3)) .. GENERATED FROM PYTHON SOURCE LINES 365-376 Then we create the multivariate basis onto which the function is expanded. By default, it is associated with the linear enumeration rule. Since our marginals are uniform, the :class:`~openturns.OrthogonalProductPolynomialFactory` class produces Legendre polynomials. In order to create the multivariate basis of polynomials, we must specify the number of functions in the basis. In this particular case, we compute that number depending on the total degree. The :meth:`~openturns.EnumerateFunction.getMaximumDegreeStrataIndex` method of the enumeration function computes the number of layers necessary to achieve that total degree. Then the number of functions up to that layer is computed with the :meth:`~openturns.EnumerateFunction.getStrataCumulatedCardinal` method. .. GENERATED FROM PYTHON SOURCE LINES 379-393 .. code-block:: Python dimension = im.inputDistribution.getDimension() multivariateBasis = ot.OrthogonalProductPolynomialFactory( [im.inputDistribution.getMarginal(i) for i in range(dimension)] ) totalDegree = 5 # Polynomial degree enumerate_function = multivariateBasis.getEnumerateFunction() strataIndex = enumerate_function.getMaximumDegreeStrataIndex(totalDegree) print("strataIndex = ", strataIndex) number_of_terms_in_basis = enumerate_function.getStrataCumulatedCardinal(strataIndex) print("number_of_terms_in_basis = ", number_of_terms_in_basis) adaptiveStrategy = ot.FixedStrategy(multivariateBasis, number_of_terms_in_basis) print(adaptiveStrategy) .. rst-class:: sphx-glr-script-out .. code-block:: none strataIndex = 5 number_of_terms_in_basis = 56 class=FixedStrategy derived from class=AdaptiveStrategyImplementation maximumDimension=56 .. 
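To make the linear enumeration rule more concrete, the following optional snippet (an illustration of ours, assuming the `enumerate_function` object created above is available; it is not part of the original script) lists the first multi-indices in the order used to build the basis, together with their total degree.

.. code-block:: Python

    # Print the first multi-indices produced by the linear enumeration rule.
    for k in range(10):
        multiindex = enumerate_function(k)
        print(k, list(multiindex), "total degree =", sum(multiindex))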
GENERATED FROM PYTHON SOURCE LINES 394-401 We compute the coefficients using a multivariate tensor product Gaussian quadrature rule. Since the coefficients are computed in the standardized space, we first use the `getMeasure` method of the multivariate basis in order to get that standardized distribution. Then we use the :class:`~openturns.GaussProductExperiment` class to create the quadrature, using 6 nodes on each of the dimensions. .. GENERATED FROM PYTHON SOURCE LINES 404-415 .. code-block:: Python standard_distribution = multivariateBasis.getMeasure() print(standard_distribution) marginal_number_of_nodes = 6 dim_input = im.model.getInputDimension() marginalDegrees = [marginal_number_of_nodes] * dim_input experiment = ot.GaussProductExperiment(im.inputDistribution, marginalDegrees) X, W = experiment.generateWithWeights() Y = im.model(X) print("Sample size = ", X.getSize()) .. rst-class:: sphx-glr-script-out .. code-block:: none JointDistribution(Uniform(a = -1, b = 1), Uniform(a = -1, b = 1), Uniform(a = -1, b = 1), IndependentCopula(dimension = 3)) Sample size = 216 .. GENERATED FROM PYTHON SOURCE LINES 416-421 We see that 216 nodes are involved in this quadrature rule, which is the result of :math:`6^3 = 216`. In the next cell, we compute the coefficients of the polynomial chaos expansion using integration. .. GENERATED FROM PYTHON SOURCE LINES 424-431 .. code-block:: Python projectionStrategy = ot.IntegrationStrategy() chaosalgo = ot.FunctionalChaosAlgorithm( X, W, Y, im.inputDistribution, adaptiveStrategy, projectionStrategy ) chaosalgo.run() result = chaosalgo.getResult() .. GENERATED FROM PYTHON SOURCE LINES 432-436 We now validate the metamodel by drawing the validation graph. We see that many points are close to the red test line, which indicates that the predictions of the polynomial chaos expansion are close to the output observations from the model. .. GENERATED FROM PYTHON SOURCE LINES 439-441 .. code-block:: Python view = draw_polynomial_chaos_validation(result, im.model, im.inputDistribution) .. image-sg:: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_cleaning_strategy_001.png :alt: R2=85.82% :srcset: /auto_meta_modeling/polynomial_chaos_metamodel/images/sphx_glr_plot_chaos_cleaning_strategy_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 442-444 In order to have a closer look on the result, we use the `printCoefficientsTable` function in order to print the first 10 coefficients. .. GENERATED FROM PYTHON SOURCE LINES 447-450 .. code-block:: Python printCoefficientsTable(result) .. rst-class:: sphx-glr-script-out .. code-block:: none Total number of coefficients : 56 # Indice, Multi-indice, Degree : Value #0, [0,0,0] (0) : 3.5049783626206317 #1, [1,0,0] (1) : 1.6254291279668378 #2, [0,1,0] (1) : 9.189697613987136e-16 #3, [0,0,1] (1) : -1.7737547541862853e-16 #4, [2,0,0] (2) : -2.7590776885411117e-15 #5, [1,1,0] (2) : -6.375108774214766e-16 #6, [1,0,1] (2) : -1.8240617349896127e-15 #7, [0,2,0] (2) : -0.6413919098131448 #8, [0,1,1] (2) : -5.117434254131581e-17 #9, [0,0,2] (2) : -1.4085954624931674e-15 #10, [3,0,0] (3) : -1.2911039486376805 .. GENERATED FROM PYTHON SOURCE LINES 451-457 We see that there are 56 coefficients in the metamodel and that many of these coefficients are close to zero. If we print only coefficients greater than :math:`10^{-14}`, we see that only a fraction of them are significant and that these significant coefficients have a relatively large polynomial degree. .. 
GENERATED FROM PYTHON SOURCE LINES 460-463 .. code-block:: Python printCoefficientsTable(result, threshold=1.0e-14) .. rst-class:: sphx-glr-script-out .. code-block:: none Total number of coefficients : 56 # Indice, Multi-indice, Degree : Value #0, [0,0,0] (0) : 3.5049783626206317 #1, [1,0,0] (1) : 1.6254291279668378 #7, [0,2,0] (2) : -0.6413919098131448 #10, [3,0,0] (3) : -1.2911039486376805 #15, [1,0,2] (3) : 1.3724300360555144 #30, [0,4,0] (4) : -1.6122740106725177 #35, [5,0,0] (5) : 0.20760614724812593 #40, [3,0,2] (5) : -1.0901427864755082 #49, [1,0,4] (5) : 0.40917958066434623 .. GENERATED FROM PYTHON SOURCE LINES 464-466 The previous experiments suggest to keep only the coefficients which are significant in the model: this is the topic of the next section. .. GENERATED FROM PYTHON SOURCE LINES 469-471 Use a model selection method ---------------------------- .. GENERATED FROM PYTHON SOURCE LINES 475-485 The `CleaningStrategy` has the following algorithm. On input, it considers only the first `maximumConsideredTerms` coefficients :math:`a_{\vect{\alpha}}`. On output it selects the `mostSignificant` most significant coefficients. To do this, it uses the `significanceFactor` parameter. The following function will help to create a sparse PCE using the `CleaningStrategy`. It takes into account the number of considered coefficients in the expansion, the number of significant coefficients to keep and the relative factor and returns the :math:`R^2` score. .. GENERATED FROM PYTHON SOURCE LINES 489-535 .. code-block:: Python def compute_cleaning_PCE( maximumConsideredTerms, mostSignificant, significanceFactor, verbose=False ): """ Compute a PCE using CleaningStrategy. Parameters ---------- maximumConsideredTerms : int The maximum number of coefficients considered by the algorithm during intermediate steps. mostSignificant : int The maximum number of coefficients selected by the algorithm in the final PCE. significanceFactor : float The relative part of any coefficient with respect to the maximum coefficient. verbose : bool If set to True, print intermediate messages. Returns ------- R2 : float The R2 score """ adaptiveStrategy = ot.CleaningStrategy( multivariateBasis, maximumConsideredTerms, mostSignificant, significanceFactor, ) chaosalgo = ot.FunctionalChaosAlgorithm( X, W, Y, im.inputDistribution, adaptiveStrategy, projectionStrategy ) chaosalgo.run() result = chaosalgo.getResult() score_R2 = compute_polynomial_chaos_R2(result, im.model, im.inputDistribution) if verbose: print("R2 = %.2f%%" % (100.0 * score_R2)) printCoefficientsTable(result) return score_R2 .. GENERATED FROM PYTHON SOURCE LINES 536-538 In the next cell, we consider at most 500 coefficients and keep only the 5 most significant coefficients. The factor is set to a relatively low value. .. GENERATED FROM PYTHON SOURCE LINES 541-549 .. code-block:: Python maximumConsideredTerms = 500 mostSignificant = 5 significanceFactor = 1.0e-10 score_R2 = compute_cleaning_PCE( maximumConsideredTerms, mostSignificant, significanceFactor, verbose=True ) .. rst-class:: sphx-glr-script-out .. code-block:: none R2 = -384.39% Total number of coefficients : 6 # Indice, Multi-indice, Degree : Value #0, [0,0,0] (0) : 3.5049783626206317 #1, [11,0,0] (11) : -1.6603684437609167 #2, [12,0,0] (12) : -4.269723903959838 #3, [0,12,0] (12) : -4.133953684206107 #4, [0,0,12] (12) : -4.269723903959839 #5, [5,0,8] (13) : -0.052471216285749106 .. 
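Before analyzing this result, it may help to restate the selection rule in plain Python. The following toy sketch is ours (it is not the actual :class:`~openturns.CleaningStrategy` implementation and the coefficient values are arbitrary); it applies the criterion :math:`|a_{\vect{\alpha}}| \geq \epsilon \max_{\vect{\beta}} |a_{\vect{\beta}}|` to a small list of coefficients.

.. code-block:: Python

    # Toy illustration of the cleaning criterion: keep a coefficient if its
    # absolute value is at least significanceFactor times the largest one.
    coefficients = [3.50, 1.63, 1.0e-15, -0.64, -1.29, 2.0e-12]
    significanceFactor = 1.0e-10
    threshold = significanceFactor * max(abs(a) for a in coefficients)
    kept = [a for a in coefficients if abs(a) >= threshold]
    print("threshold =", threshold)
    print("kept =", kept)

With such a small factor, only coefficients that are very close to zero are filtered out; which terms end up in the basis is then mostly governed by the `mostSignificant` budget and by the relative sizes of the estimated coefficients.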
GENERATED FROM PYTHON SOURCE LINES 550-557 We see that when we keep only 5 coefficients among the first 500 ones, these coefficients have a very high polynomial degree. Indeed, these poorly estimated coefficients happen to have a high absolute value. Hence, the criterion selects them as significant coefficients, which leads to a poor metamodel. Let us reduce the number of considered coefficients and increase the number of selected coefficients. .. GENERATED FROM PYTHON SOURCE LINES 560-568 .. code-block:: Python maximumConsideredTerms = 56 mostSignificant = 10 significanceFactor = 1.0e-10 score_R2 = compute_cleaning_PCE( maximumConsideredTerms, mostSignificant, significanceFactor, verbose=True ) .. rst-class:: sphx-glr-script-out .. code-block:: none R2 = 85.75% Total number of coefficients : 9 # Indice, Multi-indice, Degree : Value #0, [0,0,0] (0) : 3.5049783626206317 #1, [1,0,0] (1) : 1.6254291279668378 #2, [0,2,0] (2) : -0.6413919098131448 #3, [3,0,0] (3) : -1.2911039486376805 #4, [1,0,2] (3) : 1.3724300360555144 #5, [0,4,0] (4) : -1.6122740106725177 #6, [5,0,0] (5) : 0.20760614724812593 #7, [3,0,2] (5) : -1.0901427864755082 #8, [1,0,4] (5) : 0.40917958066434623 .. GENERATED FROM PYTHON SOURCE LINES 569-581 When we keep only 10 coefficients among the first 56 ones, the polynomial chaos metamodel is much better: the coefficients are associated with a low polynomial degree, so that the quadrature rule estimates them with greater accuracy. We would like to know which combination is the best. In the following loop, we consider the maximum number of considered coefficients from 1 to 500 and the number of selected coefficients from 1 to 30. In order to produce the combinations, we use the `product` function from the `itertools` module. For each combination, we compute the :math:`R^2` score and select the combination with the highest :math:`R^2` score. As shown in [muller2016]_ page 268, the computed :math:`R^2` may be optimistic, but this is not the point of the current example. .. GENERATED FROM PYTHON SOURCE LINES 585-606 .. code-block:: Python maximumConsideredTerms_list = list(range(1, 500, 50)) mostSignificant_list = list(range(1, 30, 5)) iterator = itertools.product(maximumConsideredTerms_list, mostSignificant_list) best_R2_score = 0.0 best_parameters = [] for it in iterator: maximumConsideredTerms, mostSignificant = it score_R2 = compute_cleaning_PCE( maximumConsideredTerms, mostSignificant, significanceFactor ) if score_R2 > best_R2_score: best_R2_score = score_R2 best_parameters = [maximumConsideredTerms, mostSignificant] print("Best R2 = %.2f%%" % (100.0 * best_R2_score)) maximumConsideredTerms, mostSignificant = best_parameters print("Number of considered coefficients : ", maximumConsideredTerms) print("Number of selected coefficients : ", mostSignificant) .. rst-class:: sphx-glr-script-out .. code-block:: none Best R2 = 86.15% Number of considered coefficients : 101 Number of selected coefficients : 16 .. GENERATED FROM PYTHON SOURCE LINES 607-610 We see that the best solution could be to select at most 16 significant coefficients among the first 101 ones. Let us see the :math:`R^2` score and the coefficients in this situation. .. GENERATED FROM PYTHON SOURCE LINES 613-618 .. code-block:: Python score_R2 = compute_cleaning_PCE( maximumConsideredTerms, mostSignificant, significanceFactor, verbose=True ) .. rst-class:: sphx-glr-script-out .. 
code-block:: none R2 = 86.15% Total number of coefficients : 12 # Indice, Multi-indice, Degree : Value #0, [0,0,0] (0) : 3.5049783626206317 #1, [1,0,0] (1) : 1.6254291279668378 #2, [0,2,0] (2) : -0.6413919098131448 #3, [3,0,0] (3) : -1.2911039486376805 #4, [1,0,2] (3) : 1.3724300360555144 #5, [0,4,0] (4) : -1.6122740106725177 #6, [5,0,0] (5) : 0.20760614724812593 #7, [3,0,2] (5) : -1.0901427864755082 #8, [1,0,4] (5) : 0.40917958066434623 #9, [7,0,0] (7) : -0.2077986425581342 #10, [5,0,2] (7) : 0.1752921165560066 .. GENERATED FROM PYTHON SOURCE LINES 619-623 These parameters lead to a total number of coefficients equal to 12. Among the 16 most significant coefficients, only 12 satisfy the criterion. Most of the coefficients have a small polynomial degree although some have a total degree as large as 7. .. GENERATED FROM PYTHON SOURCE LINES 626-653 Intermediate steps of the algorithm ----------------------------------- If we set the `verbose` optional input argument of the `compute_cleaning_PCE` function to `True`, then intermediate messages are printed in the terminal. For each step of the adaptivity algorithm, the code prints some of the internal parameters of the algorithm. The data structure uses several variables that we now describe. - `Psi_k_p_` : the collection of functions in the current active polynomial multi-index set, - `I_p_` : the list of indices of the selected coefficients, according to the enumeration rule, - `addedPsi_k_ranks_` : the list of indices to add to the multi-index set, - `removedPsi_k_ranks_` : the list of indices to remove from the multi-index set, - `conservedPsi_k_ranks_` : the index of the first polynomial in the selected multi-index set, - `currentVectorIndex_` : the current value of the index in the full multi-index set, according to the enumeration rule. Each time the selection method is called, it is passed a coefficient :math:`a_{\vect{\alpha}}` which is a new candidate to be considered by the algorithm. The first time the method is evaluated, the active multi-index set is empty, so that it must be filled with the first coefficients in the multi-index set, according to the enumeration rule. From the second call onward (and up to the end of the algorithm), the candidate coefficient is considered for addition to the multi-index set. Executing the function prints messages that we can process to produce the following listing. On each step, we print the list of integers corresponding to the indices of the coefficients in the active multi-index set. .. GENERATED FROM PYTHON SOURCE LINES 656-692 .. code-block:: python Step 1: [0, 1, 7, 10, 15, 16] Step 2: [0, 1, 7, 10, 15, 17] Step 3: [0, 1, 7, 10, 15, 18] Step 4: [0, 1, 7, 10, 15, 19] Step 5: [0, 1, 7, 10, 15, 20] Step 6: [0, 1, 7, 10, 15, 21] Step 7: [0, 1, 7, 10, 15, 22] Step 8: [0, 1, 7, 10, 15, 23] Step 9: [0, 1, 7, 10, 15, 24] Step 10: [0, 1, 7, 10, 15, 25] [...] Step 15: [0, 1, 7, 10, 15, 30] Step 16: [0, 1, 7, 10, 15, 30, 31] [...] Step 20: [0, 1, 7, 10, 15, 30, 35] Step 21: [0, 1, 7, 10, 15, 30, 35, 36] [...] Step 25: [0, 1, 7, 10, 15, 30, 35, 40] Step 26: [0, 1, 7, 10, 15, 30, 35, 40, 41] [...] Step 34: [0, 1, 7, 10, 15, 30, 35, 40, 49] Step 35: [0, 1, 7, 10, 15, 30, 35, 40, 49, 50] [...] Step 69: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84] Step 70: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 85] [...] Step 74: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89] Step 75: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 90] [...] 
Step 83: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 98] Step 84: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 98, 99] Step 85: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 98, 100] Step 86: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 98] Step 87: [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 98] .. GENERATED FROM PYTHON SOURCE LINES 695-738 The previous text (and a detailed analysis of the output file) allows one to understand what exactly happens in the algorithm. To understand each step, note that the significance factor is equal to :math:`\epsilon = 10^{-10}`. - During the initialization, the basis is empty; it is then filled with the indices [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. The coefficient with index equal to 0 corresponds to the mean of the polynomial chaos expansion. As a result, we always want to select that coefficient in the active set. Hence, the calculation of the largest coefficient in absolute value is performed over the indices from 1 to 15. The largest coefficient in absolute value is :math:`a_1 = 1.625`, which leads to the threshold :math:`\epsilon |a_1| = 1.625 \times 10^{-10}`. Most of the considered coefficients are, however, too close to zero. This is why only the coefficients [0, 1, 7, 10, 15] are kept in the basis. The corresponding coefficients are [3.505, 1.625, -0.6414, -1.291, 1.372]. - On step 1, the candidate index 16 is considered. Its coefficient is :math:`a_{16} = -1.197 \times 10^{-15}`, which is much too low to be selected. Hence, the basis is unchanged and the active multi-index set is [0, 1, 7, 10, 15] at the end of this step. - From step 3 to step 15, the active multi-index set is unchanged, because no considered coefficient exceeds the threshold. - On step 16, the candidate index 30 is considered, with corresponding coefficient :math:`a_{30} = -1.612`. Since this coefficient has an absolute value greater than the threshold, it gets selected and the active multi-index set is [0, 1, 7, 10, 15, 30] at the end of this step. From this step to the end, the index 30 will not leave the active set. - On step 20, the index 35 enters the active set. - On step 25, the index 40 enters the active set. - On step 34, the index 49 enters the active set. - On step 69, the index 84 enters the active set. - On step 74, the index 89 enters the active set. - On step 83, the index 98 enters the active set. - On the last step, the active multi-index set contains the indices [0, 1, 7, 10, 15, 30, 35, 40, 49, 84, 89, 98] and the corresponding coefficients are [3.505, 1.625, -0.6414, -1.291, 1.372, -1.613, 0.2076, -1.090, 0.4092, -0.2078, 0.1753, -0.3250]. We see that the algorithm was able to select 12 coefficients among the first 101 coefficients considered by the algorithm. It could have selected more coefficients since we provided 16 slots to fill through the `mostSignificant` parameter. The remaining considered coefficients were, however, too close to zero and fell below the threshold. .. GENERATED FROM PYTHON SOURCE LINES 742-748 Conclusion ---------- We see that the :class:`~openturns.CleaningStrategy` class performs correctly in this particular case. We have seen how to select the hyperparameters which produce the best :math:`R^2` score. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 2.266 seconds) .. _sphx_glr_download_auto_meta_modeling_polynomial_chaos_metamodel_plot_chaos_cleaning_strategy.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. 
container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_chaos_cleaning_strategy.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_chaos_cleaning_strategy.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_chaos_cleaning_strategy.zip `