.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_numerical_methods/general_methods/plot_regression_sinus.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_numerical_methods_general_methods_plot_regression_sinus.py:


Compute confidence intervals of a univariate noisy function
===========================================================

.. GENERATED FROM PYTHON SOURCE LINES 7-23

Introduction
------------

In this example, we compute the pointwise confidence interval of the
estimator of the conditional expectation given an input.
We consider noisy observations of the sine function.
Then we perform linear least squares regression to fit a polynomial of degree 4.
For a given point x, the code computes the confidence interval of the prediction y.
This is the confidence interval of the conditional expectation given the input.
Secondly, we compute the confidence interval of the noisy output observations.
In this advanced example, we use the :class:`~openturns.QRMethod` low level class.
Another example of this method is presented in
:doc:`/auto_numerical_methods/general_methods/plot_regression_interval`.

.. GENERATED FROM PYTHON SOURCE LINES 26-35

.. code-block:: Python

    import openturns as ot
    import openturns.viewer as otv
    import numpy as np

    palette = ot.Drawable.BuildTableauPalette(5)

.. GENERATED FROM PYTHON SOURCE LINES 36-41

Compute the data
----------------

We generate noisy observations from the sine function.
We define the function that we are going to approximate.

.. GENERATED FROM PYTHON SOURCE LINES 45-47

.. code-block:: Python

    g = ot.SymbolicFunction(["x"], ["sin(2 * pi_ * x)"])

.. GENERATED FROM PYTHON SOURCE LINES 48-49

We plot the function depending on the input.

.. GENERATED FROM PYTHON SOURCE LINES 49-64

.. code-block:: Python

    def plotFunction(g, color, lineStyle="dotted"):
        curve = g.draw(0.0, 1.0).getDrawable(0)
        curve.setColor(color)
        curve.setLineStyle(lineStyle)
        curve.setLegend("True")
        return curve


    graph = ot.Graph("Polynomial curve fitting", "x", "y", True, "upper right")
    # The "unknown" function
    graph.add(plotFunction(g, palette[0]))
    view = otv.View(graph)

.. image-sg:: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_001.svg
   :alt: Polynomial curve fitting
   :srcset: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_001.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 65-67

This is a nice, smooth function to approximate with polynomials.

.. GENERATED FROM PYTHON SOURCE LINES 67-78

.. code-block:: Python

    def linearSample(xmin, xmax, npoints):
        """Return a sample created from a regular grid
        from xmin to xmax with npoints points."""
        step = (xmax - xmin) / (npoints - 1)
        rg = ot.RegularGrid(xmin, step, npoints)
        vertices = rg.getVertices()
        return vertices

.. GENERATED FROM PYTHON SOURCE LINES 79-80

We consider observation points in the interval [0, 1].

.. GENERATED FROM PYTHON SOURCE LINES 80-85

.. code-block:: Python

    nTrain = 100
    xTrain = linearSample(0, 1, nTrain)

.. GENERATED FROM PYTHON SOURCE LINES 86-88

Assume that the observations are noisy and that the noise follows
a Normal distribution with zero mean and standard deviation equal to 0.5.

.. GENERATED FROM PYTHON SOURCE LINES 88-92

.. code-block:: Python

    noise = ot.Normal(0, 0.5)
    noiseSample = noise.getSample(nTrain)

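As a quick sanity check, which is an editorial addition and not part of the
original script, we can compare the empirical moments of the noise sample with
the parameters of the `ot.Normal(0, 0.5)` distribution; with only 100 points,
the agreement is approximate.

.. code-block:: Python

    # Illustrative check (not in the original example): the empirical mean
    # and variance of the noise sample should be close to 0 and 0.5^2 = 0.25,
    # up to sampling fluctuations.
    print("Empirical mean:     ", noiseSample.computeMean())
    print("Empirical variance: ", noiseSample.computeVariance())
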
.. GENERATED FROM PYTHON SOURCE LINES 93-97

The following code computes the observations as the sum of the function
values and the noise.
The couple `(xTrain, yTrain)` is the training set: it is used to compute the
coefficients of the polynomial model.

.. GENERATED FROM PYTHON SOURCE LINES 97-102

.. code-block:: Python

    yTrain = g(xTrain) + noiseSample
    print(yTrain[:5])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [ y0         ]
    0 : [  0.304101  ]
    1 : [ -0.569663  ]
    2 : [ -0.0925404 ]
    3 : [  0.79199   ]
    4 : [ -0.839545  ]

.. GENERATED FROM PYTHON SOURCE LINES 103-104

Then we plot the function and the observations.

.. GENERATED FROM PYTHON SOURCE LINES 104-121

.. code-block:: Python

    def plotData(xTrain, yTrain, color, pointStyle="circle"):
        cloud = ot.Cloud(xTrain, yTrain)
        cloud.setPointStyle(pointStyle)
        cloud.setLegend("Observations")
        cloud.setColor(color)
        return cloud


    graph = ot.Graph("Polynomial curve fitting", "x", "y", True, "upper right")
    # The "unknown" function
    graph.add(plotFunction(g, palette[0]))
    # Training set
    graph.add(plotData(xTrain, yTrain, palette[1]))
    view = otv.View(graph)

.. image-sg:: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_002.svg
   :alt: Polynomial curve fitting
   :srcset: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_002.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 122-125

We see that the noise is relatively large compared to the function values:
it may not be obvious that a regression model can fit these data well.

.. GENERATED FROM PYTHON SOURCE LINES 127-130

Compute the coefficients of the polynomial decomposition
--------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 132-136

In order to approximate the function with polynomials up to degree 4,
we create a list of strings containing the associated monomials.
The loop runs from 0 up to `totalDegree` included (hence the `range` function
takes `totalDegree + 1` as its second input argument).

.. GENERATED FROM PYTHON SOURCE LINES 136-142

.. code-block:: Python

    totalDegree = 4
    polynomialCollection = [f"x^{degree}" for degree in range(0, totalDegree + 1)]
    print(polynomialCollection)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['x^0', 'x^1', 'x^2', 'x^3', 'x^4']

.. GENERATED FROM PYTHON SOURCE LINES 143-145

Given the list of strings, we create a symbolic function which computes the
values of the monomials.

.. GENERATED FROM PYTHON SOURCE LINES 145-150

.. code-block:: Python

    basisFunction = ot.SymbolicFunction(["x"], polynomialCollection)
    print(basisFunction)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [x]->[x^0,x^1,x^2,x^3,x^4]

.. GENERATED FROM PYTHON SOURCE LINES 151-153

Evaluate the design matrix as the value of the basis functions on the
training sample.

.. GENERATED FROM PYTHON SOURCE LINES 153-158

.. code-block:: Python

    designSampleTrain = basisFunction(xTrain)
    print(designSampleTrain[:5])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [ y0          y1          y2          y3          y4          ]
    0 : [ 1           0           0           0           0           ]
    1 : [ 1           0.010101    0.00010203  1.03061e-06 1.04102e-08 ]
    2 : [ 1           0.020202    0.000408122 8.24488e-06 1.66563e-07 ]
    3 : [ 1           0.030303    0.000918274 2.78265e-05 8.43226e-07 ]
    4 : [ 1           0.040404    0.00163249  6.5959e-05  2.66501e-06 ]

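The design matrix built from monomials is a Vandermonde matrix.
The following cross-check is an editorial addition, not part of the original
script; it assumes the usual conversion of an OpenTURNS sample to a NumPy
array via `np.array`.

.. code-block:: Python

    # Illustrative cross-check: the design sample coincides with the
    # Vandermonde matrix of the inputs, with columns x^0, x^1, ..., x^4.
    xArray = np.array(xTrain).flatten()
    vandermonde = np.vander(xArray, totalDegree + 1, increasing=True)
    print(np.allclose(vandermonde, np.array(designSampleTrain)))
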
.. GENERATED FROM PYTHON SOURCE LINES 159-163

Convert the design sample into a design matrix and create an instance of the
:class:`~openturns.QRMethod` class.
This class has the :meth:`~openturns.QRMethod.getGramInverse` method that we
need to compute the confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 163-167

.. code-block:: Python

    designMatrixTrain = ot.Matrix(designSampleTrain)
    lsqMethod = ot.QRMethod(designMatrixTrain)

.. GENERATED FROM PYTHON SOURCE LINES 168-169

Solve the linear least squares problem and get the vector of coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 169-173

.. code-block:: Python

    betaHat = lsqMethod.solve(yTrain.asPoint())
    print(betaHat)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [-0.193699,9.71586,-20.9196,-0.667074,12.5351]

.. GENERATED FROM PYTHON SOURCE LINES 174-180

Compute residuals and variance
------------------------------

We need to estimate the variance of the residuals.
To do this, we evaluate the predictions of the regression model on the
training sample and compute the residuals.

.. GENERATED FROM PYTHON SOURCE LINES 180-191

.. code-block:: Python

    yHatTrain = designMatrixTrain * betaHat
    residuals = yHatTrain - yTrain.asPoint()
    sampleSize = designMatrixTrain.getNbRows()
    print("sampleSize=", sampleSize)
    nParameters = designMatrixTrain.getNbColumns()
    print("nParameters = ", nParameters)
    sigma2Hat = residuals.normSquare() / (sampleSize - nParameters)
    print("sigma2Hat = ", sigma2Hat)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    sampleSize= 100
    nParameters =  5
    sigma2Hat =  0.2519190900274913

.. GENERATED FROM PYTHON SOURCE LINES 192-196

The couple `(xTest, yHatTest)` is the set where we want to evaluate the
prediction confidence intervals.
In order to evaluate the predictions from the regression model, multiply
the design matrix evaluated on the test sample with the vector of coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 196-203

.. code-block:: Python

    nTest = 50
    xTest = linearSample(0, 1, nTest)
    designMatrixTest = ot.Matrix(basisFunction(xTest))
    yHatTest = ot.Sample.BuildFromPoint(designMatrixTest * betaHat)

.. GENERATED FROM PYTHON SOURCE LINES 204-206

Then we plot the true function, its noisy observations and the
least squares model of degree 4.

.. GENERATED FROM PYTHON SOURCE LINES 206-224

.. code-block:: Python

    def plotPredictions(xTest, yHatTest, totalDegree, color):
        curve = ot.Curve(xTest, yHatTest)
        curve.setLegend(f"L.S. degree {totalDegree}")
        curve.setColor(color)
        return curve


    graph = ot.Graph("Polynomial curve fitting", "x", "y", True, "upper right")
    # The "unknown" function
    graph.add(plotFunction(g, palette[0]))
    # Training set
    graph.add(plotData(xTrain, yTrain, palette[1]))
    # Predictions
    graph.add(plotPredictions(xTest, yHatTest, totalDegree, palette[2]))
    view = otv.View(graph)

.. image-sg:: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_003.svg
   :alt: Polynomial curve fitting
   :srcset: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_003.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 225-227

We see that the least squares polynomial model is relatively close to the
true function.

.. GENERATED FROM PYTHON SOURCE LINES 230-235

Compute confidence intervals
----------------------------

The next function will help to compute confidence intervals.
It is based on regression analysis.

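Before reading the code, it may help to write down the formulas it implements;
they mirror eq. 3.51 and 3.52 of the reference cited in the docstring below.
For a row :math:`x_0` of the design matrix, the prediction and its estimated
variance are:

.. math::

    \hat{y}_0 = x_0^T \hat{\beta}, \qquad
    \widehat{\operatorname{Var}}(\hat{y}_0)
    = \hat{\sigma}^2 \, x_0^T \left(X^T X\right)^{-1} x_0,

where :math:`X` is the design matrix on the training sample and
:math:`\hat{\sigma}^2` is the estimated variance of the residuals.
For a confidence interval of a new observation (instead of the mean), the
noise variance :math:`\hat{\sigma}^2` is added to this variance.
The function then returns the bilateral interval
:math:`\hat{y}_0 \pm z_{(1 + \alpha)/2} \, \hat{\sigma}_{\hat{y}_0}` based on
the Normal distribution; note that a Student :math:`t` distribution with
:math:`n - p` degrees of freedom would give slightly wider intervals for
small samples.
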
.. GENERATED FROM PYTHON SOURCE LINES 235-311

.. code-block:: Python

    def computeRegressionConfidenceInterval(
        lsqMethod,
        betaHat,
        sigma2Hat,
        designSample,
        alpha=0.95,
        mean=True,
        epsilonSigma=1.0e-5,
    ):
        """
        Compute a confidence interval for the estimate of the mean.

        Evaluates this confidence interval at points in the design matrix.

        This code is based on (Rawlings, Pantula & Dickey, 1998),
        eq. 3.51 and 3.52, page 90.

        Parameters
        ----------
        lsqMethod : ot.LeastSquaresMethod
            The linear least squares method (e.g. QR or Cholesky).
        betaHat : ot.Point(parameterDimension)
            The solution of the least squares problem.
        sigma2Hat : float > 0.0
            The estimate of the variance of the residuals.
        designSample : ot.Sample(size, parameterDimension)
            The design matrix of the linear model.
            This is the value of the functional basis depending on the
            input sample.
            Each row represents the input where the confidence interval
            is to be computed.
        alpha : float, in [0, 1]
            The confidence level of the interval.
        mean : bool
            If True, then computes the confidence interval of the mean.
            This interval contains yTrue = E[y|x] with probability alpha.
            Otherwise, computes a confidence interval of the prediction at
            point x.
            This interval contains y|x with probability alpha.
        epsilonSigma : float > 0.0
            A relatively small number.
            The minimal value of the standard deviation, which avoids a
            singular Normal distribution.

        References
        ----------
        - John O. Rawlings, Sastry G. Pantula, and David A. Dickey.
          Applied Regression Analysis: A Research Tool. Springer, New York, 1998.

        Returns
        -------
        confidenceBounds : ot.Sample(size, 2)
            The first column contains the lower bound.
            The second column contains the upper bound.
        """
        inverseGram = lsqMethod.getGramInverse()
        sampleSize = designSample.getSize()
        confidenceBounds = ot.Sample(sampleSize, 2)
        for i in range(sampleSize):
            x0 = designSample[i, :]
            meanYHat = x0.dot(betaHat)
            sigma2YHat = x0.dot(inverseGram * x0) * sigma2Hat
            if not mean:
                sigma2YHat += sigma2Hat
            sigmaYHat = np.sqrt(sigma2YHat)
            sigmaYHat = max(epsilonSigma, sigmaYHat)  # Prevents a zero s.e.
            distribution = ot.Normal(meanYHat, sigmaYHat)
            interval = distribution.computeBilateralConfidenceInterval(alpha)
            lower = interval.getLowerBound()
            upper = interval.getUpperBound()
            confidenceBounds[i, 0] = lower[0]
            confidenceBounds[i, 1] = upper[0]
        return confidenceBounds

.. GENERATED FROM PYTHON SOURCE LINES 312-314

We evaluate the value of the basis functions on the test sample.
This sample is used to compute the confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 316-318

.. code-block:: Python

    designSampleTest = basisFunction(xTest)

.. GENERATED FROM PYTHON SOURCE LINES 319-320

Compute the confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 322-327

.. code-block:: Python

    alpha = 0.95
    confidenceIntervalMean = computeRegressionConfidenceInterval(
        lsqMethod, betaHat, sigma2Hat, designSampleTest, alpha=alpha
    )

.. GENERATED FROM PYTHON SOURCE LINES 328-330

On output, the `confidenceIntervalMean` variable is a
:class:`~openturns.Sample` of size 50 and dimension 2.

.. GENERATED FROM PYTHON SOURCE LINES 332-334

.. code-block:: Python

    print(confidenceIntervalMean.getSize())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    50

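As an optional check, which is an editorial addition to the example, we can
count how many test points have the true function value inside the pointwise
confidence interval; each interval should contain the true value with
probability close to :math:`\alpha`.

.. code-block:: Python

    # Illustrative coverage check (not in the original script): count the
    # test points where the true function value lies inside the C.I.
    yTrue = g(xTest)
    inside = sum(
        1
        for i in range(nTest)
        if confidenceIntervalMean[i, 0] <= yTrue[i, 0] <= confidenceIntervalMean[i, 1]
    )
    print("Fraction of test points inside the C.I.:", inside / nTest)
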
.. GENERATED FROM PYTHON SOURCE LINES 335-337

Plot the confidence interval (C.I.) of the pointwise estimator
of the conditional expectation.

.. GENERATED FROM PYTHON SOURCE LINES 340-375

.. code-block:: Python

    def plotConfidenceInterval(
        xTest, confidenceIntervalSample, color, lineStyle="dashed", label=""
    ):
        graph = ot.Graph()
        curve = ot.Curve(xTest, confidenceIntervalSample[:, 0])
        curve.setLegend(label)
        curve.setColor(color)
        curve.setLineStyle(lineStyle)
        graph.add(curve)
        curve = ot.Curve(xTest, confidenceIntervalSample[:, 1])
        curve.setLegend("")
        curve.setColor(color)
        curve.setLineStyle(lineStyle)
        graph.add(curve)
        return graph


    graph = ot.Graph("Polynomial curve fitting", "x", "y", True, "upper right")
    # The "unknown" function
    graph.add(plotFunction(g, palette[0]))
    # Training set
    graph.add(plotData(xTrain, yTrain, palette[1]))
    # Predictions
    graph.add(plotPredictions(xTest, yHatTest, totalDegree, palette[2]))
    # Confidence interval of the mean
    graph.add(
        plotConfidenceInterval(
            xTest,
            confidenceIntervalMean,
            palette[3],
            label="Mean %.0f%%" % (100.0 * alpha),
        )
    )
    view = otv.View(graph)

.. image-sg:: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_004.svg
   :alt: Polynomial curve fitting
   :srcset: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_004.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 376-382

We see that the pointwise confidence interval contains the true model
for most points: the few points where the true value falls outside the
bounds are not far from them.
The observations, however, are not contained within these bounds.
Computing an interval which contains the observations is the goal of
the next cell.

.. GENERATED FROM PYTHON SOURCE LINES 385-386

Finally, compute a 95% C.I. of the observations.

.. GENERATED FROM PYTHON SOURCE LINES 386-398

.. code-block:: Python

    alpha = 0.95
    confidenceIntervalObservations = computeRegressionConfidenceInterval(
        lsqMethod,
        betaHat,
        sigma2Hat,
        designSampleTest,
        alpha=alpha,
        mean=False,
    )

.. GENERATED FROM PYTHON SOURCE LINES 399-401

Then we plot the function, its least squares approximation, the C.I. of the
mean and the C.I. of the observations.

.. GENERATED FROM PYTHON SOURCE LINES 401-430

.. code-block:: Python

    # sphinx_gallery_thumbnail_number = 5
    graph = ot.Graph("Polynomial curve fitting", "x", "y", True, "upper right")
    # The "unknown" function
    graph.add(plotFunction(g, palette[0]))
    # Training set
    graph.add(plotData(xTrain, yTrain, palette[1]))
    # Predictions
    graph.add(plotPredictions(xTest, yHatTest, totalDegree, palette[2]))
    # Confidence interval of the mean
    graph.add(
        plotConfidenceInterval(
            xTest,
            confidenceIntervalMean,
            palette[3],
            label="Mean %.0f%%" % (100.0 * alpha),
        )
    )
    # Confidence interval of the observations.
    graph.add(
        plotConfidenceInterval(
            xTest,
            confidenceIntervalObservations,
            palette[4],
            label="Obs. %.0f%%" % (100.0 * alpha),
        )
    )
    view = otv.View(graph)

.. image-sg:: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_005.svg
   :alt: Polynomial curve fitting
   :srcset: /auto_numerical_methods/general_methods/images/sphx_glr_plot_regression_sinus_005.svg
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 431-435

We see that the confidence interval of the observations contains most of
the observations.
This interval is much larger than the C.I. of the mean, as expected from
the statistical model.

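To quantify the claim above, the following editorial addition (not part of
the original script) evaluates the observation confidence interval on the
training inputs and counts the covered observations; a fraction close to
:math:`\alpha = 0.95` is expected.

.. code-block:: Python

    # Illustrative check (not in the original script): coverage of the
    # training observations by the observation confidence interval.
    ciObsTrain = computeRegressionConfidenceInterval(
        lsqMethod, betaHat, sigma2Hat, designSampleTrain, alpha=alpha, mean=False
    )
    covered = sum(
        1
        for i in range(nTrain)
        if ciObsTrain[i, 0] <= yTrain[i, 0] <= ciObsTrain[i, 1]
    )
    print("Fraction of observations inside the C.I.:", covered / nTrain)
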
.. _sphx_glr_download_auto_numerical_methods_general_methods_plot_regression_sinus.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_regression_sinus.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_regression_sinus.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_regression_sinus.zip `