Sample manipulation

This example describes the main statistical functionalities available on data through the Sample object. Here, the Sample holds realizations of an output variable of interest.

import openturns as ot

ot.Log.Show(ot.Log.NONE)

A typical example

A recurring task in uncertainty quantification is to analyze an output variable of interest Y obtained through a model f and input parameters X. Here we consider input parameters X=(X_1, X_2) whose two components follow independent standard Normal distributions. We therefore use an IndependentCopula to describe the dependence structure between the two marginals.

# input parameters
inputDist = ot.JointDistribution([ot.Normal()] * 2, ot.IndependentCopula(2))
inputDist.setDescription(["X1", "X2"])

We create a random vector from the 2-d distribution defined above:

inputVector = ot.RandomVector(inputDist)

Suppose our model f is known and reads:

f(x) = \begin{pmatrix}
         x_1^2 + x_2 \\
         x_1   + x_2^2
       \end{pmatrix}

We define our model f with a SymbolicFunction

f = ot.SymbolicFunction(["x1", "x2"], ["x1^2+x2", "x2^2+x1"])

Our output vector is Y = f(X), the image of inputVector through the model:

outputVector = ot.CompositeRandomVector(f, inputVector)

We can now draw a sample of Y, that is, a set of realizations (here 1000) of the random outputVector:

size = 1000
sample = outputVector.getSample(size)
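
To make the sampling step concrete, here is a minimal pure-Python sketch of what drawing realizations of Y = f(X) amounts to conceptually. It does not use OpenTURNS and will not reproduce the numbers shown in this example: the helper names `model` and `draw_output_sample`, the seed, and the use of `random.gauss` are illustrative assumptions.

```python
import random


def model(x1, x2):
    """The model f from the text: (x1^2 + x2, x1 + x2^2)."""
    return (x1**2 + x2, x1 + x2**2)


def draw_output_sample(size, seed=0):
    """Draw `size` realizations of Y = f(X), with X1 and X2 independent N(0, 1).

    The seed is an arbitrary choice made only for reproducibility.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(size):
        x1 = rng.gauss(0.0, 1.0)  # realization of X1
        x2 = rng.gauss(0.0, 1.0)  # realization of X2, drawn independently
        rows.append(model(x1, x2))
    return rows


rows = draw_output_sample(1000)
print(len(rows), len(rows[0]))  # number of rows and number of components
print(rows[:5])                 # first 5 realizations of Y
```

Each row is one realization of the 2-d output, so the list plays the role of the 1000-row, 2-column matrix described below.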

The sample may be seen as a matrix of size 1000 \times 2, with one row per realization and one column per output component. We print the first 5 rows (out of 1000):

sample[:5]
     y0           y1
0    0.1379059    -0.4635553
1    -1.162232    1.774809
2    3.782601     -0.1532537
3    1.126566     -1.013287
4    3.62964      2.332004


Basic operations on samples

We have access to basic information about a sample, such as:

  • minimum and maximum per component

sample.getMin(), sample.getMax()
(class=Point name=Unnamed dimension=2 values=[-3.24513,-2.98342], class=Point name=Unnamed dimension=2 values=[10.6987,10.5037])

  • the range per component (max-min)

sample.computeRange()
class=Point name=Unnamed dimension=2 values=[13.9438,13.4871]


More elaborate functionalities are also available:

  • get the median per component

sample.computeMedian()
class=Point name=Unnamed dimension=2 values=[0.620644,0.710843]


  • compute the covariance

sample.computeCovariance()

[[ 2.78241 0.104519 ]
 [ 0.104519 3.29309 ]]



  • get the empirical 0.95 quantile per component

sample.computeQuantilePerComponent(0.95)
class=Point name=Unnamed dimension=2 values=[3.96521,4.44277]


  • get the value of the empirical CDF at a point

point = [1.1, 2.2]
sample.computeEmpiricalCDF(point)
0.571
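
For a multivariate sample, the empirical CDF at a point can be read as the fraction of realizations whose components are all less than or equal to the corresponding components of the point. The sketch below illustrates that definition in plain Python on a tiny hand-made data set. The function name `empirical_cdf` and the data are hypothetical, and the sketch assumes the componentwise "less than or equal" convention rather than reproducing the library's implementation.

```python
def empirical_cdf(rows, point):
    """Fraction of rows whose every component is <= the matching component of point."""
    hits = sum(1 for row in rows if all(v <= p for v, p in zip(row, point)))
    return hits / len(rows)


# Four hand-made 2-d observations (illustrative data, not the sample above).
rows = [(0.0, 0.0), (1.0, 3.0), (2.0, 1.0), (0.5, 2.0)]

# Only (0.0, 0.0) and (0.5, 2.0) lie below (1.1, 2.2) in both components,
# so 2 of the 4 rows qualify and this prints 0.5.
print(empirical_cdf(rows, (1.1, 2.2)))
```

Applied to the example above, the value 0.571 would mean that 571 of the 1000 realizations fall below the point [1.1, 2.2] in both components.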

Estimate the statistical moments

We often need to estimate the first moments of the output variable. They can be estimated directly from the output sample:

  • estimate the moment of order 1 : mean

sample.computeMean()
class=Point name=Unnamed dimension=2 values=[0.922935,1.01288]


  • estimate the standard deviation for each component

sample.computeStandardDeviation()
class=Point name=Unnamed dimension=2 values=[1.66805,1.81469]


  • estimate the centered moment of order 2 : variance

sample.computeVariance()
class=Point name=Unnamed dimension=2 values=[2.78241,3.29309]


  • estimate the standardized centered moment of order 3 : skewness

sample.computeSkewness()
class=Point name=Unnamed dimension=2 values=[1.4272,1.74336]


  • estimate the standardized centered moment of order 4 : kurtosis

sample.computeKurtosis()
class=Point name=Unnamed dimension=2 values=[6.85985,8.03257]
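
Because the model is simple, the exact moments can be worked out by hand and compared with the estimates above. The derivation below is a sketch for the first component Y_1 = X_1^2 + X_2 (by symmetry, Y_2 has the same moments). It uses only the independence of X_1 and X_2 and the standard Normal moments E[X]=0, E[X^2]=1, E[X^3]=0, E[X^4]=3. The last line relies on the cumulants of X_1^2, which follows a chi-square distribution with one degree of freedom, and assumes the non-excess convention for kurtosis, which the estimated values suggest.

```latex
\begin{aligned}
\mathbb{E}[Y_1] &= \mathbb{E}[X_1^2] + \mathbb{E}[X_2] = 1 + 0 = 1, \\
\operatorname{Var}(Y_1) &= \operatorname{Var}(X_1^2) + \operatorname{Var}(X_2)
   = \left(\mathbb{E}[X_1^4] - \mathbb{E}[X_1^2]^2\right) + 1 = (3 - 1) + 1 = 3, \\
\operatorname{Cov}(Y_1, Y_2) &= \operatorname{Cov}(X_1^2, X_1) + \operatorname{Cov}(X_2, X_2^2)
   = \mathbb{E}[X_1^3] + \mathbb{E}[X_2^3] = 0, \\
\operatorname{skewness}(Y_1) &= \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{8}{3^{3/2}} \approx 1.54,
\qquad
\operatorname{kurtosis}(Y_1) = 3 + \frac{\kappa_4}{\kappa_2^{2}} = 3 + \frac{48}{9} \approx 8.33 .
\end{aligned}
```

With 1000 realizations, the estimated means [0.922935, 1.01288] and variances [2.78241, 3.29309] are close to these values, and the estimated covariance 0.104519 is close to zero. The skewness and kurtosis estimates deviate more, as can be expected of higher-order moments at this sample size.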


Test the correlation

Several measures of correlation between the components of the sample are available:

  • get the sample linear correlation matrix :

sample.computeLinearCorrelation()

[[ 1 0.034529 ]
 [ 0.034529 1 ]]



  • get the sample Kendall correlation matrix :

sample.computeKendallTau()

[[ 1 -0.0121522 ]
 [ -0.0121522 1 ]]



  • get the sample Spearman correlation matrix :

sample.computeSpearmanCorrelation()

[[ 1 0.0037072 ]
 [ 0.0037072 1 ]]
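
The three measures above capture different kinds of dependence: the linear (Pearson) coefficient measures linear association, while Spearman and Kendall are rank-based and only depend on the ordering of the values. The plain-Python sketch below illustrates the textbook definitions on a small monotone but non-linear data set. The function names and data are hypothetical, ties are assumed absent, and the code is not meant to mirror the library's implementation.

```python
import math


def pearson(x, y):
    """Linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau: (concordant pairs - discordant pairs) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# y is an increasing but non-linear function of x (illustrative data).
x = [0.5, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(v) for v in x]

print(pearson(x, y))      # below 1: the relation is not linear
print(spearman(x, y))     # equals 1: the ranks agree perfectly
print(kendall_tau(x, y))  # equals 1: every pair is concordant
```

In the example above all three coefficients are close to zero, which is consistent with the vanishing theoretical covariance of the two output components.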