Note
Go to the end to download the full example code.
Model a singular multivariate distribution¶
The objective of the example is to show how to fit a singular copula
on a sample of dimension which might be high. For example,
this is the case when we want to fit the joint distribution of the
Karhunen Loeve
coefficients in the Karhunen Loeve modelisation of a process.
We first show that when the copula is singular, using a kernel smoothing fitting might not be a good idea.
The good way to follow is to fit an empirical Bernstein copula implemented in
EmpiricalBernsteinCopula through its factory
BernsteinCopulaFactory. This factory garantees that the built
EmpiricalBernsteinCopula is really a copula and not only a
core (ie a multivariate distribution which range is included in ).
According to the level of singularity, the number
of cells used to build the
empirical Bernstein copula differs: the higher the singularity, the larger
.
The cells partition the sample: all the points of the same cell are grouped into a single global point. The empirical Bernstein copula is a mixture of products of Beta distributions centered on the global point of each cell.
We denote by the sample size. The number of global points varies according to the number of
cells considered (math:m):
means that all the sample has been retained: we create one cell around each point of the sample; in that case, the empirical Bernstein copula is the Beta copula in the sens of [segers2016];
means the sample has been grouped into one single global point: we get the independent copula;
means that we create
cells gathering several points.
As the empirical Bernstein copula defined in this way is a copula only if divides
, the
BernsteinCopulaFactory throws away
some part of the sample is set aside in order to check this condition.
A point’s zone of influence is its cell. The larger , the smaller its zone of influence:
the final copula thus adheres strongly to the sample. The smaller
, the larger
its zone of influence, and the more the final copula will be able to fill a large
area around it.
So, if we know that the final copula is diffuse, we recommend to let the
BernsteinCopulaFactory automatically calculate the optimal number of cells to take. If
we know that the final copula is singular, we recommend to specify the value of to be equal
to the sample size
.
We illustrate the influence of on a singular copula.
import openturns as ot
import openturns.viewer as otv
import math as m
Routine to draw a distribution cloud and a sample.
def draw(dist, Y):
g = ot.Graph()
g.setAxes(True)
g.setGrid(True)
c = ot.Cloud(dist.getSample(10000))
c.setColor("red")
c.setPointStyle("bullet")
g.add(c)
c = ot.Cloud(Y)
c.setColor("black")
c.setPointStyle("bullet")
g.add(c)
g.setBoundingBox(
ot.Interval(
Y.getMin() - 0.5 * Y.computeRange(), Y.getMax() + 0.5 * Y.computeRange()
)
)
return g
Introduction¶
We consider the function defined by:
where:
We define the following input random vector:
with and
independent.
We define the output random vector as:
f = ot.SymbolicFunction(
["U", "v1", "v2"],
["sin(U)/(1+cos(U)^2)+0.05*v1", "sin(U)*cos(U)/(1+cos(U)^2)+0.05*v2"],
)
U = ot.Uniform(-0.85 * m.pi, 0.85 * m.pi)
V = ot.Normal(2)
X = ot.BlockIndependentDistribution([U, V])
We generate a sample of the output random vector of size
.
N = 200
sample_Y = f(X.getSample(N))
Multivariate kernel smoothing¶
We estimate the distribution of the output random vector by multivariate kernel smoothing.
y_multi_ks = ot.KernelSmoothing().build(sample_Y)
view = otv.View(draw(y_multi_ks, sample_Y))
We see that the fitting is not satisfactory: the empty regions have not been respected. The fitted copula is diffuse and does not model the observed singularity correctly.
Empirical Bernstein copula factory¶
Now, we estimate the distribution of splitting the estimation of the marginals
from the estimation of the copula:
the marginals are fitted by kernel smoothing,
the copula is fitted using the
BernsteinCopulaFactorythat builds an empirical Bernstein copula.
First, we do not specify the bin number . It is equal to the value computed by the default method, which is the
LogLikelihood criteria.
empBern_copula = ot.BernsteinCopulaFactory().buildAsEmpiricalBernsteinCopula(sample_Y)
print("bin number computed m = ", empBern_copula.getBinNumber())
marginals = [
ot.KernelSmoothing().build(sample_Y.getMarginal(j))
for j in range(sample_Y.getDimension())
]
y_empBern = ot.JointDistribution(marginals, empBern_copula)
view = otv.View(draw(y_empBern, sample_Y))
bin number computed m = 1
We see that the optimal number of cells is : it means that one single cell has been created.
The built copula is the independent copula. It is diffuse
in
. This is not satisfactory. The optimal
is not correct fot the singular copula we try to estimate.
Now, we specify a bin number equal to the sample size: so that the built copula is very close to the sample.
empBern_copula = ot.BernsteinCopulaFactory().build(sample_Y, N)
y_empBern = ot.JointDistribution(marginals, empBern_copula)
view = otv.View(draw(y_empBern, sample_Y))
otv.View.ShowAll()
# In that case, the built copula manages to reproduce the specific feature we observe in the data.
OpenTURNS