# Estimate a multivariate distribution¶

In this example we are going to estimate a joint distribution from a multivariate sample by fitting marginals and finding a set of copulas.

While the estimation of marginals is quite straightforward, the estimation of the dependency structure takes several steps:

• find the dependent components

• estimate a copula on each dependent bloc

• assemble the estimated copulas

:

from __future__ import print_function
import openturns as ot
import math as m

:

# generate some multivariate data to estimate, with correlation
cop1 = ot.AliMikhailHaqCopula(0.6)
cop2 = ot.ClaytonCopula(2.5)
copula = ot.ComposedCopula([cop1, cop2])
marginals = [ot.Uniform(5.0, 6.0), ot.Arcsine(), ot.Normal(-40.0, 3.0), ot.Triangular(100.0, 150.0, 300.0)]
distribution = ot.ComposedDistribution(marginals, copula)
sample = distribution.getSample(10000).getMarginal([0, 2, 3, 1])

:

# estimate marginals
dimension = sample.getDimension()
marginalFactories = []
for factory in ot.DistributionFactory.GetContinuousUniVariateFactories():
if str(factory).startswith('Histogram'):
# ~ non-parametric
continue
marginalFactories.append(factory)
estimated_marginals = [ot.FittingTest.BestModelBIC(sample.getMarginal(i), marginalFactories) for i in range(dimension)]
estimated_marginals

:

[class=Uniform name=Uniform dimension=1 a=5.00008 b=6,
class=Normal name=Normal dimension=1 mean=class=Point name=Unnamed dimension=1 values=[-39.9413] sigma=class=Point name=Unnamed dimension=1 values=[3.04363] correlationMatrix=class=CorrelationMatrix dimension=1 implementation=class=MatrixImplementation name=Unnamed rows=1 columns=1 values=,
class=Trapezoidal name=Trapezoidal dimension=1 a=99.9689 b=148.567 c=154.696 d=299.898 h=0.009706,
class=Beta name=Beta dimension=1 alpha=0.49567 beta=0.497672 a=-1.0002 b=1.0002]


Find connected components of a graph defined from its adjacency matrix

:

def find_neighbours(head, covariance, to_visit, visited):
N = covariance.getDimension()
for i in to_visit:
# If i is connected to head and has not yet been visited
# Add i to the current component
component = find_neighbours(i, covariance, to_visit, visited)
current_component += component
return current_component

def connected_components(covariance):
N = covariance.getDimension()
to_visit = list(range(N))
visited =  * N
all_components = []
component = find_neighbours(head, covariance, to_visit, visited)
all_components.append(sorted(component))
return all_components


## Estimate the copula¶

First find the dependent components : we compute the Spearman correlation

:

C = sample.computeSpearmanCorrelation()
print(C)

[[  1           0.00263067  0.00518309  0.23953    ]
[  0.00263067  1           0.736523   -0.00576173 ]
[  0.00518309  0.736523    1          -0.0083771  ]
[  0.23953    -0.00576173 -0.0083771   1          ]]


We filter and consider only significantly non-zero correlations.

:

epsilon = 1.0 / m.sqrt(sample.getSize())
for j in range(dimension):
for i in range(j):
C[i, j] = 1.0 if abs(C[i, j]) > epsilon else 0.0
print(C)

[[ 1 0 0 1 ]
[ 0 1 1 0 ]
[ 0 1 1 0 ]
[ 1 0 0 1 ]]


Note that we can apply the HypothesisTest.Spearman test. As the null hypothesis of the test is the independence, we must take the complementary of the binary measure as follow:

M = ot.SymmetricMatrix(dimension)
for i in range(dimension):
M[i,i] = 1
for j in range(i):
M[i, j] = 1 - ot.HypothesisTest.Spearman(sample[:,i], sample[:,j]).getBinaryQualityMeasure()


Now we find the independent blocs:

:

blocs = connected_components(C)
blocs

:

[[0, 3], [1, 2]]


For each dependent block, we estimate the most accurate non parameteric copula

:

copulaFactories = []
for factory in ot.DistributionFactory.GetContinuousMultiVariateFactories():
if not factory.build().isCopula():
continue
if factory.getImplementation().getClassName()=='BernsteinCopulaFactory':
continue
copulaFactories.append(factory)
estimated_copulas = [ot.FittingTest.BestModelBIC(sample.getMarginal(bloc), copulaFactories) for bloc in blocs]
estimated_copulas

:

[class=AliMikhailHaqCopula name=AliMikhailHaqCopula dimension=2 theta=0.601685,
class=ClaytonCopula name=ClaytonCopula dimension=2 theta=2.45478]


Finally we assemble the copula

:

estimated_copula_perm = ot.ComposedCopula(estimated_copulas)


Take care of the order of each bloc vs the order of original components !

:

permutation = []
for bloc in blocs:
permutation.extend(bloc)
inverse_permutation = [-1] * dimension
for i in range(dimension):
inverse_permutation[permutation[i]] = i
estimated_copula = estimated_copula_perm.getMarginal(inverse_permutation)
estimated_copula

:


MarginalDistribution(distribution=ComposedCopula(AliMikhailHaqCopula(theta = 0.601685), ClaytonCopula(theta = 2.45478)), indices=[0,2,3,1])

We build joint distribution from marginal distributions and dependency structure:

:

estimated_distribution = ot.ComposedDistribution(estimated_marginals, estimated_copula)
estimated_distribution

:


ComposedDistribution(Uniform(a = 5.00008, b = 6), Normal(mu = -39.9413, sigma = 3.04363), Trapezoidal(a = 99.9689, b = 148.567, c = 154.696, d = 299.898), Beta(alpha = 0.49567, beta = 0.497672, a = -1.0002, b = 1.0002), MarginalDistribution(distribution=ComposedCopula(AliMikhailHaqCopula(theta = 0.601685), ClaytonCopula(theta = 2.45478)), indices=[0,2,3,1]))