HistogramFactory

class HistogramFactory(*args)

Histogram factory.

Methods

build(*args)

Build the distribution.

buildAsHistogram(*args)

Estimate the distribution as a native distribution.

buildEstimator(*args)

Build the distribution and the parameter distribution.

buildFromQuantiles(lowerBound, ...)

Build from quantiles.

computeBandwidth(sample[, useQuantile])

Compute the bandwidth.

getBootstrapSize()

Accessor to the bootstrap size.

getClassName()

Accessor to the object's class name.

getKnownParameterIndices()

Accessor to the known parameters indices.

getKnownParameterValues()

Accessor to the known parameters values.

getName()

Accessor to the object's name.

hasName()

Test if the object is named.

setBootstrapSize(bootstrapSize)

Accessor to the bootstrap size.

setKnownParameter(values, positions)

Accessor to the known parameters.

setName(name)

Accessor to the object's name.

Notes

The range is [\min(data), \max(data)].

See the computeBandwidth() method for the bandwidth selection.

Examples

Create a histogram:

>>> import openturns as ot
>>> sample = ot.Normal().getSample(50)
>>> histogram = ot.HistogramFactory().build(sample)

Create a histogram from a number of bins:

>>> import openturns as ot
>>> sample = ot.Normal().getSample(50)
>>> binNumber = 10
>>> histogram = ot.HistogramFactory().build(sample, binNumber)

Create a histogram from a bandwidth:

>>> import openturns as ot
>>> sample = ot.Normal().getSample(50)
>>> bandwidth = 0.5
>>> histogram = ot.HistogramFactory().build(sample, bandwidth)

Create a histogram from the lower bound of the first class and the class widths:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal().getSample(50)
>>> first = -4
>>> width = ot.Point(7, 1.)
>>> histogram = ot.HistogramFactory().build(sample, first, width)

Compute bandwidth with default robust estimator:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal().getSample(50)
>>> factory = ot.HistogramFactory()
>>> factory.computeBandwidth(sample)
0.8207...

Compute bandwidth with optimal estimator:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal().getSample(50)
>>> factory = ot.HistogramFactory()
>>> factory.computeBandwidth(sample, False)
0.9175...
__init__(*args)
build(*args)

Build the distribution.

Available usages:

build()

build(sample)

build(param)

Parameters:
sample : 2-d sequence of float

Data.

param : sequence of float

The parameters of the distribution.

Returns:
dist : Distribution

The estimated distribution.

In the first usage, the default native distribution is built.

buildAsHistogram(*args)

Estimate the distribution as a native distribution.

If the sample is constant, the range of the histogram would be zero. In this case, the range is set to be a factor of the Distribution-DefaultCDFEpsilon key of the ResourceMap.

Available usages:

buildAsHistogram()

buildAsHistogram(sample)

buildAsHistogram(sample, binNumber)

buildAsHistogram(sample, bandwidth)

buildAsHistogram(sample, first, width)

Parameters:
sample : 2-d sequence of float

Data.

binNumber : int

The number of classes.

bandwidth : float

The width of each class.

first : float

The lower bound of the first class.

width : 1-d sequence of float

The widths of the classes.

Returns:
distribution : Histogram

The estimated distribution as a Histogram.

In the first usage, the default Histogram distribution is built.
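
The usages with binNumber and with first and width can be illustrated with a minimal pure-Python sketch. It does not call the library and is not its implementation: the helper names are hypothetical, and the treatment of points lying on a class boundary or outside the classes is an assumption made for the illustration.

```python
def widths_from_bin_number(sample, bin_number):
    """Split the range [min(sample), max(sample)] into equal-width classes."""
    lower, upper = min(sample), max(sample)
    return lower, [(upper - lower) / bin_number] * bin_number


def histogram_heights(sample, first, widths):
    """Return the heights of a piecewise-constant density.

    Assumption: classes are closed on the left and open on the right,
    except the last one, which is closed on both sides. Points outside
    the classes are ignored.
    """
    edges = [first]
    for w in widths:
        edges.append(edges[-1] + w)
    counts = [0] * len(widths)
    for x in sample:
        for i in range(len(widths)):
            is_last = i == len(widths) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    n = len(sample)
    # height = relative frequency divided by the class width
    return [c / (n * w) for c, w in zip(counts, widths)]


sample = [0.0, 0.25, 0.5, 1.25, 1.75, 2.0]
first, widths = widths_from_bin_number(sample, 2)
heights = histogram_heights(sample, first, widths)
```

With these heights, the sum of height times width over all classes is 1 whenever every point of the sample falls inside a class.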

buildEstimator(*args)

Build the distribution and the parameter distribution.

Parameters:
sample : 2-d sequence of float

Data.

parameters : DistributionParameters

Optional, the parametrization.

Returns:
resDist : DistributionFactoryResult

The results.

Notes

The distribution of the parameter estimator depends on how the native parameters of the distribution are estimated:

  • Moments method: the asymptotic parameters distribution is normal and estimated by Bootstrap on the initial data;

  • Maximum likelihood method with a regular model: the asymptotic parameters distribution is normal and its covariance matrix is the inverse Fisher information matrix;

  • Other methods: the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting (see KernelSmoothing).

If another set of parameters is specified, the native parameters distribution is first estimated and the new distribution is determined from it:

  • if the native parameters distribution is normal and the transformation regular at the estimated parameters values: the asymptotic parameters distribution is normal and its covariance matrix determined from the inverse Fisher information matrix of the native parameters and the transformation;

  • in the other cases, the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting.
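
The "Bootstrap on the initial data" step mentioned in these notes can be sketched in pure Python as follows. This is only an illustration of the resampling idea, not the library's code: the function name, the seed handling and the use of the sample mean as the estimated parameter are assumptions.

```python
import random
import statistics


def bootstrap_parameter_estimates(sample, estimator, bootstrap_size, seed=0):
    """Re-estimate a parameter on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(bootstrap_size):
        # Draw n points with replacement from the initial data.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return estimates


sample = [2.1, 1.7, 2.9, 2.4, 1.9, 2.6, 2.2, 2.0]
# Hypothetical estimator: the sample mean stands in for a distribution parameter.
estimates = bootstrap_parameter_estimates(sample, statistics.fmean, bootstrap_size=200)
spread = statistics.stdev(estimates)
```

The collection of estimates is what a normal fit or a kernel fit would then be applied to in order to obtain a distribution of the parameters.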

buildFromQuantiles(lowerBound, probabilities, quantiles)

Build from quantiles.

We consider a histogram distribution with K bins. Given a set of probabilities p_1, ..., p_K \in [0, 1] and a set of quantiles q_1, ..., q_K \in \mathbb{R}, we compute the parameters of the distribution such that:

\mathbb{P}(X \leq q_i) = p_i

for i = 1, ..., K.

Parameters:
lowerBound : float

Lower bound.

probabilities : sequence of float

The probabilities.

quantiles : sequence of float

Quantiles of the desired distribution.

Returns:
dist : Histogram

Estimated distribution.

Examples

>>> import openturns as ot
>>> ref_dist = ot.Normal()
>>> lowerBound = -3.0
>>> N = 10
>>> probabilities = [(i+1) / N for i in range(N)]
>>> quantiles = [ref_dist.computeQuantile(pi)[0] for pi in probabilities]
>>> factory = ot.HistogramFactory()
>>> dist = factory.buildFromQuantiles(lowerBound, probabilities, quantiles)
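
One way to satisfy the quantile constraints with a piecewise-constant density is sketched below in pure Python. It is a hedged illustration, not the library's algorithm: the function names are hypothetical, bin i is assumed to span [q_{i-1}, q_i] with q_0 equal to lowerBound, and the last probability is assumed to be 1 so that the total mass is 1.

```python
def histogram_from_quantiles(lower_bound, probabilities, quantiles):
    """Return (first, widths, heights) such that CDF(q_i) = p_i."""
    widths, heights = [], []
    previous_q, previous_p = lower_bound, 0.0
    for p, q in zip(probabilities, quantiles):
        width = q - previous_q   # bin i spans [q_{i-1}, q_i]
        mass = p - previous_p    # probability carried by bin i
        if width <= 0.0 or mass < 0.0:
            raise ValueError("quantiles and probabilities must be increasing")
        widths.append(width)
        heights.append(mass / width)
        previous_q, previous_p = q, p
    return lower_bound, widths, heights


def cdf(x, first, widths, heights):
    """CDF of the piecewise-constant density defined by the bins."""
    total, left = 0.0, first
    for w, h in zip(widths, heights):
        # Length of the part of the bin that lies below x.
        total += h * max(0.0, min(x, left + w) - left)
        left += w
    return total


first, widths, heights = histogram_from_quantiles(
    -3.0, [0.25, 0.5, 1.0], [-0.7, 0.0, 3.0]
)
```

Evaluating cdf at each quantile recovers the corresponding probability up to floating-point rounding.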
computeBandwidth(sample, useQuantile=True)

Compute the bandwidth.

Parameters:
sample : Sample

Data.

useQuantile : bool, optional (default=`True`)

If True, use the robust bandwidth estimator based on the Freedman and Diaconis rule. Otherwise, use the optimal bandwidth estimator based on Scott’s rule.

Returns:
bandwidth : float

The estimated bandwidth.

Notes

The bandwidth of the histogram is based on the asymptotic mean integrated squared error (AMISE).

When useQuantile is True (the default), the bandwidth is based on the quantiles of the sample. For any \alpha \in (0,1], let q_n(\alpha) be the empirical quantile at level \alpha of the sample. Let Q_1 and Q_3 be the first and third quartiles of the sample:

Q_3 = q_n(0.75), \qquad Q_1 = q_n(0.25),

and let IQR be the interquartile range:

IQR = Q_3 - Q_1.

In this case, the bandwidth is a robust estimator of the AMISE-optimal bandwidth, known as the Freedman and Diaconis rule [freedman1981]:

h = \frac{IQR}{2\Phi^{-1}(0.75)} \left(\frac{24 \sqrt{\pi}}{n}\right)^{\frac{1}{3}}

where \Phi^{-1} is the quantile function of the standard Gaussian distribution. The expression \frac{IQR}{2\Phi^{-1}(0.75)} is the normalized interquartile range; for a Gaussian distribution, it is equal to the standard deviation. The normalized interquartile range is a robust estimator of the scale of the distribution (see [wand1994], page 60).

When useQuantile is False, the bandwidth is the AMISE-optimal one, known as Scott’s rule (see [scott2015] eq. 3.16 page 59):

h = \sigma_n \left(\frac{24 \sqrt{\pi}}{n}\right)^{\frac{1}{3}}

where \sigma_n^2 is the unbiased sample variance. This bandwidth is AMISE-optimal when the data follow a Gaussian distribution (see [scott1992]). In this case, the AMISE is O(n^{-2/3}).

If the bandwidth is computed as zero (for example, if the sample is constant), then the Distribution-DefaultQuantileEpsilon key of the ResourceMap is used instead.
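
The two formulas above can be evaluated with the Python standard library alone, as in the following sketch. It is not the library's implementation: the function names are hypothetical, and the quartile convention of `statistics.quantiles` is an assumption standing in for the empirical quantile q_n, so the values need not match those returned by computeBandwidth(). The fallback for a zero bandwidth is not reproduced.

```python
import math
import statistics


def amise_factor(n):
    """Common factor (24 * sqrt(pi) / n)^(1/3) of both formulas."""
    return (24.0 * math.sqrt(math.pi) / n) ** (1.0 / 3.0)


def robust_bandwidth(data):
    """Bandwidth based on the normalized interquartile range."""
    # Assumption: the default quartile convention of statistics.quantiles.
    q1, _, q3 = statistics.quantiles(data, n=4)
    # Normalized interquartile range: IQR / (2 * Phi^{-1}(0.75)).
    scale = (q3 - q1) / (2.0 * statistics.NormalDist().inv_cdf(0.75))
    return scale * amise_factor(len(data))


def scott_bandwidth(data):
    """Bandwidth based on the unbiased sample standard deviation."""
    sigma = statistics.stdev(data)  # square root of the unbiased variance
    return sigma * amise_factor(len(data))


data = [-1.3, -0.4, 0.1, 0.2, 0.7, 0.9, 1.5, 2.1]
h_robust = robust_bandwidth(data)
h_scott = scott_bandwidth(data)
```

Both estimators share the same factor in n and differ only in the scale estimate, so a few outliers in the data affect `h_scott` through the variance but leave the quartile-based `h_robust` largely unchanged.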

getBootstrapSize()

Accessor to the bootstrap size.

Returns:
size : int

Size of the bootstrap.

getClassName()

Accessor to the object’s class name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getKnownParameterIndices()

Accessor to the known parameters indices.

Returns:
indices : Indices

Indices of the known parameters.

getKnownParameterValues()

Accessor to the known parameters values.

Returns:
values : Point

Values of known parameters.

getName()

Accessor to the object’s name.

Returns:
name : str

The name of the object.

hasName()

Test if the object is named.

Returns:
hasName : bool

True if the name is not empty.

setBootstrapSize(bootstrapSize)

Accessor to the bootstrap size.

Parameters:
bootstrapSize : int

The size of the bootstrap.

setKnownParameter(values, positions)

Accessor to the known parameters.

Parameters:
values : sequence of float

Values of known parameters.

positions : sequence of int

Indices of known parameters.

Examples

When a subset of the parameter vector is known, the other parameters only have to be estimated from data.

In the following example, we consider a sample and want to fit a Beta distribution. We assume that the a and b parameters are known beforehand. In this case, we set the third parameter (at index 2) to -1 and the fourth parameter (at index 3) to 1.

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> distribution = ot.Beta(2.3, 2.2, -1.0, 1.0)
>>> sample = distribution.getSample(10)
>>> factory = ot.BetaFactory()
>>> # set (a,b) out of (r, t, a, b)
>>> factory.setKnownParameter([-1.0, 1.0], [2, 3])
>>> inf_distribution = factory.build(sample)
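
The bookkeeping behind known parameters can be pictured with the pure-Python sketch below: the known values are placed at their positions and the remaining slots are filled with the estimated values. The helper name is hypothetical, the estimated values are placeholders, and this is not a description of the library's internals.

```python
def merge_parameters(dimension, known_values, known_positions, estimated_values):
    """Assemble a full parameter vector from known and estimated values."""
    full = [None] * dimension
    # Place the known values at their indices.
    for position, value in zip(known_positions, known_values):
        full[position] = value
    # Fill the remaining slots, in order, with the estimated values.
    remaining = iter(estimated_values)
    for i in range(dimension):
        if full[i] is None:
            full[i] = next(remaining)
    return full


# (a, b) known at indices 2 and 3; placeholder estimates for the first two parameters.
parameters = merge_parameters(4, [-1.0, 1.0], [2, 3], [2.3, 2.2])
```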
setName(name)

Accessor to the object’s name.

Parameters:
name : str

The name of the object.

Examples using the class

Draw an histogram

Compare unconditional and conditional histograms

Define a distribution from quantiles

Fit an extreme value distribution

Kolmogorov-Smirnov : get the statistics distribution

Generate random variates by inverting the CDF

Quick start guide to distributions

Create a mixture of distributions

Aggregate processes

Use the Box-Cox transformation

Create a polynomial chaos for the Ishigami function: a quick start guide to polynomial chaos

Kriging : cantilever beam model

Gaussian Process Regression : cantilever beam model

Estimate a flooding probability

Estimate a probability with Monte-Carlo on axial stressed beam: a quick start guide to reliability

Estimate a buckling probability

Estimate Sobol’ indices for the Ishigami function by a sampling method: a quick start guide to sensitivity analysis

Bayesian calibration of hierarchical fission gas release models

A quick start guide to graphs