KernelSmoothing

(figure: ../../_images/KernelSmoothing.png)
class KernelSmoothing(*args)

Non-parametric continuous distribution estimation by kernel smoothing.

Refer to Kernel smoothing.

Parameters:
kernel : Distribution, optional

Univariate distribution of the kernel that will be used. By default, the standard Normal distribution is used.

binned : bool, optional

Activates the binning mechanism, available only in the univariate or bivariate cases. It speeds up the manipulation of the density function of the resulting distribution. By default, the mechanism is activated.

binNumber : int, binNumber \geq 2, optional

Indicates the number of bins used by the binning mechanism. By default, OpenTURNS uses the values stored in the ResourceMap.

boundaryCorrection : bool, optional

Activates the boundary correction using the mirroring technique. By default, the correction is not applied.

Notes

The binning mechanism creates a regular grid of binNumber intervals in each dimension, then the unit weight of each point is linearly distributed over the vertices of the bin containing the point (see [wand1994] appendix D, page 182). The KernelSmoothing-BinNumber key of the ResourceMap defines the default number of bins used by the binning algorithm to improve the evaluation speed.
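
The default bin number can be changed through the ResourceMap before building; a minimal sketch, assuming only the key name quoted above:

>>> import openturns as ot
>>> # use a finer grid for the binning mechanism
>>> ot.ResourceMap.SetAsUnsignedInteger('KernelSmoothing-BinNumber', 1024)
>>> ks = ot.KernelSmoothing()  # binning is activated by default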

The boundary correction is available only in one dimension, and it is done using the mirroring technique. See the notes of the setBoundingOption() method for details.

When applied to multivariate samples, the kernel is the kernel product of the univariate distribution specified in the constructor.

Examples

Fit a distribution to data using the kernel smoothing technique:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Gamma(6.0, 1.0).getSample(100)
>>> ks = ot.KernelSmoothing()
>>> fittedDist = ks.build(sample)
>>> print(fittedDist.getClassName())
Distribution

The build() method produces a generic Distribution object. Other build methods (detailed below) produce more specific objects.

Get the bandwidth:

>>> bandwidth = ks.getBandwidth()
>>> print(bandwidth)
[0.862207]

The bandwidth was evaluated by the build() method. It could also have been provided by the user.

>>> bandwidth = [0.9]
>>> fittedDist = ks.build(sample, bandwidth)

Compare the PDFs:

>>> graph = fittedDist.drawPDF()
>>> graph.add(ot.Gamma(6.0, 1.0).drawPDF())
>>> graph.setColors(ot.Drawable.BuildDefaultPalette(2))
>>> graph.setLegends(['KS dist', 'Gamma'])

The default values of the parameters of the constructor usually provide good results. Nevertheless, the parameters can be manually set.

>>> kernel = ot.Uniform()
>>> ks = ot.KernelSmoothing(kernel)
>>> binned = True # by default True
>>> binNumber = 64
>>> ks = ot.KernelSmoothing(kernel, binned, binNumber)
>>> boundaryCorrection = True # by default False
>>> ks = ot.KernelSmoothing(kernel, binned, binNumber, boundaryCorrection)

Variants of the build() method can be used when the distribution to build is expected to be of a certain type. In those cases however, the bandwidth must be user-specified. To use buildAsTruncatedDistribution(), boundary correction must be enabled.

>>> distribution = ks.buildAsKernelMixture(sample, bandwidth)
>>> print(distribution.getClassName())
KernelMixture
>>> distribution = ks.buildAsMixture(sample, bandwidth)
>>> print(distribution.getClassName())
Mixture
>>> distribution = ks.buildAsTruncatedDistribution(sample, bandwidth)
>>> print(distribution.getClassName())
TruncatedDistribution
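
The factory also handles multivariate samples, using the product kernel described in the Notes; a minimal sketch continuing the session above:

>>> sample2d = ot.Normal(2).getSample(200)
>>> ks2d = ot.KernelSmoothing()
>>> fitted2d = ks2d.build(sample2d)
>>> print(fitted2d.getDimension())
2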

Methods

build(*args)

Fit a kernel smoothing distribution on data.

buildAsKernelMixture(sample, bandwidth)

Fit a kernel smoothing distribution on data.

buildAsMixture(sample, bandwidth)

Fit a kernel smoothing distribution on data.

buildAsTruncatedDistribution(sample, bandwidth)

Estimate the distribution as TruncatedDistribution.

buildEstimator(*args)

Build the distribution and the parameter distribution.

computeMixedBandwidth(sample)

Compute the bandwidth according to a mixed rule.

computePluginBandwidth(sample)

Compute the bandwidth according to the plugin rule.

computeSilvermanBandwidth(sample)

Compute the bandwidth according to the Silverman rule.

getBandwidth()

Accessor to the bandwidth used in the kernel smoothing.

getBootstrapSize()

Accessor to the bootstrap size.

getClassName()

Accessor to the object's name.

getId()

Accessor to the object's id.

getKernel()

Accessor to the kernel used in the kernel smoothing.

getName()

Accessor to the object's name.

getShadowedId()

Accessor to the object's shadowed id.

getVisibility()

Accessor to the object's visibility state.

hasName()

Test if the object is named.

hasVisibleName()

Test if the object has a distinguishable name.

setAutomaticLowerBound(automaticLowerBound)

Accessor to the flag for an automatic selection of lower bound.

setAutomaticUpperBound(automaticUpperBound)

Accessor to the flag for an automatic selection of upper bound.

setBootstrapSize(bootstrapSize)

Accessor to the bootstrap size.

setBoundaryCorrection(boundaryCorrection)

Accessor to the boundary correction flag.

setBoundingOption(boundingOption)

Accessor to the boundary correction option.

setLowerBound(lowerBound)

Accessor to the lower bound for boundary correction.

setName(name)

Accessor to the object's name.

setShadowedId(id)

Accessor to the object's shadowed id.

setUpperBound(upperBound)

Accessor to the upper bound for boundary correction.

setVisibility(visible)

Accessor to the object's visibility state.

__init__(*args)
build(*args)

Fit a kernel smoothing distribution on data.

Parameters:
sample : 2-d sequence of float

Data on which the distribution is fitted. Any dimension.

bandwidth : Point, optional

Contains the bandwidth in each direction. If not specified, the bandwidth is calculated from the data using the mixed rule.

Returns:
fittedDist : Distribution

The fitted distribution.

Notes

According to the dimension of the data and the specified treatments, the resulting distribution differs.

  • If the sample is constant, a Dirac distribution is built.

  • If dimension > 2 or if no treatment has been asked for, a KernelMixture is built by calling buildAsKernelMixture.

  • If dimension = 1 and a boundary treatment has been asked for, a TruncatedDistribution is built by calling buildAsTruncatedDistribution.

  • If dimension = 1 or 2 and no boundary treatment has been asked for, but a binning treatment has been asked for:

    • If the sample size is greater than the bin number, then a Mixture is built by calling buildAsMixture.

    • Otherwise a KernelMixture is built by calling buildAsKernelMixture.

The bandwidth selection depends on the dimension.

  • If dimension = 1, then computeMixedBandwidth is used.

  • Otherwise, the only multivariate rule, computeSilvermanBandwidth, is used.

Examples

See the effect of the boundary correction:

>>> import openturns as ot
>>> sample = ot.Exponential(1.0).getSample(1000)
>>> smoother = ot.KernelSmoothing()
>>> fittedDistNoCorr = smoother.build(sample)
>>> smoother.setBoundaryCorrection(True)
>>> fittedDistWithCorr = smoother.build(sample)

Compare the PDFs:

>>> graph = ot.Exponential(1.0).drawPDF()
>>> graph.add(fittedDistNoCorr.drawPDF())
>>> graph.add(fittedDistWithCorr.drawPDF())
>>> graph.setColors(['black', 'blue', 'red'])
>>> graph.setLegends(['Exp dist', 'No boundary corr', 'Boundary corr'])
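
As stated in the notes above, a constant sample yields a Dirac distribution; a minimal sketch:

>>> constantSample = ot.Sample(50, 1)  # 50 points, all equal to 0.0
>>> fittedDirac = ot.KernelSmoothing().build(constantSample)  # a Dirac distribution, per the notes above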
buildAsKernelMixture(sample, bandwidth)

Fit a kernel smoothing distribution on data.

Parameters:
sample : 2-d sequence of float

Data on which the distribution is fitted. Any dimension.

bandwidth : Point

Contains the bandwidth in each direction.

Returns:
fittedDist : KernelMixture

The fitted distribution.

Notes

It builds a KernelMixture using the given data and bandwidth, regardless of the binning or boundary treatment flags.

Examples

>>> import openturns as ot
>>> sample = ot.Exponential(1.0).getSample(1000)
>>> smoother = ot.KernelSmoothing()
>>> kernelMixture = smoother.buildAsKernelMixture(sample, [1.0])
buildAsMixture(sample, bandwidth)

Fit a kernel smoothing distribution on data.

Parameters:
sample : 2-d sequence of float

Data on which the distribution is fitted. Any dimension.

bandwidth : Point

Contains the bandwidth in each direction.

Returns:
fittedDist : Mixture

The fitted distribution.

Notes

It builds a Mixture using the given bandwidth and a binning of the given data, regardless of the bin number, the data size, the binning flag or the boundary treatment flags. This method is available only for 1D or 2D samples.

Examples

>>> import openturns as ot
>>> sample = ot.Exponential(1.0).getSample(1000)
>>> smoother = ot.KernelSmoothing(ot.Normal(), True, 100, False)
>>> mixture = smoother.buildAsMixture(sample, [1.0])
buildAsTruncatedDistribution(sample, bandwidth)

Estimate the distribution as TruncatedDistribution.

Parameters:
sample : 2-d sequence of float

Data on which the distribution is fitted. Any dimension.

bandwidth : Point

Contains the bandwidth in each direction.

Returns:
fittedDist : TruncatedDistribution

The estimated distribution as a TruncatedDistribution.

Examples

>>> import openturns as ot
>>> sample = ot.Exponential(1.0).getSample(1000)
>>> smoother = ot.KernelSmoothing(ot.Normal(), False, 0, True)
>>> truncated = smoother.buildAsTruncatedDistribution(sample, [1.0])
buildEstimator(*args)

Build the distribution and the parameter distribution.

Parameters:
sample : 2-d sequence of float

Data.

parameters : DistributionParameters, optional

The parametrization.

Returns:
resDist : DistributionFactoryResult

The results.

Notes

According to the way the native parameters of the distribution are estimated, the parameters distribution differs:

  • Moments method: the asymptotic parameters distribution is normal and estimated by Bootstrap on the initial data;

  • Maximum likelihood method with a regular model: the asymptotic parameters distribution is normal and its covariance matrix is the inverse Fisher information matrix;

  • Other methods: the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting (see KernelSmoothing).

If another set of parameters is specified, the native parameters distribution is first estimated and the new distribution is determined from it:

  • if the native parameters distribution is normal and the transformation regular at the estimated parameters values: the asymptotic parameters distribution is normal and its covariance matrix determined from the inverse Fisher information matrix of the native parameters and the transformation;

  • in the other cases, the asymptotic parameters distribution is estimated by Bootstrap on the initial data and kernel fitting.
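
A minimal sketch of the usage with a kernel smoothing factory, relying on the standard DistributionFactoryResult accessors:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal().getSample(50)
>>> factory = ot.KernelSmoothing()
>>> result = factory.buildEstimator(sample)
>>> fitted = result.getDistribution()  # the kernel smoothing distribution
>>> parameterDist = result.getParameterDistribution()  # distribution of its parameters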

computeMixedBandwidth(sample)

Compute the bandwidth according to a mixed rule.

Returns:
bandwidth : Point

Bandwidth whose components are evaluated according to the mixed rule.

Notes

This method uses the mixed rule introduced in Kernel smoothing. Its goal is to provide an accurate estimator of the bandwidth when the sample size is large.

Let n be the sample size. The estimator depends on the threshold sample size n_t defined in the KernelSmoothing-SmallSize key of the ResourceMap.

  • If n \leq n_t, i.e. for a small sample, we use the plugin solve-the-equation method.

  • Otherwise, the mixed rule is used.
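
A minimal sketch of calling the rule directly on a univariate sample:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal().getSample(500)
>>> ks = ot.KernelSmoothing()
>>> bandwidth = ks.computeMixedBandwidth(sample)  # one component per dimension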

computePluginBandwidth(sample)

Compute the bandwidth according to the plugin rule.

Returns:
bandwidth : Point

Bandwidth whose components are evaluated according to the plugin rule.

Notes

This plug-in method is based on the solve-the-equation rule from [sheather1991]. It can take a lot of time for large samples, as its cost is quadratic in the sample size.

Several keys of the ResourceMap are used by the [sheather1991] method.

  • The key KernelSmoothing-AbsolutePrecision is used in the Sheather-Jones algorithm to estimate the bandwidth. It defines the absolute tolerance used by the solver to solve the nonlinear equation.

  • The KernelSmoothing-MaximumIteration key defines the maximum number of iterations used by the solver.

  • The KernelSmoothing-RelativePrecision key defines the relative tolerance.

  • The KernelSmoothing-AbsolutePrecision key defines the absolute tolerance.

  • The KernelSmoothing-ResidualPrecision key defines the absolute tolerance on the residual.

  • The KernelSmoothing-CutOffPlugin key is the cut-off value introduced in Kernel smoothing.

More precisely, the KernelSmoothing-CutOffPlugin key of the ResourceMap controls the accuracy of the approximation used to estimate the roughness of the second derivative of the density. The default value ensures that terms in the sum whose weights are lower than 4 \times 10^{-6} are ignored, which can reduce the computation in some situations. The properties of the standard Gaussian density are such that, in order to make the computation exact, the value of KernelSmoothing-CutOffPlugin must be set to 39, but this may increase the computation time.
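
A minimal sketch comparing the default plugin bandwidth with an (almost) exact one, assuming the cut-off key is stored as a scalar in the ResourceMap:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal().getSample(200)
>>> ks = ot.KernelSmoothing()
>>> h_default = ks.computePluginBandwidth(sample)
>>> ot.ResourceMap.SetAsScalar('KernelSmoothing-CutOffPlugin', 39.0)  # slower, more accurate
>>> h_exact = ks.computePluginBandwidth(sample)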

computeSilvermanBandwidth(sample)

Compute the bandwidth according to the Silverman rule.

Returns:
bandwidth : Point

Bandwidth whose components are evaluated according to the Silverman rule, assuming a normal distribution. The bandwidth uses a robust estimate of the sample standard deviation based on the interquartile range introduced in Kernel smoothing (rather than the sample standard deviation). This method can manage a multivariate sample and produces a multivariate bandwidth.
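
A minimal sketch on a bivariate sample:

>>> import openturns as ot
>>> ot.RandomGenerator.SetSeed(0)
>>> sample = ot.Normal(2).getSample(300)
>>> ks = ot.KernelSmoothing()
>>> bandwidth = ks.computeSilvermanBandwidth(sample)  # a Point of dimension 2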

getBandwidth()

Accessor to the bandwidth used in the kernel smoothing.

Returns:
bandwidth : Point

Bandwidth used in each direction.

getBootstrapSize()

Accessor to the bootstrap size.

Returns:
size : int

Size of the bootstrap.

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getId()

Accessor to the object’s id.

Returns:
id : int

Internal unique identifier.

getKernel()

Accessor to the kernel used in the kernel smoothing.

Returns:
kernel : Distribution

Univariate distribution used to build the kernel.

getName()

Accessor to the object’s name.

Returns:
name : str

The name of the object.

getShadowedId()

Accessor to the object’s shadowed id.

Returns:
id : int

Internal unique identifier.

getVisibility()

Accessor to the object’s visibility state.

Returns:
visible : bool

Visibility flag.

hasName()

Test if the object is named.

Returns:
hasName : bool

True if the name is not empty.

hasVisibleName()

Test if the object has a distinguishable name.

Returns:
hasVisibleName : bool

True if the name is not empty and not the default one.

setAutomaticLowerBound(automaticLowerBound)

Accessor to the flag for an automatic selection of lower bound.

Parameters:
automaticLowerBound : bool

Flag to tell if the user-defined lower bound has to be taken into account (value False) or if the minimum of the given sample has to be used (value True).

setAutomaticUpperBound(automaticUpperBound)

Accessor to the flag for an automatic selection of upper bound.

Parameters:
automaticUpperBound : bool

Flag to tell if the user-defined upper bound has to be taken into account (value False) or if the maximum of the given sample has to be used (value True).

setBootstrapSize(bootstrapSize)

Accessor to the bootstrap size.

Parameters:
size : int

The size of the bootstrap.

setBoundaryCorrection(boundaryCorrection)

Accessor to the boundary correction flag.

Parameters:
boundaryCorrection : bool

Activates the boundary correction using the mirroring technique.

setBoundingOption(boundingOption)

Accessor to the boundary correction option.

Parameters:
boundingOption : int

Select the boundary correction option, see notes.

Notes

The possible values for the bounding option are:

  • KernelSmoothing.NONE or 0: no boundary correction

  • KernelSmoothing.LOWER or 1: apply the boundary correction to the lower bound

  • KernelSmoothing.UPPER or 2: apply the boundary correction to the upper bound

  • KernelSmoothing.BOTH or 3: apply the boundary correction to both bounds

It applies only to 1D samples. Each bound can be defined by the user or computed automatically from the sample, see setLowerBound, setUpperBound, setAutomaticLowerBound, setAutomaticUpperBound.
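
A minimal sketch applying the correction at a known lower bound only (0 for exponential-like data); the integer value 1 can be used instead of the named constant:

>>> import openturns as ot
>>> sample = ot.Exponential(1.0).getSample(500)
>>> ks = ot.KernelSmoothing()
>>> ks.setBoundaryCorrection(True)
>>> ks.setBoundingOption(ot.KernelSmoothing.LOWER)
>>> ks.setLowerBound(0.0)  # also disables the automatic lower bound
>>> fitted = ks.build(sample)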

setLowerBound(lowerBound)

Accessor to the lower bound for boundary correction.

Parameters:
lowerBound : float

A user-defined lower bound to take into account for boundary correction.

Notes

This method automatically sets the automaticLowerBound flag to False. The given value will be taken into account only if boundingOption is set to either 1 or 3. If the algorithm is applied to a sample with a minimum value less than the user-defined lower bound and the automaticLowerBound is set to False, then an exception is raised.

setName(name)

Accessor to the object’s name.

Parameters:
name : str

The name of the object.

setShadowedId(id)

Accessor to the object’s shadowed id.

Parameters:
id : int

Internal unique identifier.

setUpperBound(upperBound)

Accessor to the upper bound for boundary correction.

Parameters:
upperBound : float

A user-defined upper bound to take into account for boundary correction.

Notes

This method automatically sets the automaticUpperBound flag to False. The given value will be taken into account only if boundingOption is set to either 2 or 3. If the algorithm is applied to a sample with a maximum value greater than the user-defined upper bound and the automaticUpperBound is set to False, then an exception is raised.

setVisibility(visible)

Accessor to the object’s visibility state.

Parameters:
visible : bool

Visibility flag.

Examples using the class

Model a singular multivariate distribution

Bandwidth sensitivity in kernel smoothing

Estimate a conditional quantile

Fit a non parametric distribution

Kolmogorov-Smirnov : get the statistics distribution

Fit a non parametric copula

Truncate a distribution

Analyse the central tendency of a cantilever beam

Gibbs sampling of the posterior distribution

Sampling from an unnormalized probability density

Posterior sampling using a PythonDistribution

Bayesian calibration of a computer code

Bayesian calibration of the flooding model

Linear Regression with interval-censored observations