Gaussian Process Regression: Normalization for optimization

This example illustrates the Gaussian Process Fitter metamodel with normalization of the data. Like other machine learning techniques, Gaussian Process Regression (GPR) can be sensitive during training to heterogeneous data, i.e., input variables defined with very different orders of magnitude. An automatic scaling of the input data for the optimization of the GPR hyperparameters can be enabled with the ResourceMap key GaussianProcessFitter-OptimizationNormalization. In this example, we show the behavior of the Gaussian Process Fitter with and without activating this normalization for the optimization.
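Before changing it, we can query the current value of this key. This is a minimal check, assuming an OpenTURNS version that defines the key; GetAsBool is the standard ResourceMap accessor.

import openturns as ot

# Query the current value of the normalization key (a boolean).
print(ot.ResourceMap.GetAsBool("GaussianProcessFitter-OptimizationNormalization"))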

Loading of the fire satellite use case

We load the fire satellite use case. This model involves 9 input variables and 3 output variables; we select only the first output variable in this example.

import openturns as ot
from openturns.usecases.fire_satellite import FireSatelliteModel
import openturns.viewer as otv
import openturns.experimental as otexp

We define the function that evaluates the outputs depending on the inputs.

m = FireSatelliteModel()
model = m.model

We also retrieve the distribution of the input variables, which is used to build the training and test sets.

inputDistribution = m.inputDistribution

Generation of data

We now generate the input and output training sets, with a size equal to 20 times the dimension of the input vector.

experiment = ot.LHSExperiment(inputDistribution, 20 * m.dim)
inputTrainingSet = experiment.generate()
outputTrainingSet = model(inputTrainingSet).getMarginal(0)

print("Lower and upper bounds of inputTrainingSet:")
print(inputTrainingSet.getMin(), inputTrainingSet.getMax())
Lower and upper bounds of inputTrainingSet:
[1.54106e+07,875.065,1350.46,12.155,0.980915,0.235389,2.50699,0.895724,0.197372] [2.05626e+07,1124.35,1454.73,17.52,2.99648,0.767386,7.98289,3.05663,1.75649]
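These bounds show how heterogeneous the inputs are. As a small complementary check (not in the original script), we can print the per-component range with Sample.computeRange:

# The ranges span several orders of magnitude, from about 5.e6 for the
# first component down to about 1.5 for the last one.
print("Range of each input component:")
print(inputTrainingSet.computeRange())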

Creation of metamodel

We choose to use a linear trend.

basis = ot.LinearBasisFactory(m.dim).build()

For the purpose of illustration, we consider a Matérn covariance model (MaternModel) with smoothness parameter 2.5. Its scale parameters are initialized to one tenth of the range of each input component.

covarianceModel = ot.MaternModel(inputTrainingSet.computeRange() * 0.1, 2.5)
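As a quick sketch, we can inspect the initial hyperparameters before any optimization; getScale, getNu and getAmplitude are the standard MaternModel accessors:

# Initial hyperparameters: one scale per input component (one tenth of each
# input range), the smoothness nu, and a unit amplitude.
print("Initial scales:", covarianceModel.getScale())
print("nu:", covarianceModel.getNu())
print("Amplitude:", covarianceModel.getAmplitude())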

Training without normalization

First, we deactivate the normalization option for the optimization.

ot.ResourceMap.SetAsBool("GaussianProcessFitter-OptimizationNormalization", False)

We run the algorithm and get the metamodel.

fitter_algo = otexp.GaussianProcessFitter(
    inputTrainingSet, outputTrainingSet, covarianceModel, basis
)
fitter_algo.run()
fitter_result = fitter_algo.getResult()
gpr_algo = otexp.GaussianProcessRegression(fitter_result)
gpr_algo.run()
gpr_result = gpr_algo.getResult()

We inspect the optimized hyperparameters.

theta = gpr_result.getCovarianceModel().getParameter()
print("theta=", theta)
theta= [515199,25.7338,12.2715,3.30446,2.33972,1.02203,3.20161,0.829491,0.387141,0.00376948]#10
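To interpret each entry of theta, we can print the parameter names (a small check, not in the original script; the exact labels returned by getParameterDescription may vary with the OpenTURNS version):

# By convention, theta contains one scale parameter per input dimension,
# followed by the amplitude.
print(gpr_result.getCovarianceModel().getParameterDescription())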

Validation of metamodel

To validate the metamodel, we create a validation set of size equal to 50 times the dimension of the input vector, on which we evaluate the exact model.

gprMetamodel = gpr_result.getMetaModel()
ot.RandomGenerator.SetSeed(1)
experimentTest = ot.LHSExperiment(inputDistribution, 50 * m.dim)
inputTestSet = experimentTest.generate()
outputTestSet = model(inputTestSet).getMarginal(0)

Then, we use the MetaModelValidation class to validate the metamodel.

metamodelPredictions = gprMetamodel(inputTestSet)
val = ot.MetaModelValidation(outputTestSet, metamodelPredictions)
r2Score = val.computeR2Score()
print("R2=", r2Score)
R2= [0.6223]
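Other validation metrics are available as a complement to the R2 score; for instance, assuming an OpenTURNS version where MetaModelValidation provides computeMeanSquaredError:

# Mean squared error of the metamodel on the test set.
mse = val.computeMeanSquaredError()
print("MSE=", mse)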

Graphical validation

label = "Accuracy of metamodel without activating normalization for optimization"
graph = val.drawValidation().getGraph(0, 0)
graph.setLegends([""])
graph.setLegends(["R2 = %.2f%%" % (100 * r2Score[0]), ""])
graph.setLegendPosition("upper left")
graph.setXTitle("Exact function")
graph.setYTitle("Metamodel prediction")
graph.setTitle(label)
_ = otv.View(graph)
[Figure: Accuracy of metamodel without normalization for the optimization]

Training with normalization

Then, we activate the normalization option for the optimization.

ot.ResourceMap.SetAsBool("GaussianProcessFitter-OptimizationNormalization", True)

We run the algorithm and get the metamodel.

fitter_algo = otexp.GaussianProcessFitter(
    inputTrainingSet, outputTrainingSet, covarianceModel, basis
)
fitter_algo.run()
fitter_result = fitter_algo.getResult()
gpr_algo = otexp.GaussianProcessRegression(fitter_result)
gpr_algo.run()
gpr_result = gpr_algo.getResult()

We inspect the optimized hyperparameters: this time, the estimated parameters are quite different.

theta = gpr_result.getCovarianceModel().getParameter()
print("theta=", theta)
theta= [1.0304e+07,498.57,208.542,10.73,4.03113,1.06399,10.9518,2.55495,0.896912,0.00186723]#10
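As a small sketch (not in the original script), we can relate the optimized scales to the input ranges; numpy is used here only for the elementwise division:

import numpy as np

# The first m.dim entries of theta are the scale parameters, the last one is
# the amplitude. With normalization, each scale is of the order of its range.
scales = np.array([theta[i] for i in range(m.dim)])
ranges = np.array(inputTrainingSet.computeRange())
print("scale / range =", scales / ranges)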

Validation of metamodel

gprMetamodel = gpr_result.getMetaModel()
metamodelPredictions = gprMetamodel(inputTestSet)
val = ot.MetaModelValidation(outputTestSet, metamodelPredictions)
r2Score = val.computeR2Score()
print("R2=", r2Score)
R2= [0.999505]

Graphical validation

label = "Accuracy of metamodel with activating normalization for optimization"
graph2 = val.drawValidation().getGraph(0, 0)
graph2.setLegends([""])
graph2.setLegends(["R2 = %.2f%%" % (100 * r2Score[0]), ""])
graph2.setLegendPosition("upper left")
graph2.setXTitle("Exact function")
graph2.setYTitle("Metamodel prediction")
graph2.setTitle(label)
_ = otv.View(graph2)
otv.View.ShowAll()
[Figure: Accuracy of metamodel with normalization for the optimization]
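Finally, as an optional cleanup (not in the original script), ResourceMap.Reload restores all keys, including GaussianProcessFitter-OptimizationNormalization, to their default values:

# Restore the default ResourceMap settings.
ot.ResourceMap.Reload()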