.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_data_analysis/distribution_fitting/plot_estimate_gpd_wooster.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_data_analysis_distribution_fitting_plot_estimate_gpd_wooster.py: Estimate a GPD on the Wooster temperature data ============================================== .. GENERATED FROM PYTHON SOURCE LINES 6-9 In this example, we illustrate various techniques of extreme value modeling applied to the 5-year series of daily minimum temperatures recorded in Wooster, Ohio between 1983 and 1988. Readers should refer to [coles2001]_ example 1.7 to get more details. .. GENERATED FROM PYTHON SOURCE LINES 9-15 .. code-block:: Python import openturns as ot import openturns.experimental as otexp import openturns.viewer as otv from openturns.usecases import coles import pandas as pd .. GENERATED FROM PYTHON SOURCE LINES 16-23 First, we load the Wooster dataset. As we want to model very low temperatures, we first negate the values to transform minimum to maximum. Hence, large positive observations correspond to extreme cold conditions. We look at the negated data through time. The data are plotted as degrees Fahrenheit below zero. We see that there is a strong annual cycle in the data. We use the pandas library. .. GENERATED FROM PYTHON SOURCE LINES 23-37 .. code-block:: Python full = pd.read_csv(coles.Coles().wooster, index_col=0, parse_dates=True) full["Temperature"] = full["Temperature"].apply(lambda x: x * -1.0) print(full[:10]) title = "Negated Wooster daily minimum temperatures" graph = ot.Graph(title, "Day index", "Temperature (°F)", True, "") days = ot.Sample([[i] for i in range(len(full))]) sample = ot.Sample.BuildFromDataFrame(full) cloud = ot.Cloud(days, sample) cloud.setColor("red") cloud.setPointStyle(",") graph.add(cloud) graph.setIntegerXTick(True) view = otv.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_001.png :alt: Negated Wooster daily minimum temperatures :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Temperature Date 1983-01-01 -23.0 1983-01-02 -29.0 1983-01-03 -19.0 1983-01-04 -14.0 1983-01-05 -27.0 1983-01-06 -32.0 1983-01-07 -31.0 1983-01-08 -26.0 1983-01-09 -21.0 1983-01-10 -41.0 .. GENERATED FROM PYTHON SOURCE LINES 38-42 In order to decrease the time-varying trend, we partition the data into the four seasons. To perform that data stratification, we use the pandas library once again. The graphs show that there is still non-stationarity within each season's data but to a lesser extent than in the original series. .. GENERATED FROM PYTHON SOURCE LINES 42-58 .. code-block:: Python full_season = {} full_season["winter"] = full[(full.index.month >= 12) | (full.index.month < 3)] full_season["spring"] = full[(full.index.month >= 3) & (full.index.month < 6)] full_season["summer"] = full[(full.index.month >= 6) & (full.index.month < 9)] full_season["autumn"] = full[(full.index.month >= 9) & (full.index.month < 12)] for season in full_season: df = full_season[season] days = ot.Sample([[i] for i in range(len(df))]) sample = ot.Sample.BuildFromDataFrame(df) cloud = ot.Cloud(days, sample) cloud.setColor("red") cloud.setPointStyle(",") graph.setDrawable(cloud, 0) graph.setTitle(f"{title} ({season})") view = otv.View(graph) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_002.png :alt: Negated Wooster daily minimum temperatures (winter) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_002.png :class: sphx-glr-multi-img * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_003.png :alt: Negated Wooster daily minimum temperatures (spring) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_003.png :class: sphx-glr-multi-img * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_004.png :alt: Negated Wooster daily minimum temperatures (summer) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_004.png :class: sphx-glr-multi-img * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_005.png :alt: Negated Wooster daily minimum temperatures (autumn) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_005.png :class: sphx-glr-multi-img .. GENERATED FROM PYTHON SOURCE LINES 59-76 Here, we illustrate that threshold exceedances occur in groups: one extremely cold day is likely to be followed by another, implying that observations exceeding a high threshold are dependent. Hence we use the declustering method which filters the dependent observations exceeding a given threshold to obtain a set of threshold excesses that can be assumed as independent. First, we specify a threshold :math:`u`. Consecutive exceedances of the threshold belong to the same cluster. Two distinct clusters are separated by :math:`r` consecutive observations under the threshold. Within each cluster, we select the maximum value that will be used to infer the GPD distribution. The cluster maxima are assumed to be independent. On the graph, we show the clusters associated to the threshold :math:`u=0` and the respective maximum selected from a 100-day portion of the Wooster daily minimum temperature series. It is possible to extract the data belonging to the same cluster and the cluster maximum series. .. GENERATED FROM PYTHON SOURCE LINES 76-84 .. code-block:: Python first100 = ot.Sample.BuildFromDataFrame(full[350:450]) part = otexp.SamplePartition(first100) u = 0.0 r = 4 peaks, clusters = part.getPeakOverThreshold(u, r) graph = clusters.draw(u) view = otv.View(graph) .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_006.png :alt: Temperature clusters :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_006.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 85-103 Here, we illustrate the effect of different choices for :math:`u` and :math:`r` on the estimate of the GPD distriution. We focus on the winter season. We perform the following steps, for each :math:`(u, r)`: - we extract the clusters and the associated peaks, - we fit a GPD distribution on the excesses by the maximum likelihood method, - we estimate the :math:`95\%` confidence interval of each parameter, - we evaluate the :math:`T=100`-year return level which corresponds to the :math:`m`-observation return level, where :math:`m = T*n_y` with :math:`n_y` the number of observations per year. Here, we consider the winter season which is about 90 days long. To calculate it, we evaluate the extremal index :math:`\theta` which is the inverse of the mean length of the clusters, estimated by the ratio between the number of clusters and the number of exceedances of :math:`u`. The stability in the return level estimations confirms that the inference is robust despite the subjective choices that need to be made on :math:`(u, r)`. .. GENERATED FROM PYTHON SOURCE LINES 103-145 .. code-block:: Python winter = full_season["winter"] winter_sample = ot.Sample.BuildFromDataFrame(winter) # partition the aggregated winter sample according to indices (enforces separation of seasons from different years) part = otexp.SamplePartition.ExtractFromDataFrame(full, winter) factory = ot.GeneralizedParetoFactory() for u in [-10.0, -20.0]: for r in [1, 3]: peaks, clusters = part.getPeakOverThreshold(u, r) nc = len(peaks) nu = (winter > u).values.sum() # fit a stationary gpd on the clusters result_LL = factory.buildMethodOfLikelihoodMaximizationEstimator(peaks, u) sigma, xi, _ = result_LL.getParameterDistribution().getMean() ( sigma_stddev, xi_stddev, _, ) = result_LL.getParameterDistribution().getStandardDeviation() print( f"u={u} r={r} nc={nc} sigma={sigma:.2f} ({sigma_stddev:.2f}) xi={xi:.2f} ({xi_stddev:.2f})", end=" ", ) # estimate the T-year return level ny = 90 T = 100 theta = nc / nu xm_100 = factory.buildReturnLevelEstimator( result_LL, winter_sample, T * ny, theta ) print(f"x100={xm_100.getMean()} ({xm_100.getStandardDeviation()})") # plot the return level validation = otexp.GeneralizedParetoValidation(result_LL, peaks) grid = validation.drawDiagnosticPlot() rlPlot = grid.getGraph(1, 0) rlPlot.setTitle(rlPlot.getTitle() + f" (u={u} r={r})") view = otv.View(rlPlot) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_007.png :alt: Return level plot (u=-10.0 r=1) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_007.png :class: sphx-glr-multi-img * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_008.png :alt: Return level plot (u=-10.0 r=3) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_008.png :class: sphx-glr-multi-img * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_009.png :alt: Return level plot (u=-20.0 r=1) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_009.png :class: sphx-glr-multi-img * .. image-sg:: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_010.png :alt: Return level plot (u=-20.0 r=3) :srcset: /auto_data_analysis/distribution_fitting/images/sphx_glr_plot_estimate_gpd_wooster_010.png :class: sphx-glr-multi-img .. rst-class:: sphx-glr-script-out .. code-block:: none u=-10.0 r=1 nc=30 sigma=11.26 (2.74) xi=-0.26 (0.18) x100=[25.19] ([9.06588]) u=-10.0 r=3 nc=19 sigma=13.08 (4.36) xi=-0.31 (0.28) x100=[25.3785] ([11.3733]) u=-20.0 r=1 nc=43 sigma=14.51 (2.94) xi=-0.24 (0.16) x100=[28.3066] ([11.8519]) u=-20.0 r=3 nc=29 sigma=15.30 (3.67) xi=-0.26 (0.19) x100=[27.8081] ([12.826]) .. GENERATED FROM PYTHON SOURCE LINES 146-147 .. code-block:: Python otv.View.ShowAll() .. _sphx_glr_download_auto_data_analysis_distribution_fitting_plot_estimate_gpd_wooster.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_estimate_gpd_wooster.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_estimate_gpd_wooster.py `