From e22976a17813301a82adfbac6c9099b2a37c9773 Mon Sep 17 00:00:00 2001 From: Jonathan Berrisch Date: Sat, 14 Jun 2025 10:07:02 +0200 Subject: [PATCH] Add index.html to version control --- .gitignore | 1 - index.html | 3513 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 3513 insertions(+), 1 deletion(-) create mode 100644 index.html diff --git a/.gitignore b/.gitignore index 6f7f8bc..1923afb 100644 --- a/.gitignore +++ b/.gitignore @@ -83,6 +83,5 @@ data/* # Ignore html files for now # TODO: Remove later -index.html index_cache index_files diff --git a/index.html b/index.html new file mode 100644 index 0000000..8721b74 --- /dev/null +++ b/index.html @@ -0,0 +1,3513 @@ + + + + + + + + + + + + + + + + De-Fence + + + + + + + + + + + + + + + + + + + +
+ + Uni Duisburg-Essen Logo + + + HECF Logo + +
+ + +
+
+ +
+

Data Science Methods for Forecasting in Energy and Economics

+ +
+
+
+Jonathan Berrisch +
+

+ University of Duisburg-Essen, House of Energy, Climate, and Finance +

+
+
+ +

2025-06-30

+
+
+

Outline

+ + +
+

Research Motivation

+

Overview of the Thesis

+

Online Aggregation

+

Probabilistic Forecasting of European Carbon and Energy Prices

+

Contributions

+
+
+
+

The beginning: June 2020

+ +

Artwork by @allison_horst

+
+

Motivation

+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +Energy market liberalization created complex, interconnected trading systems +
+ + +Renewable energy transition introduces uncertainty and volatility from weather-dependent generation +
+ + +Traditional point forecasts are inadequate for modern energy markets with increasing uncertainty +
+ + +Risk inherently is a probabilistic concept +
+ + +Probabilistic forecasting essential for risk management, planning and decision making in volatile energy environments +
+ + +Online learning methods needed for fast-updating models with streaming energy data +
+
+ +
+
+
+

+
Image generated by Sora (OpenAI)
+
+
+
+
+
+

Overview of the Thesis

+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +Berrisch, J., & Ziel, F. (2023). CRPS learning. Journal of Econometrics, 237(2), 105221. +
+ + +Berrisch, J., & Ziel, F. (2024). Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. International Journal of Forecasting, 40(4), 1568–1586. +
+ + +Hirsch, S., Berrisch, J., & Ziel, F. (2024). Online Distributional Regression. arXiv preprint arXiv:2407.08750. +
+ + +Berrisch, J., & Ziel, F. (2022). Distributional modeling and forecasting of natural gas prices. Journal of Forecasting, 41(6), 1065–1086. +
+ + +Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). Modeling volatility and dependence of European carbon and energy prices. Finance Research Letters, 52, 103503. +
+ + +Berrisch, J., Narajewski, M., & Ziel, F. (2023). High-resolution peak demand estimation using generalized additive models and deep neural networks. Energy and AI, 13, 100236. +
+ + +Berrisch, J. (2025). rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support. arXiv preprint arXiv:2501.15856. +
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +Berrisch, J., & Ziel, F. (2023). CRPS learning. Journal of Econometrics, 237(2), 105221. +
+ + +Berrisch, J., & Ziel, F. (2024). Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. International Journal of Forecasting, 40(4), 1568–1586. +
+ + +Hirsch, S., Berrisch, J., & Ziel, F. (2024). Online Distributional Regression. arXiv preprint arXiv:2407.08750. +
+ + +Berrisch, J., & Ziel, F. (2022). Distributional modeling and forecasting of natural gas prices. Journal of Forecasting, 41(6), 1065–1086. +
+ + +Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). Modeling volatility and dependence of European carbon and energy prices. Finance Research Letters, 52, 103503. +
+ + +Berrisch, J., Narajewski, M., & Ziel, F. (2023). High-resolution peak demand estimation using generalized additive models and deep neural networks. Energy and AI, 13, 100236. +
+ + +Berrisch, J. (2025). rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support. arXiv preprint arXiv:2501.15856. +
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +Berrisch, J., & Ziel, F. (2023). CRPS learning. Journal of Econometrics, 237(2), 105221. +
+ + +Berrisch, J., & Ziel, F. (2024). Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. International Journal of Forecasting, 40(4), 1568–1586. +
+ + +Hirsch, S., Berrisch, J., & Ziel, F. (2024). Online Distributional Regression. arXiv preprint arXiv:2407.08750. +
+ + +Berrisch, J., & Ziel, F. (2022). Distributional modeling and forecasting of natural gas prices. Journal of Forecasting, 41(6), 1065–1086. +
+ + +Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). Modeling volatility and dependence of European carbon and energy prices. Finance Research Letters, 52, 103503. +
+ + +Berrisch, J., Narajewski, M., & Ziel, F. (2023). High-resolution peak demand estimation using generalized additive models and deep neural networks. Energy and AI, 13, 100236. +
+ + +Berrisch, J. (2025). rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support. arXiv preprint arXiv:2501.15856. +
+
+
+
+
+

Overview

+
+
+

Online Distributional Regression

+
+ +
+

Distributional Modeling and Forecasting of Natural Gas Prices

+
+
+
+
+ +
+

Reduces estimation time by 2-3 orders of magnitude

+

Maintainins competitive forecasting accuracy

+

Real-World Validation in Energy Markets

+

Python Package ondil on Github and PyPi

+
+ +
+

Forecasting of Day-Ahead and Month-Ahead prices

+

To capture the full distribution of future prices

+

Extensive data analysis from 2011-2020

+

State-Space models, skewed Student’s t distribution

+

Python Package sstudentt on Github and PyPi

+
+
+
+
+

+
+
+
+
+
+
+
+

Overview

+
+
+

High-Resolution Peak Demand Estimation Using Generalized Additive Models and Deep Neural Networks

+
+ +
+

rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support

+
+
+
+

Predict high-resolution electricity peaks using only low-resolution data

+

Combine GAMs and DNN’s for superior accuracy

+

Won Western Power Distribution Competition

+

Won Best-Student-Presentation Award

+
+
+
+
+

+
+
+
+
+
+ +
+

R Package rcpptimer on Github and CRAN

+

Provides Rcpp bindings for cpptimer

+

tic-toc class for timing C++ code

+

Supports nested, overlapping and scoped timers

+

Supports OpenMP parallelism

+
//[[Rcpp::depends(rcpptimer)]]
+#include <rcpptimer.h>
+
+void main(){
+  Rcpp::Timer timer;
+  Rcpp::Timer::ScopedTimer _(timer, "ST");
+
+  timer.tic();
+  // Some more code
+  timer.toc();
+} // ScopedTimer will stop automatically
+
+
+
+
+

CRPS Learning

+

Berrisch, J., & Ziel, F. (2023). Journal of Econometrics, 237(2), 105221.

+
+
+

Introduction

+
+
+

The Idea:

+
    +
  • Combine multiple forecasts instead of choosing one

  • +
  • Combination weights may vary over time, over the distribution or both

  • +
+

2 Popular options for combining distributions:

+
    +
  • Combining across quantiles (this paper) +
      +
    • Horizontal aggregation, vincentization
    • +
  • +
  • Combining across probabilities +
      +
    • Vertical aggregation
    • +
  • +
+
+ +
+
+ +
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

The Framework of Prediction under Expert Advice

+

The sequential framework

+
+
+

Each day, \(t = 1, 2, ... T\)

+
    +
  • The forecaster receives predictions \(\widehat{X}_{t,k}\) from \(K\) experts
  • +
  • The forecaster assings weights \(w_{t,k}\) to each expert
  • +
  • The forecaster calculates her prediction: \[\begin{equation} + \widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}. + \label{eq_forecast_def} +\end{equation}\]
  • +
  • The realization for \(t\) is observed
  • +
+
+ +
+
    +
  • The experts can be institutions, persons, or models
  • +
  • The forecasts can be point-forecasts (i.e., mean or median) or full predictive distributions
  • +
  • We do not need any assumptions concerning the underlying data
  • +
  • Cesa-Bianchi & Lugosi (2006)
  • +
+
+
+
+

The Regret

+

Weights are updated sequentially according to the past performance of the \(K\) experts.

+

That is, a loss function \(\ell\) is needed. This is used to compute the cumulative regret \(R_{t,k}\)

+

\[\begin{equation} + R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)\label{eq:regret} +\end{equation}\]

+

The cumulative regret:

+
    +
  • Indicates the predictive accuracy of the expert \(k\) until time \(t\).
  • +
  • Measures how much the forecaster regrets not having followed the expert’s advice
  • +
+

Popular loss functions for point forecasting Gneiting (2011):

+
+
+

\(\ell_2\) loss:

+

\[\begin{equation} + \ell_2(x, y) = | x -y|^2 \label{eq:elltwo} +\end{equation}\]

+

Strictly proper for mean prediction

+
+ +
+

\(\ell_1\) loss:

+

\[\begin{equation} + \ell_1(x, y) = | x -y| \label{eq:ellone} +\end{equation}\]

+

Strictly proper for median predictions

+
+
+ +
+

Optimal Convergence

+


+
+
+

The selection problem

+

\[\begin{equation} + \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \stackrel{t\to \infty}{\rightarrow} a \quad \text{with} \quad a \leq 0. + \label{eq_opt_select} +\end{equation}\] The forecaster is asymptotically not worse than the best expert \(\widehat{\mathcal{R}}_{t,\min}\).

+

The convex aggregation problem

+

\[\begin{equation} + \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \stackrel{t\to \infty}{\rightarrow} b \quad \text{with} \quad b \leq 0 . + \label{eq_opt_conv} +\end{equation}\] The forecaster is asymptotically not worse than the best convex combination \(\widehat{X}_{t,\pi}\) in hindsight (oracle).

+
+ +
+

Optimal rates with respect to selection \(\eqref{eq_opt_select}\) and convex aggregation \(\eqref{eq_opt_conv}\) Wintenberger (2017):

+

\[\begin{align} + \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) & = + \mathcal{O}\left(\frac{\log(K)}{t}\right)\label{eq_optp_select} +\end{align}\]

+

\[\begin{align} + \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) & = + \mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right) + \label{eq_optp_conv} +\end{align}\]

+

Algorithms can statisfy both \(\eqref{eq_optp_select}\) and \(\eqref{eq_optp_conv}\) depending on:

+
    +
  • The loss function
  • +
  • Regularity conditions on \(Y_t\) and \(\widehat{X}_{t,k}\)
  • +
  • The weighting scheme
  • +
+
+
+
+

+
+
+

Optimal Convergence

+


+

EWA satisfies optimal selection convergence \(\eqref{eq_optp_select}\) in a deterministic setting if:

+
    +
  • Loss \(\ell\) is exp-concave
  • +
  • Learning-rate \(\eta\) is chosen correctly
  • +
+

Those results can be converted to any stochastic setting Wintenberger (2017).

+

Optimal convex aggregation convergence \(\eqref{eq_optp_conv}\) can be satisfied by applying the kernel-trick:

+

\[\begin{align} +\ell^{\nabla}(x,y) = \ell'(\widetilde{X},y) x +\end{align}\]

+

\(\ell'\) is the subgradient of \(\ell\) at forecast combination \(\widetilde{X}\).

+
+ +
+

Probabilistic Setting

+


+

An appropriate choice:

+

\[\begin{equation*} + \text{CRPS}(F, y) = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx \label{eq:crps} +\end{equation*}\]

+

It’s strictly proper Gneiting & Raftery (2007).

+

Using the CRPS, we can calculate time-adaptive weights \(w_{t,k}\). However, what if the experts’ performance varies in parts of the distribution?

+

Utilize this relation:

+

\[\begin{equation*} + \text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) dp.\label{eq_crps_qs} +\end{equation*}\]

+

… to combine quantiles of the probabilistic forecasts individually using the quantile-loss QL.

+
+
+
+

CRPS Learning Optimality

+
+ +
+
+
+

QL is convex, but not exp-concave

+

Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exist a \(C>0\) such that for \(x>0\) it holds that

+

\[\begin{equation} + P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq + 1-e^{-x} + \label{eq_boa_opt_conv} +\end{equation}\]

+

Almost optimal w.r.t. convex aggregation \(\eqref{eq_optp_conv}\) Wintenberger (2017).

+

The same algorithm satisfies that there exist a \(C>0\) such that for \(x>0\) it holds that \[\begin{equation} + P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq + C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq + 1-2e^{-x} + \label{eq_boa_opt_select} +\end{equation}\]

+

if \(Y_t\) is bounded, the considered loss \(\ell\) is convex \(G\)-Lipschitz and weak exp-concave in its first coordinate.

+

Almost optimal w.r.t. selection \(\eqref{eq_optp_select}\) Gaillard & Wintenberger (2018).

+

We show that this holds for QL under feasible conditions.

+
+
+
+
+
+

Lemma 1

+

\[\begin{align} + 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min} + & \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\min} + \label{eq_risk_ql_crps_expert} \\ + 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi} + & \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\pi} . + \label{eq_risk_ql_crps_convex} +\end{align}\]

+

Pointwise can outperform constant procedures

+

QL is convex but not exp-concave:

+

Almost optimal convergence w.r.t. convex aggregation \(\eqref{eq_boa_opt_conv}\)

+

For almost optimal congerence w.r.t. selection \(\eqref{eq_boa_opt_select}\) we need to check A1 and A2:

+

QL is Lipschitz continuous:

+

A1 holds

+
+ +
+

A1

+

For some \(G>0\) it holds for all \(x_1,x_2\in \mathbb{R}\) and \(t>0\) that

+

\[ | \ell(x_1, Y_t)-\ell(x_2, Y_t) | \leq G |x_1-x_2|\]

+

A2 For some \(\alpha>0\), \(\beta\in[0,1]\) it holds for all \(x_1,x_2 \in \mathbb{R}\) and \(t>0\) that

+

\[\begin{align*} + \mathbb{E}[ + & \ell(x_1, Y_t)-\ell(x_2, Y_t) | \mathcal{F}_{t-1}] \leq \\ + & \mathbb{E}[ \ell'(x_1, Y_t)(x_1 - x_2) |\mathcal{F}_{t-1}] \\ + & + + \mathbb{E}\left[ \left. \left( \alpha(\ell'(x_1, Y_t)(x_1 - x_2))^{2}\right)^{1/\beta} \right|\mathcal{F}_{t-1}\right] +\end{align*}\]

+

Almost optimal w.r.t. selection \(\eqref{eq_optp_select}\) Gaillard & Wintenberger (2018).

+
+
+
+
+
+

Conditional quantile risk: \(\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y_t) | \mathcal{F}_{t-1}]\).

+

convexity properties of \(\mathcal{Q}_p\) depend on the conditional distribution \(Y_t|\mathcal{F}_{t-1}\).

+

Proposition 1

+

Let \(Y\) be a univariate random variable with (Radon-Nikodym) \(\nu\)-density \(f\), then for the second subderivative of the quantile risk \(\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y) ]\) of \(Y\) it holds for all \(p\in(0,1)\) that \(\mathcal{Q}_p'' = f.\) Additionally, if \(f\) is a continuous Lebesgue-density with \(f\geq\gamma>0\) for some constant \(\gamma>0\) on its support \(\text{spt}(f)\) then is \(\mathcal{Q}_p\) is \(\gamma\)-strongly convex.

+

Strong convexity with \(\beta=1\) implies weak exp-concavity A2 Gaillard & Wintenberger (2018)

+
+ +
+

A1 and A2 give us almost optimal convergence w.r.t. selection \(\eqref{eq_boa_opt_select}\)

+

Theorem 1

+

The gradient based fully adaptive Bernstein online aggregation (BOAG) applied pointwise for all \(p\in(0,1)\) on \(\text{QL}\) satisfies \(\eqref{eq_boa_opt_conv}\) with minimal CRPS given by

+

\[\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}.\]

+

If \(Y_t|\mathcal{F}_{t-1}\) is bounded and has a pdf \(f_t\) satifying \(f_t>\gamma >0\) on its support \(\text{spt}(f_t)\) then \(\ref{eq_boa_opt_select}\) holds with \(\beta=1\) and

+

\[\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}\].

+
+
+
+
+ +
+
+

A Probabilistic Example

+
+
+

Simple Example:

+

\[\begin{align} + Y_t & \sim \mathcal{N}(0,\,1) \\ + \widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\ + \widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4) + \label{eq:dgp_sim1} +\end{align}\]

+
    +
  • True weights vary over \(p\)
  • +
  • Figures show the ECDF and calculated weights using \(T=25\) realizations
  • +
  • Pointwise solution creates rough estimates
  • +
  • Pointwise is better than constant
  • +
  • Smooth solution is better than pointwise
  • +
+
+ +
+
+ +
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

The Smoothing Procedures

+
+ +
+
+
+
+

Penalized cubic B-Splines for smoothing weights:

+

Let \(\varphi=(\varphi_1,\ldots, \varphi_L)\) be bounded basis functions on \((0,1)\) Then we approximate \(w_{t,k}\) by

+

\[\begin{align} +w_{t,k}^{\text{smooth}} = \sum_{l=1}^L \beta_l \varphi_l = \beta'\varphi +\end{align}\]

+

with parameter vector \(\beta\). The latter is estimated to penalize \(L_2\)-smoothing which minimizes

+

\[\begin{equation} + \| w_{t,k} - \beta' \varphi \|^2_2 + \lambda \| \mathcal{D}^{d} (\beta' \varphi) \|^2_2 + \label{eq_function_smooth} +\end{equation}\]

+

with differential operator \(\mathcal{D}\)

+

Computation is easy, since we have an analytical solution

+
+ +
+

We receive the constant solution for high values of \(\lambda\) when setting \(d=1\)

+
+ +
+
+
+
+
+
+

Represent weights as linear combinations of bounded basis functions:

+

\[\begin{equation} + w_{t,k} = \sum_{l=1}^L \beta_{t,k,l} \varphi_l = \boldsymbol \beta_{t,k}' \boldsymbol \varphi +\end{equation}\]

+

A popular choice are are B-Splines as local basis functions

+

\(\boldsymbol \beta_{t,k}\) is calculated using a reduced regret matrix:

+

\[\begin{equation} + \underbrace{\boldsymbol r_{t}}_{\text{LxK}} = \frac{L}{P} \underbrace{\boldsymbol B'}_{\text{LxP}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{\text{PxK}} +\end{equation}\]

+

\(\boldsymbol r_{t}\) is transformed from PxK to LxK

+

If \(L = P\) it holds that \(\boldsymbol \varphi = \boldsymbol{I}\) For \(L = 1\) we receive constant weights

+
+ +
+

Weights converge to the constant solution if \(L\rightarrow 1\)

+
+ +
+
+
+
+
+
+
+

The Proposed CRPS-Learning Algorithm

+


+
+
+
+

Initialization:

+

Array of expert predicitons: \(\widehat{X}_{t,p,k}\)

+

Vector of Prediction targets: \(Y_t\)

+

Starting Weights: \(\boldsymbol w_0=(w_{0,1},\ldots, w_{0,K})\)

+

Penalization parameter: \(\lambda\geq 0\)

+

B-spline and penalty matrices \(\boldsymbol B\) and \(\boldsymbol D\) on \(\mathcal{P}= (p_1,\ldots,p_M)\)

+

Hat matrix: \[\boldsymbol{\mathcal{H}} = \boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'\]

+

Cumulative Regret: \(R_{0,k} = 0\)

+

Range parameter: \(E_{0,k}=0\)

+

Starting pseudo-weights: \(\boldsymbol \beta_0 = \boldsymbol B^{\text{pinv}}\boldsymbol w_0(\boldsymbol{\mathcal{P}})\)

+
+ +
+

Core:

+

for( t in 1:T ) {

+

    \(\widetilde{\boldsymbol X}_{t} = \text{Sort}\left( \boldsymbol w_{t-1}'(\boldsymbol P) \widehat{\boldsymbol X}_{t} \right)\) # Prediction

+

    \(\boldsymbol r_{t} = \frac{L}{M} \boldsymbol B' \left({\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)\)

+

    \(\boldsymbol E_{t} = \max(\boldsymbol E_{t-1}, \boldsymbol r_{t}^+ + \boldsymbol r_{t}^-)\)

+

    \(\boldsymbol V_{t} = \boldsymbol V_{t-1} + \boldsymbol r_{t}^{ \odot 2}\)

+

    \(\boldsymbol \eta_{t} =\min\left( \left(-\log(\boldsymbol \beta_{0}) \odot \boldsymbol V_{t}^{\odot -1} \right)^{\odot\frac{1}{2}} , \frac{1}{2}\boldsymbol E_{t}^{\odot-1}\right)\)

+

    \(\boldsymbol R_{t} = \boldsymbol R_{t-1}+ \boldsymbol r_{t} \odot \left( \boldsymbol 1 - \boldsymbol \eta_{t} \odot \boldsymbol r_{t} \right)/2 + \boldsymbol E_{t} \odot \mathbb{1}\{-2\boldsymbol \eta_{t}\odot \boldsymbol r_{t} > 1\}\)

+

    \(\boldsymbol \beta_{t} = K \boldsymbol \beta_{0} \odot \boldsymbol {SoftMax}\left( - \boldsymbol \eta_{t} \odot \boldsymbol R_{t} + \log( \boldsymbol \eta_{t}) \right)\)

+

    \(\boldsymbol w_{t}(\boldsymbol P) = \underbrace{\boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'}_{\boldsymbol{\mathcal{H}}} \boldsymbol B \boldsymbol \beta_{t}\)

+

}

+
+
+
+
+

Simulation Study

+
+ +
+
+
+
+

Data Generating Process of the simple probabilistic example:

+

\[\begin{align*} + Y_t &\sim \mathcal{N}(0,\,1)\\ + \widehat{X}_{t,1} &\sim \widehat{F}_{1}=\mathcal{N}(-1,\,1) \\ + \widehat{X}_{t,2} &\sim \widehat{F}_{2}=\mathcal{N}(3,\,4) +\end{align*}\]

+
    +
  • Constant solution \(\lambda \rightarrow \infty\)
  • +
  • Pointwise Solution of the proposed BOAG
  • +
  • Smoothed Solution of the proposed BOAG +
      +
    • Weights are smoothed during learning
    • +
    • Smooth weights are used to calculate Regret, adjust weights, etc.
    • +
  • +
+
+ +
+
+ +
+
+

Deviation from best attainable QL (1000 runs).

+

+
+
+

CRPS Values for different \(\lambda\) (1000 runs)

+

+
+
+

CRPS for different number of knots (1000 runs)

+

+
+
+
+
+
+
+

The same simulation carried out for different algorithms (1000 runs):

+
+ +
+
+
+
+
+

New DGP:

+

\[\begin{align*} + Y_t &\sim \mathcal{N}\left(\frac{\sin(0.005 \pi t )}{2},\,1\right) \\ + \widehat{X}_{t,1} &\sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\ + \widehat{X}_{t,2} &\sim \widehat{F}_{2} = \mathcal{N}(3,\,4) +\end{align*}\]

+

Changing optimal weights

+

Single run example depicted aside

+

No forgetting leads to long-term constant weights

+
+ +
+
+ +
+

 

+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

Application Study

+
+ +
+
+
+
+
+

Data:

+
    +
  • Forecasting European emission allowances (EUA)
  • +
  • Daily month-ahead prices
  • +
  • Jan 13 - Dec 20 (Phase III, 2092 Obs)
  • +
  • Rolling Window (length 250 ~ 1 year)
  • +
+

Combination methods:

+
    +
  • Online +
      +
    • Naive, BOAG, EWAG, ML-PolyG, BMA
    • +
  • +
  • Batch +
      +
    • QRlin, QRconv
    • +
  • +
+
+
+ +
+

Tuning paramter grids:

+
    +
  • Smoothing Penalty: \(\Lambda= \{0\}\cup \{2^x|x\in \{-4,-3.5,\ldots,12\}\}\)
  • +
  • Learning Rates: \(\mathcal{E}= \{2^x|x\in \{-1,-0.5,\ldots,9\}\}\)
  • +
+
+
+
+
+

+
+
+
+
+
+
+
+
+

Simple exponential smoothing with additive errors (ETS-ANN):

+

\[\begin{align*} +Y_{t} = l_{t-1} + \varepsilon_t \quad \text{with} \quad l_t = l_{t-1} + \alpha \varepsilon_t \quad \text{and} \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2) +\end{align*}\]

+

Quantile regression (QuantReg): For each \(p \in \mathcal{P}\) we assume:

+

\[\begin{align*} +F^{-1}_{Y_t}(p) = \beta_{p,0} + \beta_{p,1} Y_{t-1} + \beta_{p,2} |Y_{t-1}-Y_{t-2}| +\end{align*}\]

+

ARIMA(1,0,1)-GARCH(1,1) with Gaussian errors (ARMA-GARCH):

+

\[\begin{align*} +Y_{t} = \mu + \phi(Y_{t-1}-\mu) + \theta \varepsilon_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1) +\end{align*}\]

+

ARIMA(0,1,0)-I-EGARCH(1,1) with Gaussian errors (I-EGARCH):

+

\[\begin{align*} +Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \log(\sigma_t^2) = \omega + \alpha Z_{t-1}+ \gamma (|Z_{t-1}|-\mathbb{E}|Z_{t-1}|) + \beta \log(\sigma_{t-1}^2) \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1) +\end{align*}\]

+

ARIMA(0,1,0)-GARCH(1,1) with student-t errors (I-GARCHt):

+

\[\begin{align*} +Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim t(0,1, \nu) +\end{align*}\]

+
+
+
+
+ +
+
+
+ + + + + + + + + + + + + + + + + + + +
ETSQuantRegARMA-GARCHI-EGARCHI-GARCHt
2.101(>.999)1.358(>.999)0.52(0.993)0.511(0.999)-0.037(0.406)
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
BOAGEWAGML-PolyGBMAQRlinQRconv
Pointwise-0.170(0.055)-0.089(0.175)-0.141(0.112)0.032(0.771)3.482(>.999)-0.019(0.309)
B-Constant-0.118(0.146)-0.049(0.305)-0.090(0.218)0.038(0.834)4.002(>.999)0.539(0.996)
P-Constant-0.138(0.020)-0.070(0.137)-0.133(0.026)0.039(0.851)5.275(>.999)0.009(0.683)
B-Smooth-0.173(0.062)-0.065(0.276)-0.141(0.118)-0.042(0.386)--
P-Smooth-0.182(0.039)-0.107(0.121)-0.160(0.065)0.040(0.804)3.495(>.999)-0.012(0.369)
+
+
+

CRPS difference to Naive. Negative values correspond to better performance (the best value is bold).
Additionally, we show the p-value of the DM-test, testing against Naive. The cells are colored with respect to their values (the greener better).

+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+

Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices

+

Berrisch, J., & Ziel, F. (2024). International Journal of Forecasting, 40(4), 1568-1586.

+
+
+

Multivariate CRPS Learning

+
+
+

We extend the B-Smooth and P-Smooth procedures to the multivariate setting:

+
+ +
+
+

Let \(\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})\) and \(\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})\) be two sets of bounded basis functions on \((0,1)\):

+

\[\begin{equation*} + \boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{pr}}' +\end{equation*}\]

+

with parameter matix \(\boldsymbol b_{t,k}\). The latter is estimated to penalize \(L_2\)-smoothing which minimizes

+

\[\begin{align} + & \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\ + & \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber +\end{align}\]

+

with differential operator \(\mathcal{D}_q\) of order \(q\)

+

We have an analytical solution.

+
+
+

Linear combinations of bounded basis functions:

+

\[\begin{equation} + \underbrace{\boldsymbol w_{t,k}}_{D \text{ x } P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D\text{ x }\widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \text{ x }P} \nonumber +\end{equation}\]

+

A popular choice: B-Splines

+

\(\boldsymbol \beta_{t,k}\) is calculated using a reduced regret matrix:

+

\(\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = \boldsymbol \varphi^{\text{pr}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{\text{PxD}}\boldsymbol \varphi^{\text{mv}}\)

+

If \(\widetilde P = P\) it holds that \(\boldsymbol \varphi^{pr} = \boldsymbol{I}\) (pointwise)

+

For \(\widetilde P = 1\) we receive constant weights

+
+
+
+
+ +
+
+
+
+
+

+
+
+
+
+
+
+
+

Application

+
+
+

Data

+
    +
  • Day-Ahead electricity price forecasts from Marcjasz et al. (2023)
  • +
  • Produced using probabilistic neural networks
  • +
  • 24-dimensional distributional forecasts
  • +
  • Distribution assumptions: JSU and Normal
  • +
  • 8 experts (4 JSU, 4 Normal)
  • +
  • 27th Dec. 2018 to 31st Dec. 2020 (736 days)
  • +
  • We extract 99 quantiles (percentiles)
  • +
+
+ +
+

Setup

+

Evaluation: Exclude first 182 observations

+

Extensions: Penalized smoothing | Forgetting

+

Tuning strategies:

+
    +
  • Bayesian Fix +
      +
    • Sophisticated Baesian Search algorithm
    • +
  • +
  • Online +
      +
    • Dynamic based on past performance
    • +
  • +
  • Bayesian Online +
      +
    • First Bayesian Fix then Online
    • +
  • +
+

Computation Time: ~30 Minutes

+
+
+
+

Results

+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
JSU1JSU2JSU3JSU4Norm1Norm2Norm3Norm4Naive
1.4871.4441.4991.3741.4141.5351.4201.4221.295
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DescriptionParameter TuningBOAML-PolyEWA
Constant1.29331.29661.3188
Pointwise1.29361.30101.3101
FTL1.37521.36921.3863
B Constant Pr1.29361.30001.3432
B Constant Mv1.29181.29451.3076
ForgetBayesian Fix1.29301.29561.3096
FullBayesian Fix1.29051.29021.2870.
Smooth.forgetBayesian Fix1.29111.29121.2869.
SmoothBayesian Fix1.29181.29171.2873.
ForgetBayesian Online1.2855**1.29611.3098
FullBayesian Online1.29191.2873.1.2873.
Smooth.forgetBayesian Online1.2845**1.2862*1.2864.
SmoothBayesian Online1.29181.29181.2874.
ForgetSampling Online1.2855**1.29611.3114
FullSampling Online1.28861.2861*1.2873.
Smooth.forgetSampling Online1.2845***1.2867*1.2866.
SmoothSampling Online1.29181.29171.2877.
+
+
+
+ Coloring w.r.t. test statistic: + <-5 + -4 + -3 + -2 + -1 + 0 + 1 + 2 + 3 + 4 + >5 +
+ +
+ Significance denoted by: . p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001; +
+
+


+
+ +
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

Results

+
+ +
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+

Non-Equidistant Knots

+
+ +
+
+
+ +
+
+ +
+
+
+
+ +
+
+ +
+
+
+
+ +
+
+
+ +
+
+
+
+
+
+ +
+
+
+
+
+
+
+
+

Non-central beta distribution Johnson et al. (1995):

+
+

\[\begin{equation*} + \mathcal{B}(x, a, b, c) = \sum_{j=0}^{\infty} e^{-c/2} \frac{\left( \frac{c}{2} \right)^j}{j!} I_x \left( a + j , b \right) +\end{equation*}\]

+
+
+
+
+
+

+
+
+
+
+

Penalty and \(\lambda\) need to be adjusted accordingly Li & Cao (2022)

+
+ +
+

Using non equidistant knots in profoc is straightforward:

+
+
mod <- online(
+  y = Y,
+  experts = experts,
+  tau = 1:99 / 100,
+  b_smooth_pr = list(
+    knots = 9,
+    mu = 0.3,
+    sigma = 1,
+    nonc = 0,
+    tailweight = 1,
+    deg = 3
+  )
+)
+
+

Basis specification b_smooth_pr is internally passed to make_basis_mats().

+

Profoc adjusts penatly and \(\lambda\)

+
+
+
+
+
+
+

Wrap-Up

+
+
+

Potential Downsides:

+
    +
  • Pointwise optimization can induce quantile crossing +
      +
    • Can be solved by sorting the predictions
    • +
  • +
+

Important:

+
    +
  • The choice of the learning rate is crucial
  • +
  • The loss function has to meet certain criteria
  • +
+

Upsides:

+
    +
  • Pointwise learning outperforms the Naive solution significantly
  • +
  • Online learning is much faster than batch methods
  • +
  • Smoothing further improves the predictive performance
  • +
  • Asymptotically not worse than the best convex combination
  • +
+
+ +
+

The profoc R Package:

+
    +
  • Implements all algorithms discussed above
  • +
  • Is written using RcppArmadillo its fast
  • +
  • Accepts vectors for most parameters +
      +
    • The best parameter combination is chosen online
    • +
  • +
  • Implements +
      +
    • Forgetting, Fixed Share
    • +
    • Different loss functions + gradients
    • +
  • +
+

Pubications:

+

Berrisch, J., & Ziel, F. (2023). CRPS learning. Journal of Econometrics, 237(2), 105221.

+

Berrisch, J., & Ziel, F. (2024). Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. International Journal of Forecasting, 40(4), 1568-1586.

+
+
+
+
+

Modeling Volatility and Dependence of European Carbon and Energy Prices

+

Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). Finance Research Letters, 52, 103503.

+
+
+

 

+
+
+

Motivation

+

Understanding European Allowances (EUA) dynamics is important for several fields:

+

Portfolio & Risk Management,

+

Sustainability Planing

+

Political decisions

+

EUA prices are obviously connected to the energy market

+

How can the dynamics be characterized?

+

Several Questions arise:

+
    +
  • Data (Pre)processing
  • +
  • Modeling Approach
  • +
  • Evaluation
  • +
+
+ +
+

Data

+

EUA, natural gas, Brent crude oil, coal

+

March 15, 2010, until October 14, 2022

+

Data was normalized w.r.t. \(\text{CO}_2\) emissions

+

Emission-adjusted prices reflects one tonne of \(\text{CO}_2\)

+

We adjusted for inflation by Eurostat’s HICP, excluding energy

+

Log transformation of the data to stabilize the variance

+

ADF Test: All series are stationary in first differences

+

Johansen’s likelihood ratio trace test suggests two cointegrating relationships (levels)

+

Johansen’s likelihood ratio trace test suggests no cointegrating relationships (logs)

+
+
+
+

Data

+ +
+
+

Modeling Approach: Overview

+


+

VECM: Vector Error Correction Model

+
    +
  • Modeling the expectaion
  • +
  • Captures the long-run cointegrating relationship
  • +
  • Different cointegrating ranks, including rank zero (no cointegration)
  • +
+

GARCH: Generalized Autoregressive Conditional Heteroscedasticity

+
    +
  • Captures dynamics in conditional variance
  • +
+

Copula: Captures the dependence structure

+
    +
  • Captures: conditional cross-sectional dependencies
  • +
  • Dependence allowed to vary over time
  • +
+
+
+

Modeling Approach: Notation

+


+
+
+
    +
  • Let \(\boldsymbol{X}_t\) be a \(K\)-dimensional vector at time \(t\)
  • +
  • The forecasting target: +
      +
    • Conditional joint distribution
    • +
    • \(F_{\boldsymbol{X}_t|\mathcal{F}_{t-1}}\)
    • +
    • \(\mathcal{F}_{t}\) is the sigma field generated by all information available up to and including time \(t\)
    • +
  • +
+

Sklars theorem: decompose target into - marginal distributions: \(F_{X_{k,t}|\mathcal{F}_{t-1}}\) for \(k=1,\ldots, K\), and - copula function: \(C_{\boldsymbol{U}_{t}|\mathcal{F}_{t - 1}}\)

+
+ +
+

Let \(\boldsymbol{x}_t= (x_{1,t},\ldots, x_{K,t})^\intercal\) be the realized values

+

It holds that:

+

\[\begin{align} + F_{\boldsymbol{X}_t|\mathcal{F}_{t-1}}(\boldsymbol{x}_t) = C_{\boldsymbol{U}_{t}|\mathcal{F}_{t - 1}}(\boldsymbol{u}_t) \nonumber +\end{align}\]

+

with: \(\boldsymbol{u}_t =(u_{1,t},\ldots, u_{K,t})^\intercal\), \(u_{k,t} = F_{X_{k,t}|\mathcal{F}_{t-1}}(x_{k,t})\)

+

For brewity we drop the conditioning on \(\mathcal{F}_{t-1}\).

+

The model can be specified as follows

+

\[\begin{align} + F(\boldsymbol{x}_t) = C \left[\mathbf{F}(\boldsymbol{x}_t; \boldsymbol{\mu}_t, \boldsymbol{ \sigma }_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda}); \Xi_t, \Theta\right] \nonumber +\end{align}\]

+

\(\Xi_{t}\) denotes time-varying dependence parameters \(\Theta\) denotes time-invariant dependence parameters

+

We take \(C\) as the \(t\)-copula

+
+
+
+

Modeling Approach: Mean and Variance

+


+
+
+

Individual marginal distributions:

+

\[\mathbf{F} = (F_1, \ldots, F_K)^{\intercal}\]

+

Generalized non-central t-distributions

+
    +
  • To account for heavy tails
  • +
  • Time varying +
      +
    • expectation: \(\boldsymbol{\mu}_t = (\mu_{1,t}, \ldots, \mu_{K,t})^{\intercal}\)
    • +
    • variance: \(\boldsymbol{\sigma}_{t}^2 = (\sigma_{1,t}^2, \ldots, \sigma_{K,t}^2)^{\intercal}\)
    • +
  • +
  • Time invariant +
      +
    • degrees of freedom: \(\boldsymbol{\nu} = (\nu_1, \ldots, \nu_K)^{\intercal}\)
    • +
    • noncentrality: \(\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_K)^{\intercal}\)
    • +
  • +
+
+ +
+

VECM Model

+

\[\begin{align} + \Delta \boldsymbol{\mu}_t = \Pi \boldsymbol{x}_{t-1} + \Gamma \Delta \boldsymbol{x}_{t-1} \nonumber +\end{align}\]

+

where \(\Pi = \alpha \beta^{\intercal}\) is the cointegrating matrix of rank \(r\), \(0 \leq r\leq K\).

+

GARCH model

+

\[\begin{align} + \sigma_{i,t}^2 = & \omega_i + \alpha^+_{i} (\epsilon_{i,t-1}^+)^2 + \alpha^-_{i} (\epsilon_{i,t-1}^-)^2 + \beta_i \sigma_{i,t-1}^2 \nonumber +\end{align}\]

+

where \(\epsilon_{i,t-1}^+ = \max\{\epsilon_{i,t-1}, 0\}\)

+

Separate coefficients for positive and negative innovations to capture leverage effects.

+
+
+
+

Modeling Approach: Dependence

+


+
+
+

Time-varying dependence parameters

+

\[\begin{align*} + \Xi_{t} = & \Lambda\left(\boldsymbol{\xi}_{t}\right) + \\ + \xi_{ij,t} = & \eta_{0,ij} + \eta_{1,ij} \xi_{ij,t-1} + \eta_{2,ij} z_{i,t-1} z_{j,t-1}, +\end{align*}\]

+

\(\xi_{ij,t}\) is a latent process

+

\(z_{i,t}\) denotes the \(i\)-th standardized residual from time series \(i\) at time point \(t\)

+

\(\Lambda(\cdot)\) is a link function - ensures that \(\Xi_{t}\) is a valid variance covariance matrix - ensures that \(\Xi_{t}\) does not exceed its support space and remains semi-positive definite

+
+ +
+

Maximum Likelihood Estimation

+

All parameters can be estimated jointly. Using conditional independence: \[\begin{align*} + L = f_{X_1} \prod_{i=2}^T f_{X_i|\mathcal{F}_{i-1}}, +\end{align*}\] with multivariate conditional density: \[\begin{align*} + f_{\mathbf{X}_t}(\mathbf{x}_t | \mathcal{F}_{t-1}) = c\left[\mathbf{F}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu}, + \boldsymbol{\lambda});\Xi_t, \Theta\right] \cdot \\ \prod_{i=1}^K f_{X_{i,t}}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda}) +\end{align*}\] The copula density \(c\) can be derived analytically.

+
+
+
+

Study Design and Evaluation

+


+
+
+

Rolling-window forecasting study

+
    +
  • 3257 observations total
  • +
  • Window size: 1000 days (~ four years)
  • +
  • Forecasting 30-steps-ahead
  • +
+

=> 2227 potential starting points

+

We sample 250 to reduce computational cost

+

We draw \(2^{12}= 2048\) trajectories from the joint predictive distribution

+
+ +
+

Evaluation

+

Forecasts are evaluated by the energy score (ES)

+

\[\begin{align*} + \text{ES}_t(F, \mathbf{x}_t) = \mathbb{E}_{F} \left(||\tilde{\mathbf{X}}_t - \mathbf{x}_t||_2\right) - \\ \frac{1}{2} \mathbb{E}_F \left(||\tilde{\mathbf{X}}_t - \tilde{\mathbf{X}}_t'||_2 \right) +\end{align*}\]

+

where \(\mathbf{x}_t\) is the observed \(K\)-dimensional realization and \(\tilde{\mathbf{X}}_t\), respectively \(\tilde{\mathbf{X}}_t'\) are independent random vectors distributed according to \(F\)

+

For univariate cases the Energy Score becomes the Continuous Ranked Probability Score (CRPS)

+
+
+
+

Results

+
+ +
+
+
+
+

Relative improvement in ES compared to \(\text{RW}^{\sigma, \rho}\)

+

Cellcolor: w.r.t. test statistic of Diebold-Mariano test (testing wether the model outperformes the benchmark, greener = better).

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Model\(\text{ES}^{\text{All}}_{1-30}\)\(\text{ES}^{\text{EUA}}_{1-30}\)\(\text{ES}^{\text{Oil}}_{1-30}\)\(\text{ES}^{\text{NGas}}_{1-30}\)\(\text{ES}^{\text{Coal}}_{1-30}\)\(\text{ES}^{\text{All}}_{1}\)\(\text{ES}^{\text{All}}_{5}\)\(\text{ES}^{\text{All}}_{30}\)
\(\textrm{RW}^{\sigma, \rho}_{}\)161.9610.0637.94146.7313.225.5613.2834.29
\(\textrm{RW}^{\sigma_t, \rho_t}_{}\)9.403.75-0.4111.394.1310.349.107.59
\(\textrm{RW}^{\sigma, \rho_t}_{\textrm{ncp}, \textrm{log}}\)12.046.16-0.5614.337.359.229.8210.02
\(\textrm{RW}^{\sigma, \rho}_{\textrm{log}}\)12.106.25-0.5914.447.319.049.669.91
\(\textrm{VECM}^{\textrm{r0}, \sigma_t, \rho_t}_{\textrm{lev}, \textrm{ncp}}\)9.68-0.720.3211.743.7010.8210.508.21
\(\textrm{VECM}^{\textrm{r0}, \sigma, \rho_t}_{\textrm{log}}\)12.156.10-0.7014.577.808.059.9910.04
\(\textrm{ETS}^{\sigma}\)9.945.750.0813.057.836.967.746.21
\(\textrm{ETS}^{\sigma}_{\textrm{log}}\)8.127.80-0.5111.178.545.056.142.66
\(\textrm{VES}^{\sigma}\)5.50-4.43-3.226.294.68-25.99-2.423.07
\(\textrm{VES}^{\sigma}_{\textrm{log}}\)7.683.31-4.349.078.30-22.111.074.32
+
+
+ +
+
    +
  • Benchmarks: +
      +
    • \(\text{RW}^{\sigma, \rho}\): Random walk with constant volatility and correlation
    • +
    • Univariate \(\text{ETS}^{\sigma}\) with constant volatility
    • +
    • Vector ETS \(VES^{\sigma}\) with constant volatility
    • +
  • +
  • Heteroscedasticity is a main driver of ES
  • +
  • The VECM model without cointegration (essentially a VAR) is the best performing model in terms of ES overall
  • +
  • For EUA, the ETS Benchmark is the best performing model in terms of ES
  • +
+
+
+
+
+
+
    +
  • CRPS solely evaluates the marginal distributions +
      +
    • The cross-sectional dependence is ignored
    • +
  • +
  • VES models deliver poor performance in short horizons
  • +
  • For Oil prices the RW Benchmark can’t be oupterformed 30 steps ahead
  • +
  • Both VECM models generally deliver good performance
  • +
+
+ +
+

Improvement in CRPS of selected models relative to \(\textrm{RW}^{\sigma, \rho}_{}\) in % (higher = better). Colored according to the test statistic of a DM-Test comparing to \(\textrm{RW}^{\sigma, \rho}_{}\) (greener means lower test statistic i.e., better performance compared to \(\textrm{RW}^{\sigma, \rho}_{}\)).

+ +++++++++++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+EUA +
+Oil +
+NGas +
+Coal +
ModelH1H5H30H1H5H30H1H5H30H1H5H30
\(\textrm{RW}^{\sigma, \rho}_{}\)0.40.92.11.53.49.14.711.629.80.30.92.8
\(\textrm{RW}^{\sigma_t, \rho_t}_{}\)5.66.02.82.12.7-0.812.610.59.610.76.52.1
\(\textrm{RW}^{\sigma, \rho_t}_{\textrm{ncp}, \textrm{log}}\)5.18.75.00.70.8-0.411.411.512.48.07.36.7
\(\textrm{RW}^{\sigma, \rho}_{\textrm{log}}\)4.78.95.20.00.3-0.611.211.412.47.77.56.6
\(\textrm{VECM}^{\textrm{r0}, \sigma_t, \rho_t}_{\textrm{lev}, \textrm{ncp}}\)3.60.6-1.62.73.00.013.112.210.411.87.21.5
\(\textrm{VECM}^{\textrm{r0}, \sigma, \rho_t}_{\textrm{log}}\)4.28.95.10.20.4-0.89.911.812.77.87.97.3
\(\textrm{ETS}^{\sigma}\)0.26.85.71.10.9-0.210.911.310.97.56.75.6
\(\textrm{ETS}^{\sigma}_{\textrm{log}}\)1.08.68.00.10.7-0.68.99.47.17.37.86.7
\(\textrm{VES}^{\sigma}\)-38.5-6.4-5.4-33.3-6.1-2.4-26.6-2.63.6-37.5-5.54.7
\(\textrm{VES}^{\sigma}_{\textrm{log}}\)-32.42.81.8-30.4-6.2-3.2-22.01.85.4-27.02.36.4
+
+
+
+
+
+

RMSE measures the performance of the forecasts at their mean

+


+
    +
  • Some models beat the benchmarks at short horizons
  • +
+


+

Conclusion: the Improvements seen before must be attributed to other parts of the multivariate probabilistic predictive distribution

+
+ +
+

Improvement in RMSE score of selected models relative to \(\textrm{RW}^{\sigma, \rho}_{}\) in % (higher = better). Colored according to the test statistic of a DM-Test comparing to \(\textrm{RW}^{\sigma, \rho}_{}\) (greener means lower test statistic i.e., better performance compared to \(\textrm{RW}^{\sigma, \rho}_{}\)).

+ +++++++++++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+EUA +
+Oil +
+NGas +
+Coal +
ModelH1H5H30H1H5H30H1H5H30H1H5H30
\(\textrm{RW}^{\sigma, \rho}_{}\)0.92.05.02.96.416.717.842.885.40.92.97.0
\(\textrm{RW}^{\sigma_t, \rho_t}_{}\)-0.1-0.10.70.0-0.3-0.1-0.20.31.3-0.20.0-1.8
\(\textrm{RW}^{\sigma, \rho_t}_{\textrm{ncp}, \textrm{log}}\)-270.5-154.1-139.90.5-0.5-2.9-0.80.7-1.60.3-31.2-24.5
\(\textrm{RW}^{\sigma, \rho}_{\textrm{log}}\)-705.0-265.4-125.20.60.2-0.2-0.40.1-1.6-0.9-0.3-8.3
\(\textrm{VECM}^{\textrm{r0}, \sigma_t, \rho_t}_{\textrm{lev}, \textrm{ncp}}\)-0.90.20.50.50.20.0-0.40.70.21.40.10.2
\(\textrm{VECM}^{\textrm{r0}, \sigma, \rho_t}_{\textrm{log}}\)-271.5-191.3-114.31.7-12.3-3.6-0.61.6-4.10.0-0.8-6.7
\(\textrm{ETS}^{\sigma}\)-0.30.31.60.70.1-0.10.1-0.10.2-2.4-3.92.5
\(\textrm{ETS}^{\sigma}_{\textrm{log}}\)-1.00.41.60.90.0-0.1-1.9-1.9-13.9-0.3-3.6-1.8
\(\textrm{VES}^{\sigma}\)-37.4-8.9-6.0-27.9-7.4-2.8-27.2-9.5-2.4-41.7-1.21.6
\(\textrm{VES}^{\sigma}_{\textrm{log}}\)-37.6-9.2-7.8-26.8-7.3-3.0-27.0-6.8-3.5-41.2-2.2-0.3
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+
+

+
+
+
+
+
+
+
+
+
+

Conclusion

+
+
+

Accounting for heteroscedasticity or stabilizing the variance via log transformation is crucial for good performance in terms of ES

+
    +
  • Price dynamics emerged way before the russian invaion into ukraine
  • +
  • Linear dependence between the series reacted only right after the invasion
  • +
  • Improvements in forecasting performance is mainly attributed to: +
      +
    • the tails multivariate probabilistic predictive distribution
    • +
    • the dependence structure between the marginals
    • +
  • +
+
+ +
+


+
+ +
+

Berrisch, Pappert, et al. (2023)

+
+
+
+

Contributions

+
+
+

+

+

Theoretical

+

Probabilistic Online Learning:

+

Aggregation
Regression

+

+

+

Practical

+

Applications

+

Energy Commodities
Electricity Prices
Electricity Load

+

+

+

Well received by the academic community:

+

of papers already published

+

104 citations since 2020 (Google Scholar)

+
+ +
+

+

+

Software

+

R Packages:

+

profoc, rcpptimer, dccpp

+

Python Packages:

+

ondil, sstudentt

+

Contributions to other projects:

+

RcppArmadillo
gamlss
NixOS/nixpkgs
OpenPrinting/foomatic-db

+

Awards:

+

Berrisch, J., Narajewski, M., & Ziel, F. (2023):

+

Won Western Power Distribution Competition
Won Best-Student-Presentation Award

+
+
+
+

References

+ +
+
+Berrisch, J. (2025). Rcpptimer: Rcpp tic-toc timer with OpenMP support. arXiv Preprint arXiv:2501.15856, abs/2501.15856. DOI: 10.48550/arXiv.2501.15856 +
+
+Berrisch, J., Narajewski, M., & Ziel, F. (2023). High-resolution peak demand estimation using generalized additive models and deep neural networks. Energy and AI, 13, 100236. DOI: 10.1016/j.egyai.2023.100236 +
+
+Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). Modeling volatility and dependence of european carbon and energy prices. Finance Research Letters, 52, 103503. DOI: 10.1016/j.frl.2022.103503 +
+
+Berrisch, J., & Ziel, F. (2022). Distributional modeling and forecasting of natural gas prices. Journal of Forecasting, 41(6), 1065–1086. DOI: 10.1002/for.2853 +
+
+Berrisch, J., & Ziel, F. (2023). CRPS learning. Journal of Econometrics, 237(2, Part C), 105221. DOI: 10.1016/j.jeconom.2021.11.008 +
+
+Berrisch, J., & Ziel, F. (2024). Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. International Journal of Forecasting, 40(4), 1568–1586. DOI: 10.1016/j.ijforecast.2024.01.005 +
+
+Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games (pp. I–XII, 1–394). Cambridge university press. DOI: 10.1017/CBO9780511546921 +
+
+Gaillard, P., & Wintenberger, O. (2018). Efficient online algorithms for fast-rate regret bounds under sparsity. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Proceedings of the 32nd international conference on neural information processing systems (pp. 7026–7036). Cornell University. DOI: 10.48550/arxiv.1805.09174 +
+
+Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494), 746–762. DOI: 10.1198/jasa.2011.r10138 +
+
+Gneiting, T., & Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. DOI: 10.1198/016214506000001437 +
+
+Hirsch, S., Berrisch, J., & Ziel, F. (2024). Online distributional regression. arXiv Preprint arXiv:2407.08750, abs/2407.08750. DOI: 10.48550/arXiv.2407.08750 +
+
+Johnson, N.L., Kotz, S., & Balakrishnan, N. (1995). Continuous univariate distributions, volume 2 (Vol. 289). John wiley & sons. +
+
+Li, Z., & Cao, J. (2022). General p-splines for non-uniform b-splines. arXiv Preprint. DOI: 10.48550/arXiv.2201.06808 +
+
+Marcjasz, G., Narajewski, M., Weron, R., & Ziel, F. (2023). Distributional neural networks for electricity price forecasting. Energy Economics, 125, 106843. DOI: 10.1016/j.eneco.2023.106843 +
+
+Wintenberger, O. (2017). Optimal learning with bernstein online aggregation. Machine Learning, 106(1), 119–141. DOI: 10.1007/s10994-016-5592-6 +
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file