---
title: "Data Science Methods for Forecasting in Energy and Economics"
date: 30 June 2025
author:
  - name: Jonathan Berrisch
    affiliations:
      - ref: hemf
affiliations:
  - id: hemf
    name: University of Duisburg-Essen, House of Energy, Climate and Finance
format:
  revealjs:
    embed-resources: false
    footer: ""
    width: 1280
    height: 720
    logo: assets/logos_combined.png
    theme: [default, sydney.scss, custom.scss]
    smaller: true
    fig-format: svg
    slide-number: true
    self-contained-math: true
    crossrefs-hover: true
    pagetitle: "De-Fence"
    html-math-method: mathjax
    include-in-header:
      - file: custom.html
execute:
  daemon: false
highlight-style: github
bibliography: assets/library.bib
csl: apa-old-doi-prefix.csl
---
## Outline
<!--
Render with: quarto preview /home/jonathan/git/PHD-Presentation/25_07_phd_defense/index.qmd --no-browser --port 6074
-->
::: {.hidden}
$$
\newcommand{\A}{{\mathbb A}}
$$
:::
::: {style="font-size: 150%;"}
:::
<br>
[{{< fa bars-staggered >}}]{style="color: #404040;"} &ensp; [Introduction & Research Motivation](#motivation)
[{{< fa bars-staggered >}}]{style="color: #404040;"} &ensp; Overview of the Thesis
[{{< fa table >}}]{style="color: #404040;"} &ensp; Online Learning
[{{< fa circle-nodes >}}]{style="color: #404040;"} &ensp; Probabilistic Forecasting of European Carbon and Energy Prices
[{{< fa lightbulb >}}]{style="color: #404040;"} &ensp; Limitations
[{{< fa binoculars >}}]{style="color: #404040;"} &ensp; Contributions & Outlook
```{r, setup, include=FALSE}
# Compile with: rmarkdown::render("crps_learning.Rmd")
library(latex2exp)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)
library(kableExtra)
library(RefManageR)
knitr::opts_chunk$set(
dev = "svglite" # Use svg figures
)
BibOptions(
check.entries = TRUE,
bib.style = "authoryear",
cite.style = "authoryear",
style = "html",
hyperlink = TRUE,
dashed = FALSE
)
my_bib <- ReadBib("assets/library.bib", check = FALSE)
col_lightgray <- "#e7e7e7"
col_blue <- "#000088"
col_smooth_expost <- "#a7008b"
col_constant <- "#dd9002"
col_optimum <- "#666666"
col_smooth <- "#187a00"
col_pointwise <- "#008790"
col_green <- "#61B94C"
col_orange <- "#ffa600"
col_yellow <- "#FCE135"
```
# CRPS Learning
Berrisch, J., & Ziel, F. (2023). *Journal of Econometrics*, 237(2), 105221.
## Motivation
:::: {.columns}
::: {.column width="48%"}
The Idea:
- Combine multiple forecasts instead of choosing one
- Combination weights may vary over **time**, over the **distribution** or **both**
Two popular options for combining distributions:
- Combining across quantiles (this paper)
  - Horizontal aggregation, Vincentization
- Combining across probabilities
  - Vertical aggregation
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## Time
```{r, echo = FALSE, fig.height=6, cache = TRUE}
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
set.seed(1)
# Data
X <- matrix(ncol = 3, nrow = 15)
X[, 1] <- seq(from = 8, to = 12, length.out = 15) + 0.25 * rnorm(15)
X[, 2] <- 10 + 0.25 * rnorm(15)
X[, 3] <- seq(from = 12, to = 8, length.out = 15) + 0.25 * rnorm(15)
# Weights
w <- matrix(ncol = 3, nrow = 15)
w[, 1] <- sin(0.1 * 1:15)
w[, 2] <- cos(0.1 * 1:15)
w[, 3] <- seq(from = -2, 0.25, length.out = 15)^2
w <- (w / rowSums(w))
# Vis
plot(X[, 1],
lwd = 4,
type = "l",
ylim = c(8, 12),
xlab = "",
ylab = "",
xaxt = "n",
yaxt = "n",
bty = "n",
col = "#2050f0"
)
plot(w[, 1],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
)
text(6, 0.5, TeX("$w_1(t)$"), cex = 2, col = "#2050f0")
arrows(13, 0.25, 15, 0.0, , lwd = 4, bty = "n")
plot.new()
plot(X[, 2],
lwd = 4,
type = "l", ylim = c(8, 12),
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
plot(w[, 2],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
text(6, 0.6, TeX("$w_2(t)$"), cex = 2, col = "purple")
arrows(13, 0.5, 15, 0.5, , lwd = 4, bty = "n")
plot(rowSums(X * w), lwd = 4, type = "l", xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#298829")
plot(X[, 3],
lwd = 4,
type = "l", ylim = c(8, 12),
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
plot(w[, 3],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
text(6, 0.25, TeX("$w_3(t)$"), cex = 2, col = "#e423b4")
arrows(13, 0.75, 15, 1, , lwd = 4, bty = "n")
```
## Distribution
```{r, echo = FALSE, fig.height=6, cache = TRUE}
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
set.seed(1)
# Data
X <- matrix(ncol = 3, nrow = 31)
X[, 1] <- dchisq(0:30, df = 10)
X[, 2] <- dnorm(0:30, mean = 15, sd = 5)
X[, 3] <- dexp(0:30, 0.2)
# Weights
w <- matrix(ncol = 3, nrow = 31)
w[, 1] <- sin(0.05 * 0:30)
w[, 2] <- cos(0.05 * 0:30)
w[, 3] <- seq(from = -2, 0.25, length.out = 31)^2
w <- (w / rowSums(w))
# Vis
plot(X[, 1],
lwd = 4,
type = "l",
xlab = "",
ylab = "",
xaxt = "n",
yaxt = "n",
bty = "n",
col = "#2050f0"
)
plot(X[, 2],
lwd = 4,
type = "l",
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
plot(X[, 3],
lwd = 4,
type = "l",
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
plot(w[, 1],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
)
text(12, 0.5, TeX("$w_1(x)$"), cex = 2, col = "#2050f0")
arrows(26, 0.25, 31, 0.0, , lwd = 4, bty = "n")
plot(w[, 2],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
text(15, 0.5, TeX("$w_2(x)$"), cex = 2, col = "purple")
arrows(15, 0.25, 15, 0, , lwd = 4, bty = "n")
plot(w[, 3],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
text(20, 0.5, TeX("$w_3(x)$"), cex = 2, col = "#e423b4")
arrows(5, 0.25, 0, 0, , lwd = 4, bty = "n")
plot.new()
plot(rowSums(X * w),
lwd = 4, type = "l", xlab = "", ylab = "", xaxt = "n",
yaxt = "n", bty = "n", col = "#298829"
)
```
:::
:::
::::
## The Framework of Prediction under Expert Advice
### The sequential framework
:::: {.columns}
::: {.column width="48%"}
Each day, $t = 1, 2, \ldots, T$:
- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
- The **forecaster** calculates her prediction:
\begin{equation}
\widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}.
\label{eq_forecast_def}
\end{equation}
- The realization $Y_t$ is observed
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
- The experts can be institutions, persons, or models
- The forecasts can be point forecasts (e.g., mean or median) or full predictive distributions
- We do not need any assumptions concerning the underlying data
- @cesa2006prediction
:::
::::
---
## The Regret
Weights are updated sequentially according to the past performance of the $K$ experts.
To that end, a loss function $\ell$ is needed; it is used to compute the **cumulative regret** $R_{t,k}$:
\begin{equation}
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)\label{eq:regret}
\end{equation}
The cumulative regret:
- Indicates the predictive accuracy of expert $k$ up to time $t$
- Measures how much the forecaster *regrets* not having followed the expert's advice
Popular loss functions for point forecasting [@gneiting2011making]:
:::: {.columns}
::: {.column width="48%"}
$\ell_2$ loss:
\begin{equation}
\ell_2(x, y) = | x -y|^2 \label{eq:elltwo}
\end{equation}
Strictly proper for *mean* prediction
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
$\ell_1$ loss:
\begin{equation}
\ell_1(x, y) = | x -y| \label{eq:ellone}
\end{equation}
Strictly proper for *median* predictions
:::
::::
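A minimal R sketch of the regret recursion in \eqref{eq:regret} with toy data (illustrative only; uniform weights and the $\ell_2$ loss from \eqref{eq:elltwo}):
```{r, echo = TRUE, eval = FALSE}
# Toy illustration of the cumulative regret R_{t,k}.
set.seed(1)
TT <- 100; K <- 3
y <- rnorm(TT)                                   # realizations Y_t
X_hat <- sapply(1:K, function(k) y + rnorm(TT))  # expert forecasts (toy)
w <- rep(1 / K, K)                               # fixed uniform weights
ell <- function(x, y) abs(x - y)^2               # l2 loss
R <- matrix(0, TT, K)                            # cumulative regret R_{t,k}
for (t in 1:TT) {
  X_tilde <- sum(w * X_hat[t, ])                 # combined forecast
  r_t <- ell(X_tilde, y[t]) - ell(X_hat[t, ], y[t])
  R[t, ] <- if (t == 1) r_t else R[t - 1, ] + r_t
}
```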
## Popular Algorithms and the Risk
<br/>
:::: {.columns}
::: {.column width="58%"}
### Popular Aggregation Algorithms
<br/>
#### The naive combination
\begin{equation}
w_{t,k}^{\text{Naive}} = \frac{1}{K}\label{eq:naive_combination}
\end{equation}
#### The exponentially weighted average forecaster (EWA)
\begin{equation}
\begin{aligned}
w_{t,k}^{\text{EWA}} & = \frac{e^{\eta R_{t,k}} }{\sum_{k = 1}^K e^{\eta R_{t,k}}}\\
& =
\frac{e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }{\sum_{k = 1}^K e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k}}
\end{aligned}\label{eq:exp_combination}
\end{equation}
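A one-line R sketch of the recursive form of \eqref{eq:exp_combination} (assuming a vector `loss_t` of the experts' current losses and a fixed learning rate `eta`):
```{r, echo = TRUE, eval = FALSE}
# Recursive EWA update: w holds the weights from t-1.
ewa_update <- function(w, loss_t, eta) {
  v <- exp(-eta * loss_t) * w  # multiplicative update
  v / sum(v)                   # normalize so the weights sum to one
}
```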
:::
::: {.column width="2%"}
:::
::: {.column width="38%"}
### Optimality
In stochastic settings, the cumulative risk should be analyzed `r Citet(my_bib, "wintenberger2017optimal")`:
\begin{align}
&\underbrace{\widetilde{\mathcal{R}}_t = \sum_{i=1}^t \mathbb{E}[\ell(\widetilde{X}_{i},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Forecaster}} \\
&\underbrace{\widehat{\mathcal{R}}_{t,k} = \sum_{i=1}^t \mathbb{E}[\ell(\widehat{X}_{i,k},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Experts}}
\label{eq_def_cumrisk}
\end{align}
:::
::::
## Optimal Convergence
<br/>
:::: {.columns}
::: {.column width="48%"}
### The selection problem
\begin{equation}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \stackrel{t\to \infty}{\rightarrow} a \quad \text{with} \quad a \leq 0.
\label{eq_opt_select}
\end{equation}
The forecaster is asymptotically not worse than the best expert $\widehat{\mathcal{R}}_{t,\min}$.
### The convex aggregation problem
\begin{equation}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \stackrel{t\to \infty}{\rightarrow} b \quad \text{with} \quad b \leq 0 .
\label{eq_opt_conv}
\end{equation}
The forecaster is asymptotically not worse than the best convex combination $\widehat{X}_{t,\pi}$ in hindsight (**oracle**).
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
Optimal rates with respect to selection \eqref{eq_opt_select} and convex aggregation \eqref{eq_opt_conv} `r Citet(my_bib, "wintenberger2017optimal")`:
\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) & =
\mathcal{O}\left(\frac{\log(K)}{t}\right)\label{eq_optp_select}
\end{align}
\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) & =
\mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right)
\label{eq_optp_conv}
\end{align}
Algorithms can satisfy both \eqref{eq_optp_select} and \eqref{eq_optp_conv} depending on:
- The loss function
- Regularity conditions on $Y_t$ and $\widehat{X}_{t,k}$
- The weighting scheme
:::
::::
##
:::: {.columns}
::: {.column width="48%"}
### Optimal Convergence
<br/>
EWA satisfies optimal selection convergence \eqref{eq_optp_select} in a deterministic setting if:
- Loss $\ell$ is exp-concave
- Learning-rate $\eta$ is chosen correctly
Those results can be extended to stochastic i.i.d. settings @kakade2008generalization, @gaillard2014second.
Optimal convex aggregation convergence \eqref{eq_optp_conv} can be satisfied by applying the gradient trick:
\begin{align}
\ell^{\nabla}(x,y) = \ell'(\widetilde{X},y) x
\end{align}
$\ell'$ is the subgradient of $\ell$ evaluated at the forecast combination $\widetilde{X}$.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
### Probabilistic Setting
<br/>
**An appropriate choice:**
\begin{equation*}
\text{CRPS}(F, y) = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx \label{eq:crps}
\end{equation*}
It is strictly proper [@gneiting2007strictly].
Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance varies in parts of the distribution?
`r fontawesome::fa("lightbulb", fill = col_yellow)` Utilize this relation:
\begin{equation*}
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) dp.\label{eq_crps_qs}
\end{equation*}
... to combine quantiles of the probabilistic forecasts individually using the quantile-loss QL.
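A minimal R sketch of this identity, approximating the integral on an equidistant quantile grid for a standard normal forecast (values are illustrative):
```{r, echo = TRUE, eval = FALSE}
# CRPS(F, y) = 2 * int_0^1 QL_p(F^{-1}(p), y) dp, approximated on a grid.
p_grid <- 1:99 / 100
ql <- function(q, y, p) ifelse(y >= q, p * (y - q), (1 - p) * (q - y))
y <- 0.5
q_hat <- qnorm(p_grid)          # F^{-1}(p) for F = N(0, 1)
2 * mean(ql(q_hat, y, p_grid))  # approximate CRPS
# scoringRules::crps_norm(y, 0, 1) gives the exact value for comparison
```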
:::
::::
## CRPS Learning Optimality
::: {.panel-tabset}
## Almost Optimal Convergence
`r fontawesome::fa("exclamation", fill = col_orange)` QL is convex, but not exp-concave `r fontawesome::fa("arrow-right", fill ="#000000")` Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exist a $C>0$ such that for $x>0$ it holds that
\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq
1-e^{-x}
\label{eq_boa_opt_conv}
\end{equation}
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *convex aggregation* \eqref{eq_optp_conv} @wintenberger2017optimal.
The same algorithm satisfies that there exists a $C>0$ such that for $x>0$ it holds that
\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq
C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq
1-2e^{-x}
\label{eq_boa_opt_select}
\end{equation}
if $Y_t$ is bounded and the considered loss $\ell$ is convex, $G$-Lipschitz, and weakly exp-concave in its first coordinate.
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *selection* \eqref{eq_optp_select} @gaillard2018efficient.
`r fontawesome::fa("arrow-right", fill ="#000000")` We show that this holds for QL under feasible conditions.
## Conditions + Lemma
:::: {.columns}
::: {.column width="48%"}
**A1**
For some $G>0$ it holds
for all $x_1,x_2\in \mathbb{R}$ and $t>0$ that
$$ | \ell(x_1, Y_t)-\ell(x_2, Y_t) | \leq G |x_1-x_2|$$
**A2** For some $\alpha>0$, $\beta\in[0,1]$ it holds
for all $x_1,x_2 \in \mathbb{R}$ and $t>0$ that
\begin{align*}
\mathbb{E}[
& \ell(x_1, Y_t)-\ell(x_2, Y_t) | \mathcal{F}_{t-1}] \leq \\
& \mathbb{E}[ \ell'(x_1, Y_t)(x_1 - x_2) |\mathcal{F}_{t-1}] \\
& +
\mathbb{E}\left[ \left. \left( \alpha(\ell'(x_1, Y_t)(x_1 - x_2))^{2}\right)^{1/\beta} \right|\mathcal{F}_{t-1}\right]
\end{align*}
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *selection* \eqref{eq_optp_select} @gaillard2018efficient.
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
**Lemma 1**
\begin{align}
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\min}
\label{eq_risk_ql_crps_expert} \\
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\pi} .
\label{eq_risk_ql_crps_convex}
\end{align}
`r fontawesome::fa("arrow-right")` Pointwise procedures can outperform constant ones
QL is convex but not exp-concave:
`r fontawesome::fa("arrow-right")` Almost optimal convergence w.r.t. *convex aggregation* \eqref{eq_boa_opt_conv} `r fontawesome::fa("check", fill ="#00b02f")` </br>
For almost optimal convergence w.r.t. *selection* \eqref{eq_boa_opt_select} we need to check **A1** and **A2**:
QL is Lipschitz continuous:
`r fontawesome::fa("arrow-right")` **A1** holds `r fontawesome::fa("check", fill ="#ffa600")` </br>
:::
::::
## Proposition + Theorem
:::: {.columns}
::: {.column width="48%"}
Conditional quantile risk: $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y_t) | \mathcal{F}_{t-1}]$.
`r fontawesome::fa("arrow-right")` convexity properties of $\mathcal{Q}_p$ depend on the
conditional distribution $Y_t|\mathcal{F}_{t-1}$.
**Proposition 1**
Let $Y$ be a univariate random variable with (Radon-Nikodym) $\nu$-density $f$, then for the second subderivative of the quantile risk
$\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y) ]$
of $Y$ it holds for all $p\in(0,1)$ that
$\mathcal{Q}_p'' = f.$
Additionally, if $f$ is a continuous Lebesgue density with $f\geq\gamma>0$ for some constant $\gamma>0$ on its support $\text{spt}(f)$, then
$\mathcal{Q}_p$ is $\gamma$-strongly convex.
Strong convexity with $\beta=1$ implies **A2** `r fontawesome::fa("check", fill ="#ffa600")` @gaillard2018efficient
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
`r fontawesome::fa("arrow-right")` **A1** and **A2** give us almost optimal convergence w.r.t. selection \eqref{eq_boa_opt_select} `r fontawesome::fa("check", fill ="#00b02f")` </br>
**Theorem 1**
The gradient based fully adaptive Bernstein online aggregation (BOAG) applied pointwise for all $p\in(0,1)$ on $\text{QL}$ satisfies
\eqref{eq_boa_opt_conv} with minimal CRPS given by
$$\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}.$$
If $Y_t|\mathcal{F}_{t-1}$ is bounded
and has a pdf $f_t$ satisfying $f_t>\gamma >0$ on its
support $\text{spt}(f_t)$ then \eqref{eq_boa_opt_select} holds with $\beta=1$ and
$$\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}.$$
:::
::::
::::
:::: {.notes}
We apply Bernstein Online Aggregation (BOA). It lets us weaken the exp-concavity condition while almost keeping the optimality properties \eqref{eq_optp_select} and \eqref{eq_optp_conv}.
::::
## A Probabilistic Example {#simple_example}
:::: {.columns}
::: {.column width="48%"}
Simple Example:
\begin{align}
Y_t & \sim \mathcal{N}(0,\,1) \\
\widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4)
\label{eq:dgp_sim1}
\end{align}
- True weights vary over $p$
- Figures show the ECDF and calculated weights using $T=25$ realizations
- Pointwise solution creates rough estimates
- Pointwise is better than constant
- Smooth solution is better than pointwise
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## CDFs
```{r, echo = FALSE, fig.width=7, fig.height=6, fig.align='center', cache = TRUE}
source("assets/01_common.R")
load("assets/crps_learning/01_motivation_01.RData")
ggplot(df, aes(x = x, y = y, xend = xend, yend = yend)) +
stat_function(
fun = pnorm, n = 10000,
args = list(mean = dev[2], sd = experts_sd[2]),
aes(col = "Expert 2"), size = 1.5
) +
stat_function(
fun = pnorm, n = 10000,
args = list(mean = dev[1], sd = experts_sd[1]),
aes(col = "Expert 1"), size = 1.5
) +
stat_function(
fun = pnorm,
n = 10000,
size = 1.5, aes(col = "DGP") # , linetype = "dashed"
) +
geom_point(aes(col = "ECDF"), size = 1.5, show.legend = FALSE) +
geom_segment(aes(col = "ECDF")) +
geom_segment(data = tibble(
x_ = -5,
xend_ = min(y),
y_ = 0,
yend_ = 0
), aes(x = x_, xend = xend_, y = y_, yend = yend_)) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom",
legend.key.width = unit(1.5, "cm")
) +
ylab("Probability p") +
xlab("Value") +
scale_colour_manual(NULL, values = c("#969696", "#252525", col_auto, col_blue)) +
guides(color = guide_legend(
nrow = 2,
byrow = FALSE # ,
# override.aes = list(
# size = c(1.5, 1.5, 1.5, 1.5)
# )
)) +
scale_x_continuous(limits = c(-5, 7.5))
```
## Weights
```{r, echo = FALSE, fig.width=7, fig.height=6, fig.align='center', cache = TRUE}
source("assets/01_common.R")
load("assets/crps_learning/01_motivation_02.RData")
ggplot() +
geom_line(data = weights[weights$var != "1Optimum", ], size = 1.5, aes(x = prob, y = val, col = var)) +
geom_line(
data = weights[weights$var == "1Optimum", ], size = 1.5, aes(x = prob, y = val, col = var) # , linetype = "dashed"
) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom",
legend.key.width = unit(1.5, "cm")
) +
xlab("Probability p") +
ylab("Weight w") +
scale_colour_manual(
NULL,
values = c("#969696", col_pointwise, col_p_constant, col_p_smooth),
labels = modnames[-c(3, 5)]
) +
guides(color = guide_legend(
ncol = 3,
byrow = FALSE,
title.hjust = 5,
# override.aes = list(
# linetype = c(rep("solid", 5), "dashed")
# )
)) +
ylim(c(0, 1))
```
::::
:::
:::
## The Smoothing Procedures
::: {.panel-tabset}
## Penalized Smoothing
:::: {.columns}
::: {.column width="48%"}
Penalized cubic B-Splines for smoothing weights:
Let $\varphi=(\varphi_1,\ldots, \varphi_L)$ be bounded basis functions on $(0,1)$. Then we approximate $w_{t,k}$ by
\begin{align}
w_{t,k}^{\text{smooth}} = \sum_{l=1}^L \beta_l \varphi_l = \beta'\varphi
\end{align}
with parameter vector $\beta$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
\begin{equation}
\| w_{t,k} - \beta' \varphi \|^2_2 + \lambda \| \mathcal{D}^{d} (\beta' \varphi) \|^2_2
\label{eq_function_smooth}
\end{equation}
with differential operator $\mathcal{D}$ applied $d$ times.
Computation is easy since we have an analytical solution (see the sketch below).
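A minimal sketch of that solution in R, using cubic B-splines from the `splines` package and a difference penalty on the coefficients as a discrete stand-in for the derivative penalty in \eqref{eq_function_smooth}:
```{r, echo = TRUE, eval = FALSE}
library(splines)
p_grid <- 1:99 / 100
B <- bs(p_grid, df = 12, degree = 3, intercept = TRUE)  # cubic B-spline basis
D <- diff(diag(ncol(B)), differences = 1)               # first-order differences (d = 1)
lambda <- 10
H <- B %*% solve(crossprod(B) + lambda * crossprod(D)) %*% t(B)  # hat matrix
# w_smooth <- H %*% w_pointwise maps pointwise weights to smoothed ones
```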
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
We receive the constant solution for high values of $\lambda$ when setting $d=1$
<center>
<img src="assets/crps_learning/weights_lambda.gif">
</center>
:::
::::
## Basis Smoothing
:::: {.columns}
::: {.column width="48%"}
Represent weights as linear combinations of bounded basis functions:
\begin{equation}
w_{t,k} = \sum_{l=1}^L \beta_{t,k,l} \varphi_l = \boldsymbol \beta_{t,k}' \boldsymbol \varphi
\end{equation}
A popular choice is B-splines as local basis functions
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:
\begin{equation}
\underbrace{\boldsymbol r_{t}}_{L \times K} = \frac{L}{P} \underbrace{\boldsymbol B'}_{L \times P} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{P \times K}
\end{equation}
`r fontawesome::fa("arrow-right", fill ="#000000")` $\boldsymbol r_{t}$ is reduced from $P \times K$ to $L \times K$
If $L = P$ it holds that $\boldsymbol \varphi = \boldsymbol{I}$ (pointwise)
For $L = 1$ we receive constant weights
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
Weights converge to the constant solution if $L\rightarrow 1$
<center>
<img src="/assets/crps_learning/weights_kstep.gif">
</center>
:::
::::
::::
---
## The Proposed CRPS-Learning Algorithm
<br/>
::: {style="font-size: 85%;"}
:::: {.columns}
::: {.column width="43%"}
### Initialization:
Array of expert predictions: $\widehat{X}_{t,p,k}$
Vector of Prediction targets: $Y_t$
Starting Weights: $\boldsymbol w_0=(w_{0,1},\ldots, w_{0,K})$
Penalization parameter: $\lambda\geq 0$
B-spline and penalty matrices $\boldsymbol B$ and $\boldsymbol D$ on $\mathcal{P}= (p_1,\ldots,p_M)$
Hat matrix: $$\boldsymbol{\mathcal{H}} = \boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'$$
Cumulative Regret: $R_{0,k} = 0$
Range parameter: $E_{0,k}=0$
Starting pseudo-weights: $\boldsymbol \beta_0 = \boldsymbol B^{\text{pinv}}\boldsymbol w_0(\boldsymbol{\mathcal{P}})$
:::
::: {.column width="2%"}
:::
::: {.column width="55%"}
### Core:
for( t in 1:T ) {
&nbsp;&nbsp;&nbsp;&nbsp;$\widetilde{\boldsymbol X}_{t} = \text{Sort}\left( \boldsymbol w_{t-1}'(\boldsymbol P) \widehat{\boldsymbol X}_{t} \right)$ <b style="color: var(--col_grey_7);"># Prediction</b>
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol r_{t} = \frac{L}{M} \boldsymbol B' \left({\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)$
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol E_{t} = \max(\boldsymbol E_{t-1}, \boldsymbol r_{t}^+ + \boldsymbol r_{t}^-)$
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol V_{t} = \boldsymbol V_{t-1} + \boldsymbol r_{t}^{ \odot 2}$
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol \eta_{t} =\min\left( \left(-\log(\boldsymbol \beta_{0}) \odot \boldsymbol V_{t}^{\odot -1} \right)^{\odot\frac{1}{2}} , \frac{1}{2}\boldsymbol E_{t}^{\odot-1}\right)$
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol R_{t} = \boldsymbol R_{t-1}+ \boldsymbol r_{t} \odot \left( \boldsymbol 1 - \boldsymbol \eta_{t} \odot \boldsymbol r_{t} \right)/2 + \boldsymbol E_{t} \odot \mathbb{1}\{-2\boldsymbol \eta_{t}\odot \boldsymbol r_{t} > 1\}$
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol \beta_{t} = K \boldsymbol \beta_{0} \odot \boldsymbol {SoftMax}\left( - \boldsymbol \eta_{t} \odot \boldsymbol R_{t} + \log( \boldsymbol \eta_{t}) \right)$
&nbsp;&nbsp;&nbsp;&nbsp;$\boldsymbol w_{t}(\boldsymbol P) = \underbrace{\boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'}_{\boldsymbol{\mathcal{H}}} \boldsymbol B \boldsymbol \beta_{t}$
}
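A compact R sketch of one step of this recursion; the $\odot$ operations become plain vectorized arithmetic, and the smoothing and B-spline transforms are omitted (guard the divisions against zeros in practice):
```{r, echo = TRUE, eval = FALSE}
# One BOA step (sketch): r_t is the current regret vector over experts.
boa_step <- function(r_t, R, V, E, beta0, K) {
  E <- pmax(E, abs(r_t))                           # range parameter E_t
  V <- V + r_t^2                                   # squared-regret sum V_t
  eta <- pmin(sqrt(-log(beta0) / V), 1 / (2 * E))  # adaptive learning rate
  R <- R + r_t * (1 - eta * r_t) / 2 + E * (-2 * eta * r_t > 1)
  s <- exp(-eta * R + log(eta))
  beta <- K * beta0 * s / sum(s)                   # K * beta0 * SoftMax(.)
  list(R = R, V = V, E = E, beta = beta)
}
```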
:::
::::
:::
## Simulation Study
::: {.panel-tabset}
## BOA
:::: {.columns}
::: {.column width="48%"}
Data Generating Process of the [simple probabilistic example](#simple_example):
\begin{align*}
Y_t &\sim \mathcal{N}(0,\,1)\\
\widehat{X}_{t,1} &\sim \widehat{F}_{1}=\mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} &\sim \widehat{F}_{2}=\mathcal{N}(3,\,4)
\end{align*}
- Constant solution $\lambda \rightarrow \infty$
- Pointwise Solution of the proposed BOAG
- Smoothed Solution of the proposed BOAG
- Weights are smoothed during learning
- Smooth weights are used to calculate Regret, adjust weights, etc.
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## QL Deviation
Deviation from best attainable QL (1000 runs).
![](assets/crps_learning/pre_vs_post.gif)
## CRPS vs. Lambda
CRPS Values for different $\lambda$ (1000 runs)
![](assets/crps_learning/pre_vs_post_lambda.gif)
## Knots
CRPS for different number of knots (1000 runs)
![](assets/crps_learning/pre_vs_post_kstep.gif)
::::
:::
::::
## Comparison to EWA and ML-Poly
The same simulation carried out for different algorithms (1000 runs):
<center>
<img src="assets/crps_learning/algos_constant.gif">
</center>
## Study Forget
:::: {.columns}
::: {.column width="38%"}
#### New DGP:
\begin{align*}
Y_t &\sim \mathcal{N}\left(\frac{\sin(0.005 \pi t )}{2},\,1\right) \\
\widehat{X}_{t,1} &\sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} &\sim \widehat{F}_{2} = \mathcal{N}(3,\,4)
\end{align*}
`r fontawesome::fa("arrow-right", fill ="#000000")` Changing optimal weights
`r fontawesome::fa("arrow-right", fill ="#000000")` Single run example depicted aside
`r fontawesome::fa("arrow-right", fill ="#000000")` No forgetting leads to long-term constant weights
<center>
<img src="assets/crps_learning/forget.png">
</center>
:::
::: {.column width="4%"}
:::
::: {.column width="58%"}
### &nbsp;
```{r, echo = FALSE, fig.width=7, fig.height=5, fig.align='center', cache = TRUE}
load("assets/crps_learning/weights_preprocessed.rda")
mod_labs <- c("Optimum", "No Forget\nPointwise", "No Forget\nP-Smooth", "Forget\nPointwise", "Forget\nP-Smooth")
names(mod_labs) <- c("Optimum", "nf_ptw", "nf_psmth", "f_ptw", "f_psmth")
colseq <- c(grey(.99), "orange", "red", "purple", "blue", "darkblue", "black")
weights_preprocessed %>%
mutate(w = 1 - w) %>%
ggplot(aes(t, p, fill = w)) +
geom_raster(interpolate = TRUE) +
facet_grid(Mod ~ ., labeller = labeller(Mod = mod_labs)) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
text = element_text(size = 15),
legend.key.height = unit(1, "inch")
) +
scale_x_continuous(expand = c(0, 0)) +
xlab("Time t") +
scale_fill_gradientn(
limits = c(0, 1),
colours = colseq,
breaks = seq(0, 1, 0.2)
) +
ylab("Weight w")
```
:::
::::
::::
## Possible Extensions
**Forgetting**
- Only taking part of the old cumulative regret into account
- Exponential forgetting of past regret
\begin{align*}
R_{t,k} & = R_{t-1,k}(1-\xi) + \ell(\widetilde{F}_{t},Y_t) - \ell(\widehat{F}_{t,k},Y_t) \label{eq_regret_forget}
\end{align*}
**Fixed Shares** @herbster1998tracking
- Adding fixed shares to the weights
- Shrinkage towards a constant solution
\begin{align*}
\widetilde{w}_{t,k} = \rho \frac{1}{K} + (1-\rho) w_{t,k}.
\label{fixed_share_simple}
\end{align*}
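Both extensions are one-line modifications of the updates (sketch; `xi` and `rho` in $[0,1]$):
```{r, echo = TRUE, eval = FALSE}
R <- R * (1 - xi) + r_t       # exponential forgetting of past regret
w <- rho / K + (1 - rho) * w  # fixed share: shrink towards uniform weights
```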
## Application Study
::: {.panel-tabset}
## Overview
:::: {.columns}
::: {.column width="29%"}
::: {style="font-size: 85%;"}
Data:
- Forecasting European emission allowances (EUA)
- Daily month-ahead prices
- Jan 2013 to Dec 2020 (Phase III, 2092 observations)
Combination methods:
- Naive, BOAG, EWAG, ML-PolyG, BMA
Tuning parameter grids (as constructed below):
- Smoothing Penalty: $\Lambda= \{0\}\cup \{2^x|x\in \{-4,-3.5,\ldots,12\}\}$
- Learning Rates: $\mathcal{E}= \{2^x|x\in \{-1,-0.5,\ldots,9\}\}$
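In R, these grids read:
```{r, echo = TRUE, eval = FALSE}
lambda_grid <- c(0, 2^seq(-4, 12, by = 0.5))  # smoothing penalties Lambda
eta_grid <- 2^seq(-1, 9, by = 0.5)            # learning rates
```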
::::
:::
::: {.column width="2%"}
:::
::: {.column width="69%"}
```{r, echo = FALSE, fig.width=7, fig.height=5, fig.align='center', cache = TRUE}
load("assets/crps_learning/overview_data.rds")
data %>%
ggplot(aes(x = Date, y = value)) +
geom_line(size = 1, col = col_blue) +
theme_minimal() +
ylab("Value") +
facet_wrap(. ~ name, scales = "free", ncol = 1) +
theme(
text = element_text(size = 15),
strip.background = element_blank(),
strip.text.x = element_blank()
) -> p1
data %>%
ggplot(aes(x = value)) +
geom_histogram(aes(y = ..density..), size = 1, fill = col_blue, bins = 50) +
ylab("Density") +
xlab("Value") +
theme_minimal() +
theme(
strip.background = element_rect(fill = col_lightgray, colour = col_lightgray),
text = element_text(size = 15)
) +
facet_wrap(. ~ name, scales = "free", ncol = 1, strip.position = "right") -> p2
overview <- cowplot::plot_grid(plotlist = list(p1, p2), align = "hv", axis = "tblr", rel_widths = c(0.65, 0.35))
overview
```
:::
::::
## Experts
::: {style="font-size: 90%;"}
Simple exponential smoothing with additive errors (**ETS-ANN**):
\begin{align*}
Y_{t} = l_{t-1} + \varepsilon_t \quad \text{with} \quad l_t = l_{t-1} + \alpha \varepsilon_t \quad \text{and} \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2)
\end{align*}
Quantile regression (**QuantReg**): For each $p \in \mathcal{P}$ we assume:
\begin{align*}
F^{-1}_{Y_t}(p) = \beta_{p,0} + \beta_{p,1} Y_{t-1} + \beta_{p,2} |Y_{t-1}-Y_{t-2}|
\end{align*}
ARIMA(1,0,1)-GARCH(1,1) with Gaussian errors (**ARMA-GARCH**):
\begin{align*}
Y_{t} = \mu + \phi(Y_{t-1}-\mu) + \theta \varepsilon_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1)
\end{align*}
ARIMA(0,1,0)-I-EGARCH(1,1) with Gaussian errors (**I-EGARCH**):
\begin{align*}
Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \log(\sigma_t^2) = \omega + \alpha Z_{t-1}+ \gamma (|Z_{t-1}|-\mathbb{E}|Z_{t-1}|) + \beta \log(\sigma_{t-1}^2) \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1)
\end{align*}
ARIMA(0,1,0)-GARCH(1,1) with Student-t errors (**I-GARCHt**):
\begin{align*}
Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim t(0,1, \nu)
\end{align*}
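A hedged sketch of how two of these experts could be fitted in R with the `forecast` and `rugarch` packages, assuming `y` holds the price series (the paper's exact specifications may differ):
```{r, echo = TRUE, eval = FALSE}
library(forecast)
library(rugarch)
# ETS-ANN: simple exponential smoothing with additive errors
fit_ets <- ets(y, model = "ANN")
# ARMA(1,1)-GARCH(1,1) with Gaussian errors
spec <- ugarchspec(
  mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  distribution.model = "norm"
)
fit_garch <- ugarchfit(spec, data = y)
```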
::::
## Results
::: {.panel-tabset}
## Significance
<br/>
```{r, echo = FALSE, fig.width=7, fig.height=5.5, fig.align='center', cache = TRUE, results='asis'}
load("assets/crps_learning/bernstein_application_study_estimations+learnings_rev1.RData")
quantile_loss <- function(X, y, tau) {
t(t(y - X) * tau) * (y - X > 0) + t(t(X - y) * (1 - tau)) * (y - X < 0)
}
QL <- FCSTN * NA
for (k in 1:dim(QL)[1]) {
QL[k, , ] <- quantile_loss(FCSTN[k, , ], as.numeric(yoos), Qgrid)
}
## TABLE AREA
KK <- length(mnames)
TTinit <- 1 ## without first, as all comb. are uniform
RQL <- apply(QL[1:KK, -c(1:TTinit), ], c(1, 3), mean)
dimnames(RQL) <- list(mnames, Qgrid)
RQLm <- apply(RQL, c(1), mean, na.rm = TRUE)
# sort(RQLm - RQLm[K + 1])
##
qq <- apply(QL[1:KK, -c(1:TTinit), ], c(1, 2), mean)
# t.test(qq[K + 1, ] - qq[K + 3, ])
# t.test(qq[K + 1, ] - qq[K + 4, ])
library(xtable)
Pall <- numeric(KK)
for (i in 1:KK) Pall[i] <- t.test(qq[K + 1, ] - qq[i, ], alternative = "greater")$p.val
Mall <- (RQLm - RQLm[K + 1]) * 10000
Mout <- matrix(Mall[-c(1:(K + 3))], 5, 6)
dimnames(Mout) <- list(moname, mtname)
Pallout <- format(round(Pall, 3), nsmall = 3)
Pallout[Pallout == "0.000"] <- "<.001"
Pallout[Pallout == "1.000"] <- ">.999"
MO <- K
IDX <- c(1:K)
OUT <- t(Mall[IDX])
OUT.num <- OUT
class(OUT.num) <- "numeric"
xxx <- OUT.num
xxxx <- OUT
table <- OUT
table_col <- OUT
i.p <- 1
for (i.p in 1:MO) {
xmax <- -min(Mall) * 5 # max(Mall)
xmin <- min(Mall)
cred <- rev(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .8, .5)) # , .5,0,0,0,1,1,1) ## red
cgreen <- rev(c(.5, .5, .55, .6, .65, .7, .75, .8, .85, .9, .95, 1, 1, .9)) # , .5,0,1,1,1,0,0) ## green
cblue <- rev(c(.55, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5)) # , .5,1,1,0,0,0,1) ## blue
crange <- c(xmin, xmax) ## range
## colors in plot:
fred <- round(approxfun(seq(crange[1], crange[2], length = length(cred)), cred)(pmin(xxx[, i.p], xmax)), 3)
fgreen <- round(approxfun(seq(crange[1], crange[2], length = length(cgreen)), cgreen)(pmin(xxx[, i.p], xmax)), 3)
fblue <- round(approxfun(seq(crange[1], crange[2], length = length(cblue)), cblue)(pmin(xxx[, i.p], xmax)), 3)
tmp <- format(round(xxx[, i.p], 3), nsmall = 3)
xxxx[, i.p] <- paste("\\cellcolor[rgb]{", fred, ",", fgreen, ",", fblue, "}", tmp, " {\\footnotesize (", Pallout[IDX[i.p]], ")}", sep = "")
table[, i.p] <- paste0(tmp, " (", Pallout[i.p], ")")
table_col[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1)
} # i.p
table_out <- kbl(table, align = rep("c", ncol(table)), bootstrap_options = c("condensed")) %>%
kable_paper(full_width = TRUE)
for (j in 1:ncol(table)) {
table_out <- table_out %>%
column_spec(j, background = table_col[, j])
}
table_out
```
```{r, echo = FALSE, fig.width=7, fig.height=5.5, fig.align='center', cache = TRUE, results='asis'}
MO <- 6
OUT <- Mout
OUT.num <- OUT
class(OUT.num) <- "numeric"
xxx <- OUT.num
xxxx <- OUT
i.p <- 1
table2 <- OUT
table_col2 <- OUT
for (i.p in 1:MO) {
xmax <- -min(Mall) * 5 # max(Mall)
xmin <- min(Mall)
cred <- rev(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .8, .5)) # , .5,0,0,0,1,1,1) ## red
cgreen <- rev(c(.5, .5, .55, .6, .65, .7, .75, .8, .85, .9, .95, 1, 1, .9)) # , .5,0,1,1,1,0,0) ## green
cblue <- rev(c(.55, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5)) # , .5,1,1,0,0,0,1) ## blue
crange <- c(xmin, xmax) ## range
## colors in plot:
fred <- round(approxfun(seq(crange[1], crange[2], length = length(cred)), cred)(pmin(xxx[, i.p], xmax)), 3)
fgreen <- round(approxfun(seq(crange[1], crange[2], length = length(cgreen)), cgreen)(pmin(xxx[, i.p], xmax)), 3)
fblue <- round(approxfun(seq(crange[1], crange[2], length = length(cblue)), cblue)(pmin(xxx[, i.p], xmax)), 3)
tmp <- format(round(xxx[, i.p], 3), nsmall = 3)
xxxx[, i.p] <- paste("\\cellcolor[rgb]{", fred, ",", fgreen, ",", fblue, "}", tmp, " {\\footnotesize (", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")}", sep = "")
table2[, i.p] <- paste0(tmp, " (", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")")
table_col2[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1)
} # i.p
table_out2 <- kableExtra::kbl(table2, align = rep("c", ncol(table2)), bootstrap_options = c("condensed")) %>%
kable_paper(full_width = TRUE)
for (j in 1:ncol(table2)) {
table_out2 <- table_out2 %>%
column_spec(1 + j,
background = table_col2[, j]
)
}
table_out2 %>%
column_spec(1, bold = T)
```
## QL
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
##### Performance across probabilities
M <- length(mnames)
Msel <- c(1:K, K + 1, K + 1 + 2 + 1:4 * 5 - 2) ## experts + naive + smooth
modnames <- mnames[Msel]
tCOL <- c(
"#E6CC00", "#CC6600", "#E61A1A", "#99004D", "#F233BF",
"#666666", "#0000CC", "#1A80E6", "#1AE680", "#00CC00"
)
t(RQL) %>%
as_tibble() %>%
select(Naive) %>%
mutate(Naive = 0) %>%
mutate(p = 1:99 / 100) %>%
pivot_longer(-p, values_to = "Loss differences") -> dummy
t(RQL) %>%
as_tibble() %>%
select(mnames[Msel]) %>%
mutate(p = 1:99 / 100) %>%
pivot_longer(!p & !Naive) %>%
mutate(`Loss differences` = value - Naive) %>%
select(-value, -Naive) %>%
rbind(dummy) %>%
mutate(
p = as.numeric(p),
name = stringr::str_replace(name, "-P-smooth", ""),
name = factor(name, levels = stringr::str_replace(mnames[Msel], "-P-smooth", ""), ordered = T),
`Loss differences` = `Loss differences` * 1000
) %>%
ggplot(aes(x = p, y = `Loss differences`, colour = name)) +
geom_line(linewidth = 1) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom"
) +
xlab("Probability p") +
scale_color_manual(NULL, values = tCOL) +
guides(colour = guide_legend(nrow = 2, byrow = TRUE))
```
## Cumulative Loss Difference
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
DQL <- t(apply(apply(QL[1:KK, -c(1:TTinit), ], c(1, 2), mean), 1, cumsum))
rownames(DQL) <- mnames
t(DQL) %>%
as_tibble() %>%
select(Naive) %>%
mutate(
`Difference of cumulative loss` = 0,
Date = ytime[-c(1:(TT + TTinit + 1))],
name = "Naive"
) %>%
select(-Naive) -> dummy
data <- t(DQL) %>%
as_tibble() %>%
select(mnames[Msel]) %>%
mutate(Date = ytime[-c(1:(TT + TTinit + 1))]) %>%
pivot_longer(!Date & !Naive) %>%
mutate(`Difference of cumulative loss` = value - Naive) %>%
select(-value, -Naive) %>%
rbind(dummy) %>%
mutate(
name = stringr::str_replace(name, "-P-smooth", ""),
name = factor(name, levels = stringr::str_replace(mnames[Msel], "-P-smooth", ""))
)
data %>%
ggplot(aes(x = Date, y = `Difference of cumulative loss`, colour = name)) +
geom_line(size = 1) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom"
) +
scale_color_manual(NULL, values = tCOL) +
guides(colour = guide_legend(nrow = 2, byrow = TRUE))
```
## Weights (BOAG P-Smooth)
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
load("assets/crps_learning/weights_data.RData")
weights_data %>%
ggplot(aes(Date, p, fill = w)) +
geom_raster(interpolate = TRUE) +
facet_grid(Mod ~ .) +
theme_minimal() +
theme(
plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
ylab("p") +
scale_fill_gradientn(
limits = c(0, 1),
colours = colseq,
breaks = seq(0, 1, 0.2)
) +
scale_x_date(expand = c(0, 0))
```
## Weights (Last)
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
load("assets/crps_learning/weights_example.RData")
weights %>%
ggplot(aes(x = p, y = weights, col = Model)) +
geom_line(size = 1.5) +
theme_minimal() +
theme(
plot.margin = unit(c(0.2, 0.3, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.position = "bottom",
legend.title = element_blank(),
panel.spacing = unit(1.5, "lines")
) +
scale_color_manual(NULL, values = tCOL[1:K]) +
facet_grid(. ~ K)
```
::::
::::
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices
Berrisch, J., & Ziel, F. (2024). *International Journal of Forecasting*, 40(4), 1568-1586.
## Multivariate CRPS Learning
:::: {.columns}
::: {.column width="45%"}
We extend the **B-Smooth** and **P-Smooth** procedures to the multivariate setting:
::: {.panel-tabset}
## Penalized Smoothing
Let $\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})$ and $\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})$ be two sets of bounded basis functions on $(0,1)$:
\begin{equation*}
\boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{\text{pr}}}'
\end{equation*}
with parameter matrix $\boldsymbol b_{t,k}$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
\begin{align}
& \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\
& \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber
\end{align}
with differential operator $\mathcal{D}_q$ of order $q$.
[{{< fa calculator >}}]{style="color:var(--col_green_10);"} We have an analytical solution.
## Basis Smoothing
Linear combinations of bounded basis functions:
\begin{equation}
\underbrace{\boldsymbol w_{t,k}}_{D \text{ x } P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D\text{ x }\widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \text{ x }P} \nonumber
\end{equation}
A popular choice: B-Splines
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:
$\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = {\boldsymbol \varphi^{\text{pr}}}' \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{P \times D}\boldsymbol \varphi^{\text{mv}}$
If $\widetilde P = P$ it holds that $\boldsymbol \varphi^{\text{pr}} = \boldsymbol{I}$ (pointwise)
For $\widetilde P = 1$ we receive constant weights
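A minimal R sketch of this dimension reduction with toy matrices (hypothetical sizes; random entries stand in for the basis and regret matrices):
```{r, echo = TRUE, eval = FALSE}
# Toy dimensions: P = 99 quantiles, D = 24 hours, reduced to 9 x 5.
P <- 99; D <- 24; P_red <- 9; D_red <- 5
phi_pr <- matrix(runif(P * P_red), P, P_red)  # basis over probabilities
phi_mv <- matrix(runif(D * D_red), D, D_red)  # basis over hours
regret <- matrix(rnorm(P * D), P, D)          # pointwise regret, P x D
r_t <- t(phi_pr) %*% regret %*% phi_mv        # reduced regret, 9 x 5
```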
::::
:::
::: {.column width="2%"}
:::
::: {.column width="53%"}
```{r, fig.align="center", echo=FALSE, out.width = "1000px", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/algorithm.svg")
```
:::
::::
## Application
:::: {.columns}
::: {.column width="48%"}
#### Data
- Day-Ahead electricity price forecasts from @marcjasz2022distributional
- Produced using probabilistic neural networks
- 24-dimensional distributional forecasts
- Distribution assumptions: JSU and Normal
- 8 experts (4 JSU, 4 Normal)
- 27th Dec. 2018 to 31st Dec. 2020 (736 days)
- We extract 99 quantiles (percentiles)
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
#### Setup
Evaluation: Exclude first 182 observations
Extensions: Penalized smoothing | Forgetting
Tuning strategies:
- Bayesian Fix
  - Sophisticated Bayesian search algorithm
- Online
  - Dynamic, based on past performance
- Bayesian Online
  - First Bayesian Fix, then Online
Computation Time: ~30 Minutes
:::
::::
## Results
:::: {.columns}
::: {.column width="55%"}
```{r, cache = TRUE}
load("assets/mcrps_learning/naive_table_df.rds")
table_naive <- naive_table_df %>%
as_tibble() %>%
head(1) %>%
mutate_all(round, 4) %>%
mutate_all(sprintf, fmt = "%#.3f") %>%
kbl(
bootstrap_options = "condensed",
escape = FALSE,
format = "html",
booktabs = FALSE,
align = c("c", rep("c", ncol(naive_table_df) - 1))
) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:1, color = cols[10, "grey"]) %>%
kable_styling(font_size = 16)
for (i in 1:ncol(naive_table_df)) {
table_naive <- table_naive %>%
column_spec(i,
background = ifelse(
is.na(naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)]),
cols[5, "grey"],
col_scale2(
naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)],
rng_t
)
),
bold = i == which.min(naive_table_df["loss", ])
)
}
table_naive
load("assets/mcrps_learning/performance_data.rds")
i <- 1
j <- 1
for (j in 1:3) {
for (i in seq_len(nrow(performance_loss_tibble))) {
if (loss_and_dm[i, j, "p.val"] < 0.001) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'<span class="sup-zero-width">***</span>'
)
} else if (loss_and_dm[i, j, "p.val"] < 0.01) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'<span class="sup-zero-width">**</span>'
)
} else if (loss_and_dm[i, j, "p.val"] < 0.05) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'<span class="sup-zero-width">*</span>'
)
} else if (loss_and_dm[i, j, "p.val"] < 0.1) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'<span class="zero-width">.</span>'
)
} else {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j]
)
}
}
}
table_performance <- performance_loss_tibble %>%
kbl(
padding=-1L,
col.names = c(
'Description',
'Parameter Tuning',
'BOA',
'ML-Poly',
'EWA'
),
bootstrap_options = "condensed",
# Dont replace any string, dataframe has to be valid latex code ...
escape = FALSE,
format = "html",
align = c("l", "l", rep("c", ncol(performance_loss_tibble)-2))
) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:nrow(performance_loss_tibble), color = cols[10, "grey"])
# %%
for (i in 3:ncol(performance_loss_tibble)) {
bold_cells <- rep(FALSE, times = nrow(performance_loss_tibble))
loss <- loss_and_dm[, i - 2, "loss"]
table_performance <- table_performance %>%
column_spec(i,
background = c(
col_scale2(
loss_and_dm[, i - 2, "stat"],
rng_t
)
),
bold = loss == min(loss),
)
}
table_performance %>%
kable_styling(font_size = 16)
```
```{=html}
<div style="font-size: 0.6em; margin-top: 0.5em;">
<span style="padding: 2px 6px;">Coloring w.r.t. test statistic: </span>
<span style="background-color: #66BA6A; padding: 2px 6px;">&lt;-5</span>
<span style="background-color: #7CC168; padding: 2px 6px;">-4</span>
<span style="background-color: #91C866; padding: 2px 6px;">-3</span>
<span style="background-color: #B0D363; padding: 2px 6px;">-2</span>
<span style="background-color: #D8E05E; padding: 2px 6px;">-1</span>
<span style="background-color: #FFED58; padding: 2px 6px;">0</span>
<span style="background-color: #FFD145; padding: 2px 6px;">1</span>
<span style="background-color: #FFB531; padding: 2px 6px;">2</span>
<span style="background-color: #FC9733; padding: 2px 6px;">3</span>
<span style="background-color: #F67744; padding: 2px 6px;">4</span>
<span style="background-color: #EE5250; padding: 2px 6px;">&gt;5</span>
</div>
<div style="font-size: 0.7em;">
<span style="padding: 2px 6px;">Significance denoted by: </span><span>.</span> p &lt; 0.1; <span>*</span> p &lt; 0.05; <span>**</span> p &lt; 0.01; <span>***</span> p &lt; 0.001;
</div>
```
:::
::: {.column width="45%"}
<br/>
::: {.panel-tabset}
## Constant
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/constant.svg")
```
## Pointwise
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/pointwise.svg")
```
## B Constant PR
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/constant_pr.svg")
```
## B Constant MV
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/constant_mv.svg")
```
## Smooth.Forget
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/smooth_best.svg")
```
::::
:::
::::
## Results
::: {.panel-tabset}
## Chosen Parameters
```{r, warning=FALSE, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE}
load("assets/mcrps_learning/pars_data.rds")
pars_data %>%
ggplot(aes(x = dates, y = value)) +
geom_rect(aes(
ymin = 0,
ymax = value * 1.2,
xmin = dates[1],
xmax = dates[182],
fill = "Burn-In"
)) +
geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) +
scale_colour_manual(
values = as.character(cols[5, c("pink", "amber", "green")])
) +
facet_grid(name ~ .,
scales = "free_y",
# switch = "both"
) +
scale_y_continuous(
trans = "log2",
labels = scaleFUN
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.width = unit(0.9, "inch"),
legend.position = "none"
) +
ylab(NULL) +
xlab("date") +
scale_fill_manual(NULL,
values = as.character(cols[3, "grey"])
)
```
## Weights: Hour 16:00-17:00
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE}
load("assets/mcrps_learning/weights_h.rds")
weights_h %>%
ggplot(aes(date, q, fill = weight)) +
geom_raster(interpolate = TRUE) +
facet_grid(
Expert ~ . # , labeller = labeller(Mod = mod_labs)
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
scale_x_date(expand = c(0, 0)) +
scale_fill_gradientn(
oob = scales::squish,
limits = c(0, 1),
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
colours = c(
cols[8, "red"],
cols[5, "deep-orange"],
cols[5, "amber"],
cols[5, "yellow"],
cols[5, "lime"],
cols[5, "light-green"],
cols[5, "green"],
cols[7, "green"],
cols[9, "green"],
cols[10, "green"]
),
breaks = seq(0, 1, 0.1)
) +
xlab("date") +
ylab("probability") +
scale_y_continuous(breaks = c(0.1, 0.5, 0.9))
```
## Weights: Median
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE}
load("assets/mcrps_learning/weights_q.rds")
weights_q %>%
mutate(hour = as.numeric(hour) - 1) %>%
ggplot(aes(date, hour, fill = weight)) +
geom_raster(interpolate = TRUE) +
facet_grid(
Expert ~ . # , labeller = labeller(Mod = mod_labs)
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
scale_x_date(expand = c(0, 0)) +
scale_fill_gradientn(
oob = scales::squish,
limits = c(0, 1),
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
colours = c(
cols[8, "red"],
cols[5, "deep-orange"],
cols[5, "amber"],
cols[5, "yellow"],
cols[5, "lime"],
cols[5, "light-green"],
cols[5, "green"],
cols[7, "green"],
cols[9, "green"],
cols[10, "green"]
),
breaks = seq(0, 1, 0.1)
) +
xlab("date") +
ylab("hour") +
scale_y_continuous(breaks = c(0, 8, 16, 24))
```
::::
## Non-Equidistant Knots
::: {.panel-tabset}
## Knot Placement Illustration
```{ojs}
d3 = require("d3@7")
```
```{ojs}
bsplineData = FileAttachment("assets/mcrps_learning/basis_functions.csv").csv({ typed: true })
```
```{ojs}
function updateChartInner(g, x, y, linesGroup, color, line, data) {
// Update axes with transitions
x.domain([0, d3.max(data, d => d.x)]);
g.select(".x-axis").transition().duration(1500).call(d3.axisBottom(x).ticks(10));
y.domain([0, d3.max(data, d => d.y)]);
g.select(".y-axis").transition().duration(1500).call(d3.axisLeft(y).ticks(5));
// Group data by basis function
const dataByFunction = Array.from(d3.group(data, d => d.b));
const keyFn = d => d[0];
// Update basis function lines
const u = linesGroup.selectAll("path").data(dataByFunction, keyFn);
u.join(
enter => enter.append("path").attr("fill","none").attr("stroke-width",3)
.attr("stroke", (_, i) => color(i)).attr("d", d => line(d[1].map(pt => ({x: pt.x, y: 0}))))
.style("opacity",0),
update => update,
exit => exit.transition().duration(1000).style("opacity",0).remove()
)
.transition().duration(1000)
.attr("d", d => line(d[1]))
.attr("stroke", (_, i) => color(i))
.style("opacity",1);
}
chart = {
// State variables for selected parameters
let selectedMu = 0.5;
let selectedSig = 1;
let selectedNonc = 0;
let selectedTailw = 1;
const filteredData = () => bsplineData.filter(d =>
Math.abs(selectedMu - d.mu) < 0.001 &&
d.sig === selectedSig &&
d.nonc === selectedNonc &&
d.tailw === selectedTailw
);
const container = d3.create("div")
.style("max-width", "none")
.style("width", "100%");;
const controlsContainer = container.append("div")
.style("display", "flex")
.style("gap", "20px");
// slider controls
const sliders = [
{ label: 'Mu', get: () => selectedMu, set: v => selectedMu = v, min: 0.1, max: 0.9, step: 0.2 },
{ label: 'Sigma', get: () => Math.log2(selectedSig), set: v => selectedSig = 2 ** v, min: -2, max: 2, step: 1 },
{ label: 'Noncentrality', get: () => selectedNonc, set: v => selectedNonc = v, min: -4, max: 4, step: 2 },
{ label: 'Tailweight', get: () => Math.log2(selectedTailw), set: v => selectedTailw = 2 ** v, min: -2, max: 2, step: 1 }
];
// Build slider controls with D3 data join
const sliderCont = controlsContainer.selectAll('div').data(sliders).join('div')
.style('display','flex').style('align-items','center').style('gap','10px')
.style('flex','1').style('min-width','0px');
sliderCont.append('label').text(d => d.label + ':').style('font-size','20px');
sliderCont.append('input')
.attr('type','range').attr('min', d => d.min).attr('max', d => d.max).attr('step', d => d.step)
.property('value', d => d.get())
.on('input', function(event, d) {
const val = +this.value; d.set(val);
d3.select(this.parentNode).select('span').text(d.label.match(/Sigma|Tailweight/) ? 2**val : val);
updateChart(filteredData());
})
.style('width', '100%');
sliderCont.append('span').text(d => (d.label.match(/Sigma|Tailweight/) ? 2 ** d.get() : d.get()))
.style('font-size','20px');
// Add Reset button to clear all sliders to their defaults
controlsContainer.append('button')
.text('Reset')
.style('font-size', '20px')
.style('align-self', 'center')
.style('margin-left', 'auto')
.on('click', () => {
// reset state vars
selectedMu = 0.5;
selectedSig = 1;
selectedNonc = 0;
selectedTailw = 1;
// update input positions
sliderCont.selectAll('input').property('value', d => d.get());
// update displayed labels
sliderCont.selectAll('span')
.text(d => d.label.match(/Sigma|Tailweight/) ? (2**d.get()) : d.get());
// redraw chart
updateChart(filteredData());
});
// Build SVG
const width = 1200;
const height = 450;
const margin = {top: 40, right: 20, bottom: 40, left: 40};
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Set controls container width to match SVG plot width
controlsContainer.style("max-width", "none").style("width", "100%");
// Distribute each control evenly and make sliders full-width
controlsContainer.selectAll("div").style("flex", "1").style("min-width", "0px");
controlsContainer.selectAll("input").style("width", "100%").style("box-sizing", "border-box");
// Create scales
const x = d3.scaleLinear()
.domain([0, 1])
.range([0, innerWidth]);
const y = d3.scaleLinear()
.domain([0, 1])
.range([innerHeight, 0]);
// Create a color scale for the basis functions
const color = d3.scaleOrdinal(d3.schemeCategory10);
// Create SVG
const svg = d3.create("svg")
.attr("width", "100%")
.attr("height", "auto")
.attr("viewBox", [0, 0, width, height])
.attr("preserveAspectRatio", "xMidYMid meet")
.attr("style", "max-width: 100%; height: auto;");
// Create the chart group
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Add axes
const xAxis = g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.attr("class", "x-axis")
.call(d3.axisBottom(x).ticks(10))
.style("font-size", "20px");
const yAxis = g.append("g")
.attr("class", "y-axis")
.call(d3.axisLeft(y).ticks(5))
.style("font-size", "20px");
// Add a horizontal line at y = 0
g.append("line")
.attr("x1", 0)
.attr("x2", innerWidth)
.attr("y1", y(0))
.attr("y2", y(0))
.attr("stroke", "#000")
.attr("stroke-opacity", 0.2);
// Add gridlines
g.append("g")
.attr("class", "grid-lines")
.selectAll("line")
.data(y.ticks(5))
.join("line")
.attr("x1", 0)
.attr("x2", innerWidth)
.attr("y1", d => y(d))
.attr("y2", d => y(d))
.attr("stroke", "#ccc")
.attr("stroke-opacity", 0.5);
// Create a line generator
const line = d3.line()
.x(d => x(d.x))
.y(d => y(d.y))
.curve(d3.curveBasis);
// Group to contain the basis function lines
const linesGroup = g.append("g")
.attr("class", "basis-functions");
// Store the current basis functions for transition
let currentBasisFunctions = new Map();
// Function to update the chart with new data
function updateChart(data) {
updateChartInner(g, x, y, linesGroup, color, line, data);
}
// Store the update function
svg.node().update = updateChart;
// Initial render
updateChart(filteredData());
container.node().appendChild(svg.node());
return container.node();
}
```
## Knot Placement Details
:::: {.columns}
::: {.column width="48%"}
Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:
```{r, echo = TRUE, eval = FALSE, cache = TRUE}
mod <- online(
y = Y,
experts = experts,
tau = 1:99 / 100,
b_smooth_pr = list(
knots = 9,
mu = 0.3, # NEW
sigma = 1,
nonc = 0,
tailweight = 1,
deg = 3
)
)
```
Knots are distributed using the generalized beta distribution (illustrated below).
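An illustrative sketch of the idea (not `profoc`'s internal routine): warping an equidistant grid through a beta quantile function clusters knots where that density is high; the beta shapes used here are a hypothetical parametrization.
```{r, echo = TRUE, eval = FALSE}
# Illustration only: beta-quantile warping of an equidistant knot grid.
knots_beta <- function(n, mu = 0.5, conc = 10) {
  qbeta(seq(0, 1, length.out = n), conc * mu, conc * (1 - mu))
}
round(knots_beta(9, mu = 0.3), 2)  # more knots towards the lower tail
```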
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
:::
::::
::::
## Wrap-Up
:::: {.columns}
::: {.column width="48%"}
[{{< fa triangle-exclamation >}}]{style="color:var(--col_red_9);"} Potential Downsides:
- Pointwise optimization can induce quantile crossing
  - Can be solved by sorting the predictions
[{{< fa magnifying-glass >}}]{style="color:var(--col_orange_9);"} Important:
- The choice of the learning rate is crucial
- The loss function has to meet certain criteria
[{{< fa rocket >}}]{style="color:var(--col_green_9);"} Upsides:
- Pointwise learning outperforms the Naive solution significantly
- Online learning is much faster than batch methods
- Smoothing further improves the predictive performance
- Asymptotically not worse than the best convex combination
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Package:

- Implements all algorithms discussed above
- Is written using RcppArmadillo `r fontawesome::fa("arrow-right", fill ="#000000")` it's fast
- Accepts vectors for most parameters (see the sketch below)
  - The best parameter combination is chosen online
- Implements
  - Forgetting, Fixed Share
  - Different loss functions + gradients
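A hedged usage sketch on simulated data (the array shapes and the `fixed_share` grid argument are assumptions here; consult the package documentation):

```{r, echo = TRUE, eval = FALSE}
library(profoc)
set.seed(1)
y <- matrix(rnorm(100)) # 100 days (assumed shape)
experts <- array(rnorm(100 * 99 * 2),
  dim = c(100, 99, 2) # days x quantiles x experts
)
mod <- online(
  y = y, experts = experts, tau = 1:99 / 100,
  fixed_share = c(0, 0.01, 0.05) # candidate grid, best value chosen online
)
```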
Publications:
[{{< fa newspaper >}}]{style="color:var(--col_grey_7);"} Berrisch, J., & Ziel, F. [-@BERRISCH2023105221]. CRPS learning. *Journal of Econometrics*, 237(2), 105221.
[{{< fa newspaper >}}]{style="color:var(--col_grey_7);"} Berrisch, J., & Ziel, F. [-@BERRISCH20241568]. Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. *International Journal of Forecasting*, 40(4), 1568-1586.
:::
::::
# Modeling Volatility and Dependence of European Carbon and Energy Prices
Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). *Finance Research Letters*, 52, 103503.
## Motivation
:::: {.columns}
::: {.column width="48%"}
- Understanding European Allowances (EUA) dynamics is important for several fields:
  - Portfolio & risk management
  - Sustainability planning
  - Political decisions
  - ...

EUA prices are closely connected to the energy market

How can the dynamics be characterized?
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
</br>

- Several questions arise:
  - Data (pre)processing
  - Modeling approach
  - Evaluation
:::
::::
## Data
- EUA, natural gas, Brent crude oil, coal
- March 15, 2010, until October 14, 2022
- Data was normalized w.r.t. $\text{CO}_2$ emissions
  - Emission-adjusted prices reflect one tonne of $\text{CO}_2$
- We adjusted for inflation using Eurostat's HICP, excluding energy
- Log transformation of the data to stabilize the variance
- ADF test: all series are stationary in first differences (test sketch below)
- Johansen's likelihood ratio trace test suggests two cointegrating relationships (levels)
- Johansen's likelihood ratio trace test suggests no cointegrating relationships (logs)
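A hedged sketch of both tests on simulated data (`tseries::adf.test` and `urca::ca.jo`; the toy series and lag orders are illustrative only):

```{r, echo = TRUE, eval = FALSE}
library(tseries) # adf.test()
library(urca)    # ca.jo()
set.seed(1)
# Two cointegrated random walks as stand-ins for two price series
w <- cumsum(rnorm(500))
levels <- cbind(A = w + rnorm(500), B = 0.5 * w + rnorm(500))
adf.test(diff(levels[, "A"]))                 # unit-root test on first differences
summary(ca.jo(levels, type = "trace", K = 2)) # Johansen trace test on levels
```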
## Data
```{r, echo=FALSE, fig.width = 12, fig.height = 6, fig.align="center", cache = TRUE}
readr::read_csv("assets/voldep/2022_10_14_eur_ref_co2_adj_hvpi_ex_nrg.csv") %>%
select(-EUR_USD, -hvpi_x_nrg) %>%
pivot_longer(-Date) %>%
mutate(name = factor(name,
levels = c("EUA", "Oil", "NGas", "Coal")
)) %>%
ggplot(aes(x = Date, y = value, color = name)) +
geom_line(
  linewidth = linesize
) +
geom_vline(
# Beginning of the war
xintercept = as.Date("2022-02-24"),
col = as.character(cols[8, "grey"]),
linetype = "dashed"
) +
ylab("Prices EUR") +
xlab("Date") +
scale_color_manual(
values = as.character(c(
cols[8, col_eua],
cols[8, col_oil],
cols[8, col_gas],
cols[8, col_coal]
)),
labels = c(
"EUA",
"Oil",
"NGas",
"Coal"
)
) +
guides(color = guide_legend(override.aes = list(size = 2))) +
theme_minimal() +
theme(
legend.position = "bottom",
# legend.key.size = unit(1.5, "cm"),
legend.title = element_blank(),
text = element_text(
size = text_size
)
) +
scale_x_date(
breaks = as.Date(
paste0(
as.character(seq(2010, 2022, 2)),
"-01-01"
)
),
date_labels = "%Y"
) +
xlab(NULL) +
scale_y_continuous(trans = "log2")
```
## Modeling Approach: Overview
</br>
### VECM: Vector Error Correction Model
- Models the expectation
- Captures the long-run cointegrating relationship
- Different cointegrating ranks, including rank zero (no cointegration)
### GARCH: Generalized Autoregressive Conditional Heteroscedasticity
- Captures the variance dynamics
### Copula: Captures the dependence structure
- Captures the conditional cross-sectional dependence structure
- Dependence allowed to vary over time
## Modeling Approach: Overview
:::: {.columns}
::: {.column width="48%"}
- Let $\boldsymbol{X}_t$ be a $K$-dimensional vector at time $t$
- The forecasting target: the conditional joint distribution
  - $F_{\boldsymbol{X}_t|\mathcal{F}_{t-1}}$
  - $\mathcal{F}_{t}$ is the sigma field generated by all information available up to and including time $t$
Sklar's theorem: decompose the target into
- marginal distributions: $F_{X_{k,t}|\mathcal{F}_{t-1}}$ for $k=1,\ldots, K$, and
- copula function: $C_{\boldsymbol{U}_{t}|\mathcal{F}_{t - 1}}$
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
Let $\boldsymbol{x}_t= (x_{1,t},\ldots, x_{K,t})^\intercal$ be the realized values
It holds that:
\begin{align}
F_{\boldsymbol{X}_t|\mathcal{F}_{t-1}}(\boldsymbol{x}_t) = C_{\boldsymbol{U}_{t}|\mathcal{F}_{t - 1}}(\boldsymbol{u}_t) \nonumber
\end{align}
with: $\boldsymbol{u}_t =(u_{1,t},\ldots, u_{K,t})^\intercal$, $u_{k,t} = F_{X_{k,t}|\mathcal{F}_{t-1}}(x_{k,t})$
For brevity, we drop the conditioning on $\mathcal{F}_{t-1}$.
The model can be specified as follows
\begin{align}
F(\boldsymbol{x}_t) = C \left[\mathbf{F}(\boldsymbol{x}_t; \boldsymbol{\mu}_t, \boldsymbol{ \sigma }_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda}); \Xi_t, \Theta\right] \nonumber
\end{align}
$\Xi_{t}$ denotes time-varying dependence parameters
$\Theta$ denotes time-invariant dependence parameters
We take $C$ as the $t$-copula
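A toy numerical check of this decomposition (`copula` package; standard normal marginals as placeholders):

```{r, echo = TRUE, eval = FALSE}
library(copula)
C <- tCopula(param = 0.5, dim = 2, df = 4) # bivariate t-copula
x <- c(0.2, -0.1)                          # toy realization
u <- pnorm(x)                              # u_k = F_k(x_k) with N(0,1) marginals
pCopula(u, C)                              # F(x) = C(u) by Sklar's theorem
```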
:::
::::
## Modeling Approach: Mean and Variance
:::: {.columns}
::: {.column width="48%"}
#### Individual marginal distributions:
$$\mathbf{F} = (F_1, \ldots, F_K)^{\intercal}$$
#### Generalized non-central t-distributions
- To account for heavy tails
- Time varying
- expectation: $\boldsymbol{\mu}_t = (\mu_{1,t}, \ldots, \mu_{K,t})^{\intercal}$
- variance: $\boldsymbol{\sigma}_{t}^2 = (\sigma_{1,t}^2, \ldots, \sigma_{K,t}^2)^{\intercal}$
- Time invariant
- degrees of freedom: $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_K)^{\intercal}$
- noncentrality: $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_K)^{\intercal}$
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
#### VECM Model
\begin{align}
\Delta \boldsymbol{\mu}_t = \Pi \boldsymbol{x}_{t-1} + \Gamma \Delta \boldsymbol{x}_{t-1} \nonumber
\end{align}
where $\Pi = \alpha \beta^{\intercal}$ is the cointegrating matrix of rank $r$, $0 \leq r\leq K$.
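A hedged estimation sketch on simulated cointegrated data (`urca::cajorls` with `r = 1` imposes one cointegrating relationship; the toy data are illustrative):

```{r, echo = TRUE, eval = FALSE}
library(urca)
set.seed(1)
w <- cumsum(rnorm(500))
levels <- cbind(A = w + rnorm(500), B = 0.5 * w + rnorm(500))
vecm <- cajorls(ca.jo(levels, type = "trace", K = 2), r = 1)
vecm$beta      # estimated cointegrating vector(s)
coef(vecm$rlm) # loadings (alpha) and short-run coefficients (Gamma)
```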
#### GARCH model
\begin{align}
\sigma_{i,t}^2 = & \omega_i + \alpha^+_{i} (\epsilon_{i,t-1}^+)^2 + \alpha^-_{i} (\epsilon_{i,t-1}^-)^2 + \beta_i \sigma_{i,t-1}^2 \nonumber
\end{align}
where $\epsilon_{i,t-1}^+ = \max\{\epsilon_{i,t-1}, 0\}$ and $\epsilon_{i,t-1}^- = \min\{\epsilon_{i,t-1}, 0\}$

Separate coefficients for positive and negative innovations capture leverage effects (a simulation sketch follows).
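A minimal simulation of this asymmetric recursion (coefficient values are illustrative, not estimates):

```{r, echo = TRUE, eval = FALSE}
set.seed(42)
n <- 500
omega <- 0.05; a_pos <- 0.03; a_neg <- 0.12; beta <- 0.85
sigma2 <- eps <- numeric(n)
sigma2[1] <- omega / (1 - (a_pos + a_neg) / 2 - beta) # unconditional variance
eps[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- omega + a_pos * max(eps[t - 1], 0)^2 +
    a_neg * min(eps[t - 1], 0)^2 + beta * sigma2[t - 1]
  eps[t] <- sqrt(sigma2[t]) * rnorm(1)
}
```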
:::
::::
## Modeling Approach: Dependence
:::: {.columns}
::: {.column width="48%"}
#### Time-varying dependence parameters
\begin{align*}
\Xi_{t} = & \Lambda\left(\boldsymbol{\xi}_{t}\right)
\\
\xi_{ij,t} = & \eta_{0,ij} + \eta_{1,ij} \xi_{ij,t-1} + \eta_{2,ij} z_{i,t-1} z_{j,t-1},
\end{align*}
$\xi_{ij,t}$ is a latent process

$z_{i,t}$ denotes the standardized residual of time series $i$ at time point $t$

$\Lambda(\cdot)$ is a link function that

- ensures that $\Xi_{t}$ is a valid variance-covariance matrix
- ensures that $\Xi_{t}$ does not exceed its support space and remains positive semi-definite
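A toy simulation of the latent recursion ($\tanh$ stands in for the pairwise action of $\Lambda$; all coefficients are illustrative):

```{r, echo = TRUE, eval = FALSE}
set.seed(7)
n <- 300
eta0 <- 0.01; eta1 <- 0.95; eta2 <- 0.04
z_i <- rnorm(n); z_j <- rnorm(n) # toy standardized residuals
xi <- numeric(n)
for (t in 2:n) {
  xi[t] <- eta0 + eta1 * xi[t - 1] + eta2 * z_i[t - 1] * z_j[t - 1]
}
rho <- tanh(xi) # maps the latent process into (-1, 1)
```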
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
#### Maximum Likelihood Estimation
All parameters can be estimated jointly, factorizing the likelihood sequentially:
\begin{align*}
L = f_{\mathbf{X}_1} \prod_{t=2}^T f_{\mathbf{X}_t|\mathcal{F}_{t-1}},
\end{align*}
with multivariate conditional density:
\begin{align*}
f_{\mathbf{X}_t}(\mathbf{x}_t | \mathcal{F}_{t-1}) = \, & c\left[\mathbf{F}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda});\Xi_t, \Theta\right] \\
& \cdot \prod_{i=1}^K f_{X_{i,t}}(x_{i,t};\mu_{i,t}, \sigma_{i,t}^2, \nu_i, \lambda_i)
\end{align*}
The copula density $c$ can be derived analytically.
:::
::::
## Study Design and Evaluation
:::: {.columns}
::: {.column width="48%"}
#### Rolling-window forecasting study
- 3257 observations in total
- Window size: 1000 days (~ four years)
- Forecasting 30 steps ahead

$\Rightarrow 3257 - 1000 - 30 = 2227$ potential starting points

We sample 250 of them to reduce computational cost

We draw $2^{11} = 2048$ trajectories from the joint predictive distribution
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
#### Evaluation
Forecasts are evaluated by the energy score (ES)
\begin{align*}
\text{ES}_t(F, \mathbf{x}_t) = \mathbb{E}_{F} \left(||\tilde{\mathbf{X}}_t - \mathbf{x}_t||_2\right) - \frac{1}{2} \mathbb{E}_F \left(||\tilde{\mathbf{X}}_t - \tilde{\mathbf{X}}_t'||_2 \right)
\end{align*}
where $\mathbf{x}_t$ is the observed $K$-dimensional realization and $\tilde{\mathbf{X}}_t$, respectively $\tilde{\mathbf{X}}_t'$ are independent random vectors distributed according to $F$
In the univariate case ($K = 1$), the energy score reduces to the Continuous Ranked Probability Score (CRPS); a sample-based estimator is sketched below
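A generic sample-based ES estimator from simulated trajectories (a sketch, not the paper's exact implementation):

```{r, echo = TRUE, eval = FALSE}
energy_score <- function(X_tilde, x) {
  # X_tilde: M x K matrix of draws from F; x: observed K-vector
  M <- nrow(X_tilde)
  term1 <- mean(sqrt(rowSums(sweep(X_tilde, 2, x)^2)))
  D <- as.matrix(dist(X_tilde))   # pairwise Euclidean distances
  term2 <- sum(D) / (M * (M - 1)) # estimates E||X - X'||
  term1 - 0.5 * term2
}
X_tilde <- matrix(rnorm(512 * 4), ncol = 4) # 512 draws, K = 4
energy_score(X_tilde, x = rep(0, 4))
```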
:::
::::
## Energy Scores
:::: {.columns}
::: {.column width="48%"}
Relative improvement in ES compared to $\text{RW}^{\sigma, \rho}$

Cell color: based on the test statistic of a Diebold-Mariano test (testing whether the model outperforms the benchmark; greener = better).
```{r, echo=FALSE, results='asis', cache = TRUE}
load("assets/voldep/energy_df.Rdata")
table_energy <- energy %>%
kbl(
digits = 2,
col.names = c(
"Model",
"\\(\\text{ES}^{\\text{All}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{EUA}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{Oil}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{NGas}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{Coal}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{All}}_{1}\\)",
"\\(\\text{ES}^{\\text{All}}_{5}\\)",
"\\(\\text{ES}^{\\text{All}}_{30}\\)"
),
# linesep = "",
bootstrap_options = "condensed",
    # Don't escape: the column names contain MathJax markup
    escape = FALSE,
format = "html",
align = c("l", rep("r", ncol(energy) - 1))
) %>%
kable_paper(full_width = FALSE) %>%
row_spec(0:nrow(energy), color = cols[10, "grey"])
for (i in 2:ncol(energy)) {
bold_cells <- rep(FALSE, times = nrow(energy))
if (all(energy[-1, i, drop = TRUE] < 0)) {
bold_cells[1] <- TRUE
} else {
bold_cells[which.max(energy[-1, i, drop = TRUE]) + 1] <- TRUE
}
table_energy <- table_energy %>%
column_spec(i,
background = c(
cols[5, "grey"],
col_scale2(
energy_dm[, i, drop = TRUE][-1],
rng_t
)
),
bold = bold_cells
)
}
table_energy %>%
kable_styling(
bootstrap_options = c("condensed"),
full_width = TRUE,
font_size = 14
)
```
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
- Benchmarks:
  - $\text{RW}^{\sigma, \rho}$: random walk with constant volatility and correlation
  - Univariate $\text{ETS}^{\sigma}$ with constant volatility
  - Vector ETS $\text{VES}^{\sigma}$ with constant volatility
- Heteroscedasticity is a main driver of ES
- The VECM model without cointegration (essentially a VAR) performs best in terms of overall ES
- For EUA, the ETS benchmark performs best in terms of ES
:::
::::
## CRPS Scores
:::: {.columns}
::: {.column width="28%"}
- CRPS solely evaluates the marginal distributions
  - The cross-sectional dependence is ignored
- VES models deliver poor performance at short horizons
- For oil prices, the RW benchmark cannot be outperformed 30 steps ahead
- Both VECM models generally deliver good performance
:::
::: {.column width="2%"}
:::
::: {.column width="68%"}
Improvement in CRPS of selected models relative to $\textrm{RW}^{\sigma, \rho}$ in % (higher = better). Colored according to the test statistic of a DM test comparing to $\textrm{RW}^{\sigma, \rho}$ (greener means a lower test statistic, i.e., better performance compared to $\textrm{RW}^{\sigma, \rho}$).
```{r, echo=FALSE, results = 'asis', cache = TRUE}
load("assets/voldep/crps_df.Rdata")
table_crps <- crps %>%
kbl(
col.names = c("Model", rep(paste0("H", h_sub), 4)),
digits = 1,
linesep = "",
# Dont replace any string, dataframe has to be valid latex code ...
escape = FALSE,
format = "html",
booktabs = TRUE,
align = c("l", rep("r", ncol(crps) - 1))
) %>%
kable_paper(full_width = FALSE)
for (i in 2:ncol(crps)) {
bold_cells <- rep(FALSE, times = nrow(crps))
if (all(crps[-1, i, drop = TRUE] < 0)) {
bold_cells[1] <- TRUE
} else {
bold_cells[which.max(crps[-1, i, drop = TRUE]) + 1] <- TRUE
}
table_crps <- table_crps %>%
column_spec(i,
background = c(cols[5, "grey"], col_scale2(
crps_dm[, i, drop = TRUE][-1], rng_t
)),
bold = bold_cells
)
}
table_crps <- table_crps %>%
add_header_above(c(" ", "EUA" = 3, "Oil" = 3, "NGas" = 3, "Coal" = 3))
table_crps %>%
kable_styling(
bootstrap_options = c("condensed"),
full_width = FALSE,
font_size = 16
)
```
:::
::::
## RMSE
:::: {.columns}
::: {.column width="28%"}
RMSE measures the performance of the forecasts at the mean of the predictive distribution

</br>

- Some models beat the benchmarks at short horizons

</br>

Conclusion: the improvements seen before must be attributed to other parts of the multivariate probabilistic predictive distribution
:::
::: {.column width="2%"}
:::
::: {.column width="68%"}
Improvement in RMSE of selected models relative to $\textrm{RW}^{\sigma, \rho}$ in % (higher = better). Colored according to the test statistic of a DM test comparing to $\textrm{RW}^{\sigma, \rho}$ (greener means a lower test statistic, i.e., better performance compared to $\textrm{RW}^{\sigma, \rho}$).
```{r, echo=FALSE, results = 'asis', cache = TRUE}
load("assets/voldep/rmsq_df.Rdata")
table_rmsq <- rmsq %>%
kbl(
col.names = c("Model", rep(paste0("H", h_sub), 4)),
digits = 1,
# Dont replace any string, dataframe has to be valid latex code ...
escape = FALSE,
linesep = "",
format = "html",
align = c("l", rep("r", ncol(rmsq) - 1))
) %>%
kable_paper(full_width = FALSE)
for (i in 2:ncol(rmsq)) {
bold_cells <- rep(FALSE, times = nrow(rmsq))
if (all(rmsq[-1, i, drop = TRUE] < 0)) {
bold_cells[1] <- TRUE
} else {
bold_cells[which.max(rmsq[-1, i, drop = TRUE]) + 1] <- TRUE
}
table_rmsq <- table_rmsq %>%
column_spec(i,
background = c(cols[5, "grey"], col_scale2(
rmsq_dm[, i, drop = TRUE][-1], rng_t
)),
bold = bold_cells
)
}
table_rmsq <- table_rmsq %>%
add_header_above(c(" ", "EUA" = 3, "Oil" = 3, "NGas" = 3, "Coal" = 3))
table_rmsq %>%
kable_styling(
bootstrap_options = c("condensed"),
full_width = FALSE,
font_size = 14
)
```
:::
::::
## Evolution of Linear Dependence $\Xi$
```{r, echo=FALSE, fig.width = 12, fig.height = 6, fig.align="center", cache = TRUE}
load("assets/voldep/plot_rho_df.Rdata")
ggplot() +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "5Oil-Coal", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "2EUA-NGas", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "1EUA-Oil", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "3EUA-Coal", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "6NGas-Coal", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "4Oil-NGas", ],
aes(x = idx, y = value, col = name)
) +
geom_vline(
xintercept = as.Date("2022-02-24"), # Beginning of the war
col = as.character(cols[8, "grey"]),
linetype = "dashed"
) +
geom_vline(
xintercept = as.Date("2022-02-28"), # Central bank of russia was blocked from access to foreign reserves
col = as.character(cols[8, "grey"]),
linetype = "dotted"
) +
scale_colour_manual(
values = as.character(
cols[
6,
c("green", "purple", "blue", "red", "orange", "brown")
]
),
labels = c(
paste0(varnames[comb[, 1] + 1], "-", varnames[comb[, 2] + 1])
)
) +
theme_minimal() +
theme(
zoom.x = element_rect(fill = cols[4, "grey"], colour = NA),
legend.position = "bottom",
text = element_text(size = text_size)
) +
ylab("Correlation") +
scale_x_date(
breaks = date_breaks_fct,
labels = date_labels_fct
) +
xlab(NULL) +
guides(col = guide_legend(
title = NULL,
nrow = 1,
byrow = TRUE,
override.aes = list(size = 2)
)) +
# scale_y_continuous(breaks = seq(-1.5, 1.0, 0.5)) +
ggforce::facet_zoom(
xlim = c(
as.Date("2022-02-02"),
as.Date("2022-04-30")
),
zoom.size = 1.5,
show.area = TRUE
)
```
## Predictive Quantiles (Russian Invasion)
```{r, echo=FALSE, fig.width = 12, fig.height = 6, fig.align="center", cache = TRUE}
load("assets/voldep/plot_quant_df.Rdata")
plot_quant_data %>% ggplot(aes(x = date, y = value)) +
geom_line(size = 1, aes(col = quant)) +
geom_vline(
xintercept = as.Date("2022-02-24"), # Beginning of the war
col = as.character(cols[8, "grey"]),
linetype = "dashed"
) +
geom_vline(
xintercept = as.Date("2022-02-28"), # Central bank of russia was blocked from access to foreign reserves
col = as.character(cols[8, "grey"]),
linetype = "dotted"
) +
scale_colour_manual(
values = as.character(
c(
cols[
5,
c(1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15)
],
cols[9, 18]
)
)
) +
facet_grid(var ~ ., scales = "free_y") +
theme_minimal() +
theme(
legend.position = "bottom",
text = element_text(size = text_size),
strip.background = element_rect(
fill = cols[1, 18],
colour = cols[1, 18]
)
) +
ylab("EUR") +
scale_x_date(
breaks = (seq(
as.Date("2022-02-15"),
as.Date("2022-03-31"),
by = "4 day"
) + 1)[-12],
date_labels = "%b %d",
limits = c(as.Date("2022-02-15"), as.Date("2022-03-31"))
) +
xlab(NULL) +
guides(col = guide_legend(
title = "Quantiles [\\%]",
nrow = 2,
byrow = TRUE,
override.aes = list(size = 2)
)) +
scale_y_continuous(
trans = "log2",
breaks = y_breaks
)
```
## Conclusion
:::: {.columns}
::: {.column width="48%"}
Accounting for heteroscedasticity or stabilizing the variance via a log transformation is crucial for good performance in terms of ES

- Price dynamics emerged well before the Russian invasion of Ukraine
- Linear dependence between the series reacted only right after the invasion
- Improvements in forecasting performance are mainly attributed to:
  - the tails of the multivariate probabilistic predictive distribution
  - the dependence structure between the marginals
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
</br>
<center>
<img src="assets/voldep/frame.png">
</center>
`r fontawesome::fa("newspaper")` @berrisch2023modeling
:::
::::
## References