---
|
|
title: "Data Science Methods for Forecasting in Energy and Economics"
|
|
date: 2025-07-10
|
|
author:
|
|
- name: Jonathan Berrisch
|
|
affiliations:
|
|
- ref: hemf
|
|
affiliations:
|
|
- id: hemf
|
|
name: University of Duisburg-Essen, House of Energy Markets and Finance
|
|
format:
|
|
revealjs:
|
|
embed-resources: true
|
|
footer: ""
|
|
logo: assets/logos_combined.png
|
|
theme: [default, clean.scss]
|
|
smaller: true
|
|
fig-format: svg
|
|
execute:
|
|
daemon: false
|
|
highlight-style: github
|
|
---
|
|
|
|
<!--
|
|
Render with: quarto preview /home/jonathan/git/PHD-Presentation/25_07_phd_defense/index.qmd --no-browser --port 6074
|
|
-->
|
|
|
|
## Outline
|
|
|
|
::: {.hidden}
|
|
$$
|
|
\newcommand{\A}{{\mathbb A}}
|
|
$$
|
|
:::
|
|
|
|
<br>
|
|
|
|
::: {style="font-size: 150%;"}
|
|
|
|
[{{< fa bars-staggered >}}]{style="color: #404040;"}   Introduction & Research Motivation
|
|
|
|
[{{< fa bars-staggered >}}]{style="color: #404040;"}   Overview of the Thesis
|
|
|
|
[{{< fa table >}}]{style="color: #404040;"}   Online Learning
|
|
|
|
[{{< fa circle-nodes >}}]{style="color: #404040;"}   Probabilistic Forecasting of European Carbon and Energy Prices
|
|
|
|
[{{< fa lightbulb >}}]{style="color: #404040;"}   Limitations
|
|
|
|
[{{< fa binoculars >}}]{style="color: #404040;"}   Contributions & Outlook
|
|
|
|
:::
|
|
|
|
## PhD Defense
|
|
|
|
```{r, setup, include=FALSE}
|
|
# Compile with: rmarkdown::render("crps_learning.Rmd")
|
|
library(latex2exp)
|
|
library(ggplot2)
|
|
library(dplyr)
|
|
library(tidyr)
|
|
library(purrr)
|
|
library(kableExtra)
|
|
knitr::opts_chunk$set(
|
|
dev = "svglite" # Use svg figures
|
|
)
|
|
library(RefManageR)
|
|
BibOptions(
|
|
check.entries = TRUE,
|
|
bib.style = "authoryear",
|
|
cite.style = "authoryear",
|
|
style = "html",
|
|
hyperlink = TRUE,
|
|
dashed = FALSE
|
|
)
|
|
my_bib <- ReadBib("assets/library.bib", check = FALSE)
|
|
col_lightgray <- "#e7e7e7"
|
|
col_blue <- "#000088"
|
|
col_smooth_expost <- "#a7008b"
|
|
col_smooth <- "#187a00"
|
|
col_pointwise <- "#008790"
|
|
col_constant <- "#dd9002"
|
|
col_optimum <- "#666666"
|
|
```
|
|
|
|
```{r xaringan-panelset, echo=FALSE}
|
|
xaringanExtra::use_panelset()
|
|
```
|
|
|
|
```{r xaringanExtra-freezeframe, echo=FALSE}
|
|
xaringanExtra::use_freezeframe(responsive = TRUE)
|
|
```
|
|
|
|
# Outline
|
|
|
|
- [Motivation](#motivation)
|
|
- [The Framework of Prediction under Expert Advice](#pred_under_exp_advice)
|
|
- [The Continuous Ranked Probability Score](#crps)
|
|
- [Optimality of (Pointwise) CRPS-Learning](#crps_optim)
|
|
- [A Simple Probabilistic Example](#simple_example)
|
|
- [The Proposed CRPS-Learning Algorithm](#proposed_algorithm)
|
|
- [Simulation Results](#simulation)
|
|
- [Possible Extensions](#extensions)
|
|
- [Application Study](#application)
|
|
- [Wrap-Up](#conclusion)
|
|
- [References](#references)
|
|
|
|
---
|
|
|
|
# Motivation
|
|
|
|
|
|
|
|
## Motivation
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
The Idea:
|
|
|
|
- Combine multiple forecasts instead of choosing one
|
|
|
|
- Combination weights may vary over **time**, over the **distribution** or **both**
|
|
|
|
2 Popular options for combining distributions:
|
|
|
|
- Combining across quantiles (this paper)
|
|
- Horizontal aggregation, vincentization
|
|
- Combining across probabilities
|
|
- Vertical aggregation
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
::: {.panel-tabset}
|
|
|
|
## Time
|
|
|
|
```{r, echo = FALSE, fig.height=6}
|
|
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
|
|
set.seed(1)
|
|
# Data
|
|
X <- matrix(ncol = 3, nrow = 15)
|
|
X[, 1] <- seq(from = 8, to = 12, length.out = 15) + 0.25 * rnorm(15)
|
|
X[, 2] <- 10 + 0.25 * rnorm(15)
|
|
X[, 3] <- seq(from = 12, to = 8, length.out = 15) + 0.25 * rnorm(15)
|
|
# Weights
|
|
w <- matrix(ncol = 3, nrow = 15)
|
|
w[, 1] <- sin(0.1 * 1:15)
|
|
w[, 2] <- cos(0.1 * 1:15)
|
|
w[, 3] <- seq(from = -2, 0.25, length.out = 15)^2
|
|
w <- (w / rowSums(w))
|
|
# Vis
|
|
plot(X[, 1],
|
|
lwd = 4,
|
|
type = "l",
|
|
ylim = c(8, 12),
|
|
xlab = "",
|
|
ylab = "",
|
|
xaxt = "n",
|
|
yaxt = "n",
|
|
bty = "n",
|
|
col = "#2050f0"
|
|
)
|
|
plot(w[, 1],
|
|
lwd = 4, type = "l",
|
|
ylim = c(0, 1),
|
|
xlab = "",
|
|
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
|
|
)
|
|
text(6, 0.5, TeX("$w_1(t)$"), cex = 2, col = "#2050f0")
|
|
arrows(13, 0.25, 15, 0.0, , lwd = 4, bty = "n")
|
|
plot.new()
|
|
plot(X[, 2],
|
|
lwd = 4,
|
|
type = "l", ylim = c(8, 12),
|
|
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
|
|
)
|
|
plot(w[, 2],
|
|
lwd = 4, type = "l",
|
|
ylim = c(0, 1),
|
|
xlab = "",
|
|
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
|
|
)
|
|
text(6, 0.6, TeX("$w_2(t)$"), cex = 2, col = "purple")
|
|
arrows(13, 0.5, 15, 0.5, , lwd = 4, bty = "n")
|
|
plot(rowSums(X * w), lwd = 4, type = "l", xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#298829")
|
|
plot(X[, 3],
|
|
lwd = 4,
|
|
type = "l", ylim = c(8, 12),
|
|
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
|
|
)
|
|
plot(w[, 3],
|
|
lwd = 4, type = "l",
|
|
ylim = c(0, 1),
|
|
xlab = "",
|
|
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
|
|
)
|
|
text(6, 0.25, TeX("$w_3(t)$"), cex = 2, col = "#e423b4")
|
|
arrows(13, 0.75, 15, 1, , lwd = 4, bty = "n")
|
|
```
|
|
|
|
## Distribution
|
|
|
|
```{r, echo = FALSE, fig.height=6}
|
|
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
|
|
set.seed(1)
|
|
# Data
|
|
X <- matrix(ncol = 3, nrow = 31)
|
|
|
|
X[, 1] <- dchisq(0:30, df = 10)
|
|
X[, 2] <- dnorm(0:30, mean = 15, sd = 5)
|
|
X[, 3] <- dexp(0:30, 0.2)
|
|
# Weights
|
|
w <- matrix(ncol = 3, nrow = 31)
|
|
w[, 1] <- sin(0.05 * 0:30)
|
|
w[, 2] <- cos(0.05 * 0:30)
|
|
w[, 3] <- seq(from = -2, 0.25, length.out = 31)^2
|
|
w <- (w / rowSums(w))
|
|
# Vis
|
|
plot(X[, 1],
|
|
lwd = 4,
|
|
type = "l",
|
|
xlab = "",
|
|
ylab = "",
|
|
xaxt = "n",
|
|
yaxt = "n",
|
|
bty = "n",
|
|
col = "#2050f0"
|
|
)
|
|
plot(X[, 2],
|
|
lwd = 4,
|
|
type = "l",
|
|
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
|
|
)
|
|
plot(X[, 3],
|
|
lwd = 4,
|
|
type = "l",
|
|
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
|
|
)
|
|
plot(w[, 1],
|
|
lwd = 4, type = "l",
|
|
ylim = c(0, 1),
|
|
xlab = "",
|
|
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
|
|
)
|
|
text(12, 0.5, TeX("$w_1(x)$"), cex = 2, col = "#2050f0")
|
|
arrows(26, 0.25, 31, 0.0, , lwd = 4, bty = "n")
|
|
plot(w[, 2],
|
|
lwd = 4, type = "l",
|
|
ylim = c(0, 1),
|
|
xlab = "",
|
|
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
|
|
)
|
|
text(15, 0.5, TeX("$w_2(x)$"), cex = 2, col = "purple")
|
|
arrows(15, 0.25, 15, 0, , lwd = 4, bty = "n")
|
|
plot(w[, 3],
|
|
lwd = 4, type = "l",
|
|
ylim = c(0, 1),
|
|
xlab = "",
|
|
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
|
|
)
|
|
text(20, 0.5, TeX("$w_3(x)$"), cex = 2, col = "#e423b4")
|
|
arrows(5, 0.25, 0, 0, , lwd = 4, bty = "n")
|
|
plot.new()
|
|
plot(rowSums(X * w), lwd = 4, type = "l", xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#298829")
|
|
```
|
|
|
|
:::
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
# The Framework of Prediction under Expert Advice
|
|
|
|
## The Framework of Prediction under Expert Advice
|
|
|
|
### The sequential framework
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Each day, $t = 1, 2, ... T$
|
|
- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
|
|
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
|
|
- The **forecaster** calculates her prediction:
|
|
\begin{equation}
|
|
\widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}.
|
|
\label{eq_forecast_def}
|
|
\end{equation}
|
|
- The realization for $t$ is observed
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
- The experts can be institutions, persons, or models
|
|
- The forecasts can be point-forecasts (i.e., mean or median) or full predictive distributions
|
|
- We do not need any assumptions concerning the underlying data
|
|
- `r Citet(my_bib, "cesa2006prediction")`
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
---
|
|
|
|
## The Regret
|
|
|
|
Weights are updated sequentially according to the past performance of the $K$ experts.
|
|
|
|
That is, a loss function $\ell$ is needed. This is used to compute the **cumulative regret** $R_{t,k}$
|
|
|
|
\begin{equation}
|
|
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)
|
|
\label{eq_regret}
|
|
\end{equation}
|
|
|
|
The cumulative regret:
|
|
- Indicates the predictive accuracy of the expert $k$ until time $t$.
|
|
- Measures how much the forecaster *regrets* not having followed the expert's advice
|
|
|
|
Popular loss functions for point forecasting `r Citet(my_bib, "gneiting2011making")`:
|
|
|
|
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
- $\ell_2$-loss $\ell_2(x, y) = | x -y|^2$
|
|
- optimal for mean prediction
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
- $\ell_1$-loss $\ell_1(x, y) = | x -y|$
|
|
- optimal for median predictions
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
|
|
## Popular Aggregation Algorithms
|
|
|
|
#### The naive combination
|
|
|
|
\begin{equation}
|
|
w_{t,k}^{\text{Naive}} = \frac{1}{K}
|
|
\end{equation}
|
|
|
|
#### The exponentially weighted average forecaster (EWA)
|
|
|
|
\begin{align}
|
|
w_{t,k}^{\text{EWA}} & = \frac{e^{\eta R_{t,k}} }{\sum_{k = 1}^K e^{\eta R_{t,k}}}
|
|
=
|
|
\frac{e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }{\sum_{k = 1}^K e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }
|
|
\label{eq_ewa_general}
|
|
\end{align}
|
|
|
|
#### The polynomial weighted aggregation (PWA)
|
|
|
|
\begin{align}
|
|
w_{t,k}^{\text{PWA}} & = \frac{ 2(R_{t,k})^{q-1}_{+} }{ \|(R_t)_{+}\|^{q-2}_q}
|
|
\label{eq_pwa_general}
|
|
\end{align}
|
|
|
|
with $q\geq 2$ and $x_{+}$ denoting the vector of positive parts of $x$.
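
As an illustration, the naive and EWA weights can be computed in a few lines of R; the data, the $\ell_2$-loss, and the fixed learning rate are placeholder choices for this sketch, not the settings used later.

```{r, echo = TRUE, eval = FALSE}
# Sketch: naive and (recursive) EWA weights for K point-forecast experts
set.seed(1)
T <- 100; K <- 3
y <- rnorm(T, mean = 10)                                  # observations
X <- sapply(c(9, 10, 11), function(m) rnorm(T, mean = m)) # expert forecasts (T x K)
eta <- 0.1                                                # fixed learning rate (placeholder)

w_naive <- rep(1 / K, K)                                  # 1/K for every expert
w_ewa <- matrix(1 / K, nrow = T + 1, ncol = K)            # w_ewa[t, ] is used at time t
for (t in 1:T) {
  loss <- (X[t, ] - y[t])^2                               # l2-loss of each expert
  w_new <- exp(-eta * loss) * w_ewa[t, ]                  # recursive EWA update
  w_ewa[t + 1, ] <- w_new / sum(w_new)
}
```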
|
|
|
|
## Optimality
|
|
|
|
In stochastic settings, the cumulative risk should be analyzed `r Citet(my_bib, "wintenberger2017optimal")`:
|
|
|
|
\begin{align}
|
|
\underbrace{\widetilde{\mathcal{R}}_t = \sum_{i=1}^t \mathbb{E}[\ell(\widetilde{X}_{i},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Forecaster}} \qquad\qquad\qquad \text{ and } \qquad\qquad\qquad
|
|
\underbrace{\widehat{\mathcal{R}}_{t,k} = \sum_{i=1}^t \mathbb{E}[\ell(\widehat{X}_{i,k},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Experts}}
|
|
\label{eq_def_cumrisk}
|
|
\end{align}
|
|
|
|
There are two problems that an algorithm should solve in iid settings:
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
### The selection problem
|
|
\begin{equation}
|
|
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \stackrel{t\to \infty}{\rightarrow} a \quad \text{with} \quad a \leq 0.
|
|
\label{eq_opt_select}
|
|
\end{equation}
|
|
The forecaster is asymptotically not worse than the best expert $\widehat{\mathcal{R}}_{t,\min}$.
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
### The convex aggregation problem
|
|
|
|
\begin{equation}
|
|
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \stackrel{t\to \infty}{\rightarrow} b \quad \text{with} \quad b \leq 0 .
|
|
\label{eq_opt_conv}
|
|
\end{equation}
|
|
The forecaster is asymptotically not worse than the best convex combination $\widehat{X}_{t,\pi}$ in hindsight (**oracle**).
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Optimality
|
|
|
|
Satisfying the convexity property \eqref{eq_opt_conv} comes at the cost of slower possible convergence.
|
|
|
|
According to `r Citet(my_bib, "wintenberger2017optimal")`, an algorithm has optimal rates with respect to selection \eqref{eq_opt_select} and convex aggregation \eqref{eq_opt_conv} if
|
|
|
|
\begin{align}
|
|
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) & =
|
|
\mathcal{O}\left(\frac{\log(K)}{t}\right)\label{eq_optp_select}
|
|
\end{align}
|
|
|
|
\begin{align}
|
|
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) & =
|
|
\mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right)
|
|
\label{eq_optp_conv}
|
|
\end{align}
|
|
|
|
Algorithms can satisfy both \eqref{eq_optp_select} and \eqref{eq_optp_conv} depending on:
|
|
|
|
- The loss function
|
|
- Regularity conditions on $Y_t$ and $\widehat{X}_{t,k}$
|
|
- The weighting scheme
|
|
|
|
## Optimality
|
|
|
|
According to `r Citet(my_bib, "cesa2006prediction")`, EWA \eqref{eq_ewa_general} satisfies the optimal selection convergence \eqref{eq_optp_select} in a deterministic setting if:
|
|
- Loss $\ell$ is exp-concave
|
|
- Learning-rate $\eta$ is chosen correctly
|
|
|
|
Those results can be converted to stochastic iid settings `r Citet(my_bib, "kakade2008generalization")` `r Citet(my_bib, "gaillard2014second")`.
|
|
|
|
The optimal convex aggregation convergence \eqref{eq_optp_conv} can be satisfied by applying the gradient trick, which linearizes the loss:
|
|
\begin{align}
|
|
\ell^{\nabla}(x,y) = \ell'(\widetilde{X},y) x
|
|
\end{align}
|
|
$\ell'$ is the subgradient of $\ell$ in its first coordinate evaluated at forecast combination $\widetilde{X}$.
|
|
|
|
Combining probabilistic forecasts calls for a probabilistic loss function
|
|
|
|
:::: {.notes}
|
|
|
|
We apply Bernstein Online Aggregation (BOA). It lets us weaken the exp-concavity condition while almost keeping the optimalities \ref{eq_optp_select} and \ref{eq_optp_conv}.
|
|
|
|
::::
|
|
|
|
## The Continuous Ranked Probability Score
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**An appropriate choice:**
|
|
|
|
\begin{align*}
|
|
\text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx
|
|
\label{eq_crps}
|
|
\end{align*}
|
|
|
|
It's strictly proper `r Citet(my_bib, "gneiting2007strictly")`.
|
|
|
|
Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance is not uniform over all parts of the distribution?
|
|
|
|
The idea: utilize this relation:
|
|
|
|
\begin{align*}
|
|
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) \, d p.
|
|
\label{eq_crps_qs}
|
|
\end{align*}
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
to combine quantiles of the probabilistic forecasts individually using the quantile-loss (QL):
|
|
\begin{align*}
|
|
\text{QL}_p(q, y) & = (\mathbb{1}\{y < q\} -p)(q - y)
|
|
\end{align*}
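
A short numerical sketch of this relation, approximating the CRPS of a standard normal forecast by averaging the quantile loss over a probability grid (the grid and the forecast are assumptions of this sketch only):

```{r, echo = TRUE, eval = FALSE}
# Sketch: CRPS approximated as 2 x mean quantile loss over a probability grid
ql <- function(q, y, p) (as.numeric(y < q) - p) * (q - y)   # QL_p(q, y)

p_grid <- 1:99 / 100                 # probability grid (placeholder)
y_obs  <- 0.3                        # one observation
q_fc   <- qnorm(p_grid)              # quantile forecast of a standard normal

2 * mean(ql(q_fc, y_obs, p_grid))    # approximates CRPS(F, y)
```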
|
|
|
|
</br>
|
|
|
|
**But is it optimal?**
|
|
|
|
CRPS is exp-concave `r fontawesome::fa("check", fill ="#00b02f")`
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` EWA \eqref{eq_ewa_general} with CRPS satisfies \eqref{eq_optp_select} and \eqref{eq_optp_conv}
|
|
|
|
QL is convex, but not exp-concave `r fontawesome::fa("exclamation", fill ="#ffa600")`
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition while almost keeping optimal convergence
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## CRPS-Learning Optimality
|
|
|
|
For convex losses, BOAG satisfies that there exists a $C>0$ such that for all $x>0$ it holds that
|
|
\begin{equation}
|
|
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq
|
|
1-e^{-x}
|
|
\label{eq_boa_opt_conv}
|
|
\end{equation}
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *convex aggregation* \eqref{eq_optp_conv} `r Citet(my_bib, "wintenberger2017optimal")` .
|
|
|
|
The same algorithm satisfies that there exists a $C>0$ such that for all $x>0$ it holds that
|
|
\begin{equation}
|
|
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq
|
|
C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq
|
|
1-e^{-x}
|
|
\label{eq_boa_opt_select}
|
|
\end{equation}
|
|
|
|
if $Y_t$ is bounded and the considered loss $\ell$ is convex, $G$-Lipschitz, and weakly exp-concave in its first coordinate.

That is, it holds for losses that satisfy **A1** and **A2**.
|
|
|
|
## CRPS-Learning Optimality
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**A1**
|
|
|
|
For some $G>0$ it holds
|
|
for all $x_1,x_2\in \mathbb{R}$ and $t>0$ that
|
|
|
|
$$ | \ell(x_1, Y_t)-\ell(x_2, Y_t) | \leq G |x_1-x_2|$$
|
|
|
|
**A2** For some $\alpha>0$, $\beta\in[0,1]$ it holds
|
|
for all $x_1,x_2 \in \mathbb{R}$ and $t>0$ that
|
|
|
|
\begin{align*}
|
|
\mathbb{E}[
|
|
& \ell(x_1, Y_t)-\ell(x_2, Y_t) | \mathcal{F}_{t-1}] \leq \\
|
|
& \mathbb{E}[ \ell'(x_1, Y_t)(x_1 - x_2) |\mathcal{F}_{t-1}] \\
|
|
& +
|
|
\mathbb{E}\left[ \left. \left( \alpha(\ell'(x_1, Y_t)(x_1 - x_2))^{2}\right)^{1/\beta} \right|\mathcal{F}_{t-1}\right]
|
|
\end{align*}
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *selection* \eqref{eq_optp_select} `r Citet(my_bib, "gaillard2018efficient")`.
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Lemma 1**
|
|
|
|
\begin{align}
|
|
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}
|
|
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\min}
|
|
\label{eq_risk_ql_crps_expert} \\
|
|
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}
|
|
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\pi} .
|
|
\label{eq_risk_ql_crps_convex}
|
|
\end{align}
|
|
|
|
Pointwise can outperform constant procedures
|
|
|
|
QL is convex but not exp-concave:
|
|
|
|
`r fontawesome::fa("arrow-right")` Almost optimal convergence w.r.t. *convex aggregation* \eqref{eq_boa_opt_conv} `r fontawesome::fa("check", fill ="#00b02f")` </br>
|
|
|
|
For almost optimal convergence w.r.t. *selection* \eqref{eq_boa_opt_select} we need to check **A1** and **A2**:
|
|
|
|
QL is Lipschitz continuous:
|
|
|
|
`r fontawesome::fa("arrow-right")` **A1** holds `r fontawesome::fa("check", fill ="#ffa600")` </br>
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
|
|
## CRPS-Learning Optimality
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Conditional quantile risk: $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y_t) | \mathcal{F}_{t-1}]$.
|
|
|
|
`r fontawesome::fa("arrow-right")` convexity properties of $\mathcal{Q}_p$ depend on the
|
|
conditional distribution $Y_t|\mathcal{F}_{t-1}$.
|
|
|
|
**Proposition 1**
|
|
|
|
Let $Y$ be a univariate random variable with (Radon-Nikodym) $\nu$-density $f$, then for the second subderivative of the quantile risk
|
|
$\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y) ]$
|
|
of $Y$ it holds for all $p\in(0,1)$ that
|
|
$\mathcal{Q}_p'' = f.$
|
|
Additionally, if $f$ is a continuous Lebesgue-density with $f\geq\gamma>0$ for some constant $\gamma>0$ on its support $\text{spt}(f)$, then $\mathcal{Q}_p$ is $\gamma$-strongly convex.
|
|
|
|
Strong convexity with $\beta=1$ implies **A2** `r fontawesome::fa("check", fill ="#ffa600")` `r Citet(my_bib, "gaillard2018efficient")`
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
`r fontawesome::fa("arrow-right")` **A1** and **A2** give us almost optimal convergence w.r.t. selection \eqref{eq_boa_opt_select} `r fontawesome::fa("check", fill ="#00b02f")` </br>
|
|
|
|
**Theorem 1**
|
|
|
|
The gradient based fully adaptive Bernstein online aggregation (BOAG) applied pointwise for all $p\in(0,1)$ on $\text{QL}$ satisfies
|
|
\eqref{eq_boa_opt_conv} with minimal CRPS given by
|
|
|
|
$$\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}.$$
|
|
|
|
If $Y_t|\mathcal{F}_{t-1}$ is bounded
|
|
and has a pdf $f_t$ satisfying $f_t>\gamma>0$ on its support $\text{spt}(f_t)$, then \eqref{eq_boa_opt_select} holds with $\beta=1$ and
|
|
|
|
$$\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}.$$
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
|
|
## A Probabilistic Example
|
|
|
|
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Simple Example:
|
|
|
|
|
|
\begin{align}
|
|
Y_t & \sim \mathcal{N}(0,\,1) \\
|
|
\widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
|
|
\widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4)
|
|
\label{eq:dgp_sim1}
|
|
\end{align}
|
|
|
|
- True weights vary over $p$
|
|
- Figures show the ECDF and calculated weights using $T=25$ realizations
|
|
- Pointwise solution creates rough estimates
|
|
- Pointwise is better than constant
|
|
- Smooth solution is better than pointwise
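
The figures aside are based on pre-computed results; a minimal sketch of how the DGP and the experts' quantile forecasts could be simulated (interpreting the second parameter of $\mathcal{N}$ as a variance) is:

```{r, echo = TRUE, eval = FALSE}
# Sketch: simulate the DGP and the two experts' quantile forecasts
set.seed(1)
T <- 25
p_grid <- 1:99 / 100
y <- rnorm(T, mean = 0, sd = 1)                 # Y_t ~ N(0, 1)

q_expert1 <- qnorm(p_grid, mean = -1, sd = 1)   # F1 = N(-1, 1)
q_expert2 <- qnorm(p_grid, mean = 3, sd = 2)    # F2 = N(3, 4), i.e. sd = 2
```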
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
::: {.panel-tabset}
|
|
|
|
## CDFs
|
|
|
|
```{r, echo = FALSE, fig.width=7, fig.height=6, fig.align='center', cache = FALSE}
|
|
source("assets/01_common.R")
|
|
load("assets/crps_learning/01_motivation_01.RData")
|
|
ggplot(df, aes(x = x, y = y, xend = xend, yend = yend)) +
|
|
stat_function(
|
|
fun = pnorm, n = 10000,
|
|
args = list(mean = dev[2], sd = experts_sd[2]),
|
|
aes(col = "Expert 2"), size = 1.5
|
|
) +
|
|
stat_function(
|
|
fun = pnorm, n = 10000,
|
|
args = list(mean = dev[1], sd = experts_sd[1]),
|
|
aes(col = "Expert 1"), size = 1.5
|
|
) +
|
|
stat_function(
|
|
fun = pnorm,
|
|
n = 10000,
|
|
size = 1.5, aes(col = "DGP") # , linetype = "dashed"
|
|
) +
|
|
geom_point(aes(col = "ECDF"), size = 1.5, show.legend = FALSE) +
|
|
geom_segment(aes(col = "ECDF")) +
|
|
geom_segment(data = tibble(
|
|
x_ = -5,
|
|
xend_ = min(y),
|
|
y_ = 0,
|
|
yend_ = 0
|
|
), aes(x = x_, xend = xend_, y = y_, yend = yend_)) +
|
|
theme_minimal() +
|
|
theme(
|
|
text = element_text(size = text_size),
|
|
legend.position = "bottom",
|
|
legend.key.width = unit(1.5, "cm")
|
|
) +
|
|
ylab("Probability p") +
|
|
xlab("Value") +
|
|
scale_colour_manual(NULL, values = c("#969696", "#252525", col_auto, col_blue)) +
|
|
guides(color = guide_legend(
|
|
nrow = 2,
|
|
byrow = FALSE # ,
|
|
# override.aes = list(
|
|
# size = c(1.5, 1.5, 1.5, 1.5)
|
|
# )
|
|
)) +
|
|
scale_x_continuous(limits = c(-5, 7.5))
|
|
```
|
|
|
|
## Weights
|
|
|
|
```{r, echo = FALSE, fig.width=7, fig.height=6, fig.align='center', cache = FALSE}
|
|
source("assets/01_common.R")
|
|
load("assets/crps_learning/01_motivation_02.RData")
|
|
ggplot() +
|
|
geom_line(data = weights[weights$var != "1Optimum", ], size = 1.5, aes(x = prob, y = val, col = var)) +
|
|
geom_line(
|
|
data = weights[weights$var == "1Optimum", ], size = 1.5, aes(x = prob, y = val, col = var) # , linetype = "dashed"
|
|
) +
|
|
theme_minimal() +
|
|
theme(
|
|
text = element_text(size = text_size),
|
|
legend.position = "bottom",
|
|
legend.key.width = unit(1.5, "cm")
|
|
) +
|
|
xlab("Probability p") +
|
|
ylab("Weight w") +
|
|
scale_colour_manual(
|
|
NULL,
|
|
values = c("#969696", col_pointwise, col_p_constant, col_p_smooth),
|
|
labels = modnames[-c(3, 5)]
|
|
) +
|
|
guides(color = guide_legend(
|
|
ncol = 3,
|
|
byrow = FALSE,
|
|
title.hjust = 5,
|
|
# override.aes = list(
|
|
# linetype = c(rep("solid", 5), "dashed")
|
|
# )
|
|
)) +
|
|
ylim(c(0, 1))
|
|
```
|
|
|
|
::::
|
|
|
|
:::
|
|
|
|
:::
|
|
|
|
## The Smoothing Procedure
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
We are using penalized cubic b-splines:
|
|
|
|
Let $\varphi=(\varphi_1,\ldots, \varphi_L)$ be bounded basis functions on $(0,1)$ Then we approximate $w_{t,k}$ by
|
|
|
|
\begin{align}
|
|
w_{t,k}^{\text{smooth}} = \sum_{l=1}^L \beta_l \varphi_l = \beta'\varphi
|
|
\end{align}
|
|
|
|
with parameter vector $\beta$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
|
|
|
|
\begin{equation}
|
|
\| w_{t,k} - \beta' \varphi \|^2_2 + \lambda \| \mathcal{D}^{d} (\beta' \varphi) \|^2_2
|
|
\label{eq_function_smooth}
|
|
\end{equation}
|
|
|
|
with differential operator $\mathcal{D}$
|
|
|
|
Smoothing can be applied ex-post or inside of the algorithm ( `r fontawesome::fa("arrow-right", fill ="#000000")` [Simulation](#simulation)).
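
A sketch of the corresponding hat matrix, built from a cubic B-spline basis and a first-order difference penalty as a stand-in for $\mathcal{D}^d$ (knot number and $\lambda$ are placeholders):

```{r, echo = TRUE, eval = FALSE}
# Sketch: penalized cubic B-spline smoothing of pointwise weights over p
library(splines)
p_grid <- 1:99 / 100
inner  <- seq(0, 1, length.out = 12)                   # equidistant knots (placeholder)
B <- splineDesign(c(rep(0, 3), inner, rep(1, 3)),
                  x = p_grid, ord = 4)                 # cubic B-spline basis
D <- diff(diag(ncol(B)), differences = 1)              # difference penalty, d = 1
lambda <- 10                                           # smoothing parameter (placeholder)

H <- B %*% solve(t(B) %*% B + lambda * t(D) %*% D) %*% t(B)   # hat matrix
w_pointwise <- runif(length(p_grid))                   # stand-in for raw weights w_{t,k}(p)
w_smooth <- as.numeric(H %*% w_pointwise)              # smoothed weights
```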
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
We obtain the constant solution for large values of $\lambda$ when setting $d=1$
|
|
|
|
<center>
|
|
<img src="assets/crps_learning/weights_lambda.gif">
|
|
</center>
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
|
|
# The Proposed CRPS-Learning Algorithm
|
|
|
|
---
|
|
|
|
## The Proposed CRPS-Learning Algorithm
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Initialization:**
|
|
|
|
Array of expert predictions: $\widehat{X}_{t,k,p}$
|
|
|
|
Vector of Prediction targets: $Y_t$
|
|
|
|
Starting Weights: $w_0=(w_{0,1},\ldots, w_{0,K})$,
|
|
|
|
Penalization parameter: $\lambda\geq 0$
|
|
|
|
B-spline and penalty matrices $B$ and $D$ on $\mathcal{P}= (p_1,\ldots,p_M)$
|
|
|
|
Hat matrix: $$\mathcal{H} = B(B'B+ \lambda D'D)^{-1} B'$$
|
|
|
|
Cumulative Regret: $R_{0,k} = 0$
|
|
|
|
Range parameter: $E_{0,k}=0$
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Core**:
|
|
|
|
for(t in 1:T) { for(p in $\mathcal{P}$) {
|
|
|
|
$\widetilde{X}_{t}(p) = \sum_{k=1}^K w_{t-1,k,p} \widehat{X}_{t,k}(p)$ [\# Prediction]{.grey}
|
|
|
|
for(k in 1:K){
|
|
|
|
$r_{t,k,p} = \text{QL}_p^{\nabla}(\widehat{X}_{t,k}(p),Y_t) - \text{QL}_p^{\nabla}(\widetilde{X}_{t}(p),Y_t)$
|
|
|
|
$E_{t,k,p} = \max(E_{t-1,k,p}, |r_{t,k,p}|)$
|
|
|
|
$\eta_{t,k,p}=\min\left(\frac{1}{2E_{t,k,p}}, \sqrt{\log(K)/ \sum_{i=1}^t r^2_{i,k,p}}\right)$
|
|
|
|
$R_{t,k,p} = R_{t-1,k,p} + \frac{1}{2} \left( r_{t,k,p} \left( 1+ \eta_{t,k,p} r_{t,k,p} \right) + 2E_{t,k,p} \mathbb{1}(\eta_{t,k,p}r_{t,k,p} > \frac{1}{2}) \right)$
|
|
|
|
$w_{t,k,p} = \eta_{t,k,p} \exp \left(- \eta_{t,k,p} R_{t,k,p} \right) w_{0,k,p} / \left( \frac{1}{K} \sum_{k = 1}^K \eta_{t,k,p} \exp \left( - \eta_{t,k,p} R_{t,k,p}\right) \right)$
|
|
|
|
} [\# k]{.grey} } [\# p]{.grey}
|
|
|
|
for(k in 1:K){
|
|
|
|
$w_{t,k} = \mathcal{H} w_{t,k}(\mathcal{P})$ [\# Smoothing]{.grey}
|
|
|
|
} [\# k]{.grey} } [\# t]{.grey}
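
A compact R transcription of this core loop for a single probability level $p$ (two simulated experts, no zero-guards; purely illustrative):

```{r, echo = TRUE, eval = FALSE}
# Sketch: BOAG update at one probability level p, mirroring the pseudo-code above
set.seed(1)
T <- 200; K <- 2; p <- 0.5
y <- rnorm(T)                                         # placeholder observations
Xhat <- matrix(c(qnorm(p, -1, 1), qnorm(p, 3, 2)),    # expert p-quantiles
               nrow = T, ncol = K, byrow = TRUE)

w <- w0 <- rep(1 / K, K)
R <- E <- r2sum <- rep(0, K)

for (t in 1:T) {
  Xtilde <- sum(w * Xhat[t, ])                        # combined p-quantile
  grad   <- as.numeric(y[t] < Xtilde) - p             # QL subgradient at the combination
  r      <- grad * (Xhat[t, ] - Xtilde)               # linearized instantaneous regret
  E      <- pmax(E, abs(r))
  r2sum  <- r2sum + r^2
  eta    <- pmin(1 / (2 * E), sqrt(log(K) / r2sum))
  R      <- R + 0.5 * (r * (1 + eta * r) + 2 * E * (eta * r > 0.5))
  w      <- eta * exp(-eta * R) * w0 / mean(eta * exp(-eta * R))
}
```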
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Simulation Study
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Data Generating Process of the [simple probabilistic example](#simple_example)
|
|
|
|
- Constant solution $\lambda \rightarrow \infty$
|
|
- Pointwise Solution of the proposed BOAG
|
|
- Smoothed Solution of the proposed BOAG
|
|
- Weights are smoothed during learning
|
|
- Smooth weights are used to calculate Regret, adjust weights, etc.
|
|
- Smooth ex-post solution
|
|
- Weights are smoothed after the learning
|
|
- Algorithm always uses non-smoothed weights
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
::: {.panel-tabset}
|
|
|
|
## QL Deviation
|
|
|
|

|
|
|
|
## CRPS vs. Lambda
|
|
|
|
CRPS Values for different $\lambda$ (1000 runs)
|
|
|
|

|
|
|
|
::::
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Simulation Study
|
|
|
|
The same simulation carried out for different algorithms (1000 runs):
|
|
|
|
<center>
|
|
<img src="assets/crps_learning/algos_constant.gif">
|
|
</center>
|
|
|
|
## Simulation Study
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**New DGP:**
|
|
|
|
\begin{align}
|
|
Y_t & \sim \mathcal{N}\left(\frac{\sin(0.005 \pi t )}{2},\,1\right) \\
|
|
\widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
|
|
\widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4) \label{eq_dgp_sim2}
|
|
\end{align}
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Changing optimal weights
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Single run example depicted aside
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` No forgetting leads to long-term constant weights
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Weights of expert 2**
|
|
|
|
```{r, echo = FALSE, fig.width=7, fig.height=5, fig.align='center', cache = FALSE}
|
|
load("assets/crps_learning/changing_weights.rds")
|
|
mod_labs <- c("Optimum", "Pointwise", "Smooth", "Constant")
|
|
names(mod_labs) <- c("TOptimum", "Pointwise", "Smooth", "Constant")
|
|
colseq <- c(grey(.99), "orange", "red", "purple", "blue", "darkblue", "black")
|
|
weights_preprocessed %>%
|
|
mutate(w = 1 - w) %>%
|
|
ggplot(aes(t, p, fill = w)) +
|
|
geom_raster(interpolate = TRUE) +
|
|
facet_grid(Mod ~ ., labeller = labeller(Mod = mod_labs)) +
|
|
theme_minimal() +
|
|
theme(
|
|
# plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
|
|
text = element_text(size = 15),
|
|
legend.key.height = unit(1, "inch")
|
|
) +
|
|
scale_x_continuous(expand = c(0, 0)) +
|
|
xlab("Time t") +
|
|
scale_fill_gradientn(
|
|
limits = c(0, 1),
|
|
colours = colseq,
|
|
breaks = seq(0, 1, 0.2)
|
|
) +
|
|
ylab("Weight w")
|
|
```
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Simulation Results
|
|
|
|
The simulation using the new DGP carried out for different algorithms (1000 runs):
|
|
|
|
<center>
|
|
<img src="assets/crps_learning/algos_changing.gif">
|
|
</center>
|
|
|
|
## Possible Extensions
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Forgetting**
|
|
|
|
- Only taking part of the old cumulative regret into account
|
|
- Exponential forgetting of past regret
|
|
|
|
\begin{align*}
|
|
R_{t,k} & = R_{t-1,k}(1-\xi) + \ell(\widetilde{F}_{t},Y_t) - \ell(\widehat{F}_{t,k},Y_t) \label{eq_regret_forget}
|
|
\end{align*}
|
|
|
|
**Fixed Shares** `r Citet(my_bib, "herbster1998tracking")`
|
|
|
|
- Adding fixed shares to the weights
|
|
- Shrinkage towards a constant solution
|
|
|
|
\begin{align*}
|
|
\widetilde{w}_{t,k} = \rho \frac{1}{K} + (1-\rho) w_{t,k}
|
|
\label{fixed_share_simple}.
|
|
\end{align*}
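
Both extensions are one-line modifications of the update; a sketch with placeholder values:

```{r, echo = TRUE, eval = FALSE}
# Sketch: regret forgetting and fixed-share shrinkage (xi and rho are placeholders)
xi  <- 0.01                              # forgetting rate
rho <- 0.05                              # fixed share

K <- 3
R <- c(0.4, -0.1, 0.2)                   # cumulative regrets so far (dummy values)
w <- c(0.6, 0.1, 0.3)                    # current weights (dummy values)
loss_comb    <- 1.0                      # loss of the combination at time t (dummy)
loss_experts <- c(0.9, 1.2, 1.1)         # losses of the experts at time t (dummy)

R <- R * (1 - xi) + (loss_comb - loss_experts)   # exponential forgetting of past regret
w <- rho / K + (1 - rho) * w                     # shrinkage towards the naive weights 1/K
```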
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Non-Equidistant Knots**
|
|
|
|
- Non-equidistant spline-basis could be used
|
|
- Potentially improves the tail-behavior
|
|
- Destroys shrinkage towards constant
|
|
|
|
<center>
|
|
<img src="assets/crps_learning/uneven_grid.gif">
|
|
</center>
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Application Study: Overview
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="29%"}
|
|
|
|
Data:
|
|
|
|
- Forecasting European emission allowances (EUA)
|
|
- Daily month-ahead prices
|
|
- Jan 13 - Dec 20 (Phase III, 2092 Obs)
|
|
|
|
Combination methods:
|
|
|
|
- Naive, BOAG, EWAG, ML-PolyG, BMA
|
|
|
|
Tuning parameter grids:
|
|
|
|
- Smoothing Penalty: $\Lambda= \{0\}\cup \{2^x|x\in \{-4,-3.5,\ldots,12\}\}$
|
|
- Learning Rates: $\mathcal{E}= \{2^x|x\in \{-1,-0.5,\ldots,9\}\}$
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="69%"}
|
|
|
|
```{r, echo = FALSE, fig.width=7, fig.height=5, fig.align='center', cache = FALSE}
|
|
load("assets/crps_learning/overview_data.rds")
|
|
|
|
data %>%
|
|
ggplot(aes(x = Date, y = value)) +
|
|
geom_line(size = 1, col = col_blue) +
|
|
theme_minimal() +
|
|
ylab("Value") +
|
|
facet_wrap(. ~ name, scales = "free", ncol = 1) +
|
|
theme(
|
|
text = element_text(size = 15),
|
|
strip.background = element_blank(),
|
|
strip.text.x = element_blank()
|
|
) -> p1
|
|
|
|
data %>%
|
|
ggplot(aes(x = value)) +
|
|
geom_histogram(aes(y = ..density..), size = 1, fill = col_blue, bins = 50) +
|
|
ylab("Density") +
|
|
xlab("Value") +
|
|
theme_minimal() +
|
|
theme(
|
|
strip.background = element_rect(fill = col_lightgray, colour = col_lightgray),
|
|
text = element_text(size = 15)
|
|
) +
|
|
facet_wrap(. ~ name, scales = "free", ncol = 1, strip.position = "right") -> p2
|
|
|
|
overview <- cowplot::plot_grid(plotlist = list(p1, p2), align = "hv", axis = "tblr", rel_widths = c(0.65, 0.35))
|
|
overview
|
|
```
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Application Study: Experts
|
|
|
|
Simple exponential smoothing with additive errors (**ETS-ANN**):
|
|
|
|
\begin{align*}
|
|
Y_{t} = l_{t-1} + \varepsilon_t \quad \text{with} \quad l_t = l_{t-1} + \alpha \varepsilon_t \quad \text{and} \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2)
|
|
\end{align*}
|
|
|
|
Quantile regression (**QuantReg**): For each $p \in \mathcal{P}$ we assume:
|
|
|
|
\begin{align*}
|
|
F^{-1}_{Y_t}(p) = \beta_{p,0} + \beta_{p,1} Y_{t-1} + \beta_{p,2} |Y_{t-1}-Y_{t-2}|
|
|
\end{align*}
|
|
|
|
ARIMA(1,0,1)-GARCH(1,1) with Gaussian errors (**ARMA-GARCH**):
|
|
|
|
\begin{align*}
|
|
Y_{t} = \mu + \phi(Y_{t-1}-\mu) + \theta \varepsilon_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z_t, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1)
|
|
\end{align*}
|
|
|
|
ARIMA(0,1,0)-I-EGARCH(1,1) with Gaussian errors (**I-EGARCH**):
|
|
|
|
\begin{align*}
|
|
Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z_t, \quad \log(\sigma_t^2) = \omega + \alpha Z_{t-1} + \gamma (|Z_{t-1}|-\mathbb{E}|Z_{t-1}|) + \beta \log(\sigma_{t-1}^2) \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1)
|
|
\end{align*}
|
|
|
|
ARIMA(0,1,0)-GARCH(1,1) with student-t errors (**I-GARCHt**):
|
|
|
|
\begin{align*}
|
|
Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z_t, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim t(0,1, \nu)
|
|
\end{align*}
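
For illustration, three of these experts could be estimated with standard R packages; the code below is a rough sketch on a simulated placeholder series, not the exact estimation setup of the study.

```{r, echo = TRUE, eval = FALSE}
# Sketch: fitting three experts with forecast, quantreg and rugarch (placeholder data)
library(forecast)
library(quantreg)
library(rugarch)

set.seed(1)
y <- 20 + as.numeric(arima.sim(model = list(ar = 0.9), n = 500))  # stand-in for EUA prices
n <- length(y)

# ETS-ANN: simple exponential smoothing with additive errors
fit_ets <- ets(y, model = "ANN")

# QuantReg: one quantile regression per probability p
df <- data.frame(
  y       = y[3:n],
  lag1    = y[2:(n - 1)],
  absdiff = abs(y[2:(n - 1)] - y[1:(n - 2)])
)
fit_qr <- rq(y ~ lag1 + absdiff, tau = 1:99 / 100, data = df)

# ARMA(1,1)-GARCH(1,1) with Gaussian errors
spec <- ugarchspec(
  mean.model         = list(armaOrder = c(1, 1), include.mean = TRUE),
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1)),
  distribution.model = "norm"
)
fit_garch <- ugarchfit(spec = spec, data = y)
```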
|
|
|
|
|
|
## Results
|
|
|
|
::: {.panel-tabset}
|
|
|
|
## Significance
|
|
|
|
```{r, echo = FALSE, fig.width=7, fig.height=5.5, fig.align='center', cache = FALSE, results='asis'}
|
|
load("assets/crps_learning/bernstein_application_study_estimations+learnings_rev1.RData")
|
|
|
|
quantile_loss <- function(X, y, tau) {
|
|
t(t(y - X) * tau) * (y - X > 0) + t(t(X - y) * (1 - tau)) * (y - X < 0)
|
|
}
|
|
QL <- FCSTN * NA
|
|
for (k in 1:dim(QL)[1]) {
|
|
QL[k, , ] <- quantile_loss(FCSTN[k, , ], as.numeric(yoos), Qgrid)
|
|
}
|
|
|
|
## TABLE AREA
|
|
|
|
KK <- length(mnames)
|
|
TTinit <- 1 ## without first, as all comb. are uniform
|
|
RQL <- apply(QL[1:KK, -c(1:TTinit), ], c(1, 3), mean)
|
|
dimnames(RQL) <- list(mnames, Qgrid)
|
|
RQLm <- apply(RQL, c(1), mean, na.rm = TRUE)
|
|
# sort(RQLm - RQLm[K + 1])
|
|
##
|
|
qq <- apply(QL[1:KK, -c(1:TTinit), ], c(1, 2), mean)
|
|
# t.test(qq[K + 1, ] - qq[K + 3, ])
|
|
# t.test(qq[K + 1, ] - qq[K + 4, ])
|
|
|
|
|
|
library(xtable)
|
|
Pall <- numeric(KK)
|
|
for (i in 1:KK) Pall[i] <- t.test(qq[K + 1, ] - qq[i, ], alternative = "greater")$p.val
|
|
|
|
Mall <- (RQLm - RQLm[K + 1]) * 10000
|
|
Mout <- matrix(Mall[-c(1:(K + 3))], 5, 6)
|
|
dimnames(Mout) <- list(moname, mtname)
|
|
|
|
Pallout <- format(round(Pall, 3), nsmall = 3)
|
|
Pallout[Pallout == "0.000"] <- "<.001"
|
|
Pallout[Pallout == "1.000"] <- ">.999"
|
|
|
|
MO <- K
|
|
IDX <- c(1:K)
|
|
OUT <- t(Mall[IDX])
|
|
OUT.num <- OUT
|
|
class(OUT.num) <- "numeric"
|
|
|
|
xxx <- OUT.num
|
|
xxxx <- OUT
|
|
table <- OUT
|
|
table_col <- OUT
|
|
i.p <- 1
|
|
for (i.p in 1:MO) {
|
|
xmax <- -min(Mall) * 5 # max(Mall)
|
|
xmin <- min(Mall)
|
|
cred <- rev(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .8, .5)) # , .5,0,0,0,1,1,1) ## red
|
|
cgreen <- rev(c(.5, .5, .55, .6, .65, .7, .75, .8, .85, .9, .95, 1, 1, .9)) # , .5,0,1,1,1,0,0) ## green
|
|
cblue <- rev(c(.55, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5)) # , .5,1,1,0,0,0,1) ## blue
|
|
crange <- c(xmin, xmax) ## range
|
|
## colors in plot:
|
|
fred <- round(approxfun(seq(crange[1], crange[2], length = length(cred)), cred)(pmin(xxx[, i.p], xmax)), 3)
|
|
fgreen <- round(approxfun(seq(crange[1], crange[2], length = length(cgreen)), cgreen)(pmin(xxx[, i.p], xmax)), 3)
|
|
fblue <- round(approxfun(seq(crange[1], crange[2], length = length(cblue)), cblue)(pmin(xxx[, i.p], xmax)), 3)
|
|
tmp <- format(round(xxx[, i.p], 3), nsmall = 3)
|
|
xxxx[, i.p] <- paste("\\cellcolor[rgb]{", fred, ",", fgreen, ",", fblue, "}", tmp, " {\\footnotesize (", Pallout[IDX[i.p]], ")}", sep = "")
|
|
table[, i.p] <- paste0(tmp, " (", Pallout[i.p], ")")
|
|
table_col[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1)
|
|
} # i.p
|
|
|
|
table_out <- kbl(table, align = rep("c", ncol(table)))
|
|
|
|
for (cols in 1:ncol(table)) {
|
|
table_out <- table_out %>%
|
|
column_spec(cols, background = table_col[, cols])
|
|
}
|
|
table_out %>%
|
|
kable_material()
|
|
```
|
|
|
|
```{r, echo = FALSE, fig.width=7, fig.height=5.5, fig.align='center', cache = FALSE, results='asis'}
|
|
MO <- 6
|
|
OUT <- Mout
|
|
OUT.num <- OUT
|
|
class(OUT.num) <- "numeric"
|
|
|
|
xxx <- OUT.num
|
|
xxxx <- OUT
|
|
i.p <- 1
|
|
table2 <- OUT
|
|
table_col2 <- OUT
|
|
for (i.p in 1:MO) {
|
|
xmax <- -min(Mall) * 5 # max(Mall)
|
|
xmin <- min(Mall)
|
|
cred <- rev(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .8, .5)) # , .5,0,0,0,1,1,1) ## red
|
|
cgreen <- rev(c(.5, .5, .55, .6, .65, .7, .75, .8, .85, .9, .95, 1, 1, .9)) # , .5,0,1,1,1,0,0) ## green
|
|
cblue <- rev(c(.55, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5)) # , .5,1,1,0,0,0,1) ## blue
|
|
crange <- c(xmin, xmax) ## range
|
|
## colors in plot:
|
|
fred <- round(approxfun(seq(crange[1], crange[2], length = length(cred)), cred)(pmin(xxx[, i.p], xmax)), 3)
|
|
fgreen <- round(approxfun(seq(crange[1], crange[2], length = length(cgreen)), cgreen)(pmin(xxx[, i.p], xmax)), 3)
|
|
fblue <- round(approxfun(seq(crange[1], crange[2], length = length(cblue)), cblue)(pmin(xxx[, i.p], xmax)), 3)
|
|
tmp <- format(round(xxx[, i.p], 3), nsmall = 3)
|
|
xxxx[, i.p] <- paste("\\cellcolor[rgb]{", fred, ",", fgreen, ",", fblue, "}", tmp, " {\\footnotesize (", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")}", sep = "")
|
|
table2[, i.p] <- paste0(tmp, " (", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")")
|
|
table_col2[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1)
|
|
} # i.p
|
|
|
|
table_out2 <- kableExtra::kbl(table2, align = rep("c", ncol(table2)))
|
|
|
|
for (cols in 1:ncol(table2)) {
|
|
table_out2 <- table_out2 %>%
|
|
column_spec(1 + cols,
|
|
background = table_col2[, cols]
|
|
)
|
|
}
|
|
|
|
table_out2 %>%
|
|
kable_material() %>%
|
|
column_spec(1, bold = T)
|
|
```
|
|
|
|
## QL
|
|
|
|
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = FALSE}
|
|
|
|
##### Performance across probabilities
|
|
M <- length(mnames)
|
|
Msel <- c(1:K, K + 1, K + 1 + 2 + 1:4 * 5 - 2) ## experts + naive + smooth
|
|
modnames <- mnames[Msel]
|
|
|
|
tCOL <- c(
|
|
"#E6CC00", "#CC6600", "#E61A1A", "#99004D", "#F233BF",
|
|
"#666666", "#0000CC", "#1A80E6", "#1AE680", "#00CC00"
|
|
)
|
|
|
|
|
|
t(RQL) %>%
|
|
as_tibble() %>%
|
|
select(Naive) %>%
|
|
mutate(Naive = 0) %>%
|
|
mutate(p = 1:99 / 100) %>%
|
|
pivot_longer(-p, values_to = "Loss differences") -> dummy
|
|
|
|
t(RQL) %>%
|
|
as_tibble() %>%
|
|
select(mnames[Msel]) %>%
|
|
mutate(p = 1:99 / 100) %>%
|
|
pivot_longer(!p & !Naive) %>%
|
|
mutate(`Loss differences` = value - Naive) %>%
|
|
select(-value, -Naive) %>%
|
|
rbind(dummy) %>%
|
|
mutate(
|
|
p = as.numeric(p),
|
|
name = stringr::str_replace(name, "-P-smooth", ""),
|
|
name = factor(name, levels = stringr::str_replace(mnames[Msel], "-P-smooth", ""), ordered = T),
|
|
`Loss differences` = `Loss differences` * 1000
|
|
) %>%
|
|
ggplot(aes(x = p, y = `Loss differences`, colour = name)) +
|
|
geom_line(linewidth = 1) +
|
|
theme_minimal() +
|
|
theme(
|
|
text = element_text(size = text_size),
|
|
legend.position = "bottom"
|
|
) +
|
|
xlab("Probability p") +
|
|
scale_color_manual(NULL, values = tCOL) +
|
|
guides(colour = guide_legend(nrow = 2, byrow = TRUE))
|
|
```
|
|
|
|
## Cumulative Loss Difference
|
|
|
|
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = FALSE}
|
|
DQL <- t(apply(apply(QL[1:KK, -c(1:TTinit), ], c(1, 2), mean), 1, cumsum))
|
|
|
|
rownames(DQL) <- mnames
|
|
|
|
t(DQL) %>%
|
|
as_tibble() %>%
|
|
select(Naive) %>%
|
|
mutate(
|
|
`Difference of cumulative loss` = 0,
|
|
Date = ytime[-c(1:(TT + TTinit + 1))],
|
|
name = "Naive"
|
|
) %>%
|
|
select(-Naive) -> dummy
|
|
|
|
|
|
data <- t(DQL) %>%
|
|
as_tibble() %>%
|
|
select(mnames[Msel]) %>%
|
|
mutate(Date = ytime[-c(1:(TT + TTinit + 1))]) %>%
|
|
pivot_longer(!Date & !Naive) %>%
|
|
mutate(`Difference of cumulative loss` = value - Naive) %>%
|
|
select(-value, -Naive) %>%
|
|
rbind(dummy) %>%
|
|
mutate(
|
|
name = stringr::str_replace(name, "-P-smooth", ""),
|
|
name = factor(name, levels = stringr::str_replace(mnames[Msel], "-P-smooth", ""))
|
|
)
|
|
|
|
data %>%
|
|
ggplot(aes(x = Date, y = `Difference of cumulative loss`, colour = name)) +
|
|
geom_line(size = 1) +
|
|
theme_minimal() +
|
|
theme(
|
|
text = element_text(size = text_size),
|
|
legend.position = "bottom"
|
|
) +
|
|
scale_color_manual(NULL, values = tCOL) +
|
|
guides(colour = guide_legend(nrow = 2, byrow = TRUE))
|
|
```
|
|
|
|
## Weights (BOAG P-Smooth)
|
|
|
|
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = FALSE}
|
|
load("assets/crps_learning/weights_data.RData")
|
|
weights_data %>%
|
|
ggplot(aes(Date, p, fill = w)) +
|
|
geom_raster(interpolate = TRUE) +
|
|
facet_grid(Mod ~ .) +
|
|
theme_minimal() +
|
|
theme(
|
|
plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
|
text = element_text(size = text_size),
|
|
legend.key.height = unit(0.9, "inch")
|
|
) +
|
|
ylab("p") +
|
|
scale_fill_gradientn(
|
|
limits = c(0, 1),
|
|
colours = colseq,
|
|
breaks = seq(0, 1, 0.2)
|
|
) +
|
|
scale_x_date(expand = c(0, 0))
|
|
```
|
|
|
|
## Weights (Last)
|
|
|
|
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = FALSE}
|
|
load("assets/crps_learning/weights_example.RData")
|
|
weights %>%
|
|
ggplot(aes(x = p, y = weights, col = Model)) +
|
|
geom_line(size = 1.5) +
|
|
theme_minimal() +
|
|
theme(
|
|
plot.margin = unit(c(0.2, 0.3, 0.2, 0.2), "cm"),
|
|
text = element_text(size = text_size),
|
|
legend.position = "bottom",
|
|
legend.title = element_blank(),
|
|
panel.spacing = unit(1.5, "lines")
|
|
) +
|
|
scale_color_manual(NULL, values = tCOL[1:K]) +
|
|
facet_grid(. ~ K)
|
|
```
|
|
|
|
::::
|
|
|
|
## Wrap-Up
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Potential Downsides:
|
|
|
|
- Pointwise optimization can induce quantile crossing
|
|
- Can be solved by sorting the predictions
|
|
|
|
Upsides:
|
|
|
|
- Pointwise learning outperforms the Naive solution significantly
|
|
- Online learning is much faster than batch methods
|
|
- Smoothing further improves the predictive performance
|
|
- Asymptotically not worse than the best convex combination
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Important:
|
|
|
|
- The choice of the learning rate is crucial
|
|
- The loss function has to meet certain criteria
|
|
|
|
The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Package:
|
|
|
|
- Implements all algorithms discussed above
|
|
- Is written using RcppArmadillo `r fontawesome::fa("arrow-right", fill ="#000000")` it's fast
|
|
- Accepts vectors for most parameters
|
|
- The best parameter combination is chosen online
|
|
- Implements
|
|
- Forgetting, Fixed Share
|
|
- Different loss functions + gradients
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
<!-- :::: {.notes}
|
|
|
|
Execution Times:
|
|
|
|
T = 5000
|
|
|
|
Opera:
|
|
|
|
Ml-Poly > 157 ms
|
|
Boa > 212 ms
|
|
|
|
Profoc:
|
|
|
|
Ml-Poly > 17
|
|
BOA > 16 -->
|
|
|
|
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices
|
|
|
|
---
|
|
|
|
## Outline
|
|
|
|
```{r, include=FALSE}
|
|
col_lightgray <- "#e7e7e7"
|
|
col_blue <- "#000088"
|
|
col_smooth_expost <- "#a7008b"
|
|
col_smooth <- "#187a00"
|
|
col_pointwise <- "#008790"
|
|
col_constant <- "#dd9002"
|
|
col_optimum <- "#666666"
|
|
col_green <- "#61B94C"
|
|
col_orange <- "#ffa600"
|
|
col_yellow <- "#FCE135"
|
|
```
|
|
|
|
</br>
|
|
|
|
**Multivariate CRPS Learning**
|
|
|
|
- Introduction
|
|
- Smoothing procedures
|
|
- Application to multivariate electricity price forecasts
|
|
|
|
**The `profoc` R package**
|
|
|
|
- Package overview
|
|
- Implementation details
|
|
- Illustrative examples
|
|
|
|
## The Framework of Prediction under Expert Advice
|
|
|
|
### The sequential framework
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Each day, $t = 1, 2, ... T$
|
|
|
|
- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
|
|
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
|
|
- The **forecaster** calculates her prediction:
|
|
|
|
$$\widetilde{X}_{t}=\sum_{k=1}^K w_{t,k}\widehat{X}_{t,k}$$
|
|
|
|
- The realization for $t$ is observed
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
- The experts can be institutions, persons, or models
|
|
- The forecasts can be point-forecasts (i.e., mean or median) or full predictive distributions
|
|
- We do not need any assumptions concerning the underlying data
|
|
- `r Citet(my_bib, "cesa2006prediction")`
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## The Regret
|
|
|
|
Weights are updated sequentially according to the past performance of the $K$ experts.
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` A loss function $\ell$ is needed (to compute the **cumulative regret** $R_{t,k}$)
|
|
|
|
\begin{equation}
|
|
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)
|
|
\label{eq_regret}
|
|
\end{equation}
|
|
|
|
The cumulative regret:
|
|
- Indicates the predictive accuracy of expert $k$ until time $t$.
|
|
- Measures how much the forecaster *regrets* not having followed the expert's advice
|
|
|
|
Popular loss functions for point forecasting `r Citet(my_bib, "gneiting2011making")`:
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
- $\ell_2$-loss $\ell_2(x, y) = | x -y|^2$
|
|
- optimal for mean prediction
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
- $\ell_1$-loss $\ell_1(x, y) = | x -y|$
|
|
- optimal for median predictions
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
---
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
### Probabilistic Setting
|
|
|
|
An appropriate loss:
|
|
|
|
\begin{align*}
|
|
\text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx
|
|
\label{eq_crps}
|
|
\end{align*}
|
|
|
|
It's strictly proper `r Citet(my_bib, "gneiting2007strictly")`.
|
|
|
|
Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance varies in parts of the distribution?
|
|
|
|
`r fontawesome::fa("lightbulb", fill = col_yellow)` Utilize this relation:
|
|
|
|
\begin{align*}
|
|
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) \, d p.
|
|
\label{eq_crps_qs}
|
|
\end{align*}
|
|
|
|
... to combine quantiles of the probabilistic forecasts individually using the quantile-loss QL.
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
### Optimal Convergence
|
|
|
|
</br>
|
|
|
|
`r fontawesome::fa("exclamation", fill = col_orange)` exp-concavity of the loss is required for *selection* and *convex aggregation* properties
|
|
|
|
`r fontawesome::fa("exclamation", fill = col_orange)` QL is convex, but not exp-concave
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` The Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition.
|
|
|
|
Convergence rates of BOA are:
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *selection* `r Citet(my_bib, "gaillard2018efficient")`.
|
|
|
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *convex aggregation* `r Citet(my_bib, "wintenberger2017optimal")`.
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Multivariate CRPS Learning
|
|
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Additionally, we extend the **B-Smooth** and **P-Smooth** procedures to the multivariate setting:
|
|
|
|
- Basis matrices for reducing
  - the probabilistic dimension from $P$ to $\widetilde P$
  - the multivariate dimension from $D$ to $\widetilde D$
- Hat matrices
  - penalized smoothing across the $P$ and $D$ dimensions
|
|
|
|
We utilize the mean Pinball Score over the entire space for hyperparameter optimization (e.g., $\lambda$)
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
*Basis Smoothing*
|
|
|
|
Represent weights as linear combinations of bounded basis functions:
|
|
|
|
\begin{equation}
|
|
\underbrace{\boldsymbol w_{t,k}}_{D \times P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D \times \widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \times P} \nonumber
|
|
\end{equation}
|
|
|
|
A popular choice: B-Splines
|
|
|
|
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:
|
|
|
|
$\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = \boldsymbol \varphi^{\text{pr}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{P \times D}\boldsymbol \varphi^{\text{mv}}$
|
|
|
|
If $\widetilde P = P$ it holds that $\boldsymbol \varphi^{pr} = \boldsymbol{I}$ (pointwise)
|
|
|
|
For $\widetilde P = 1$ we receive constant weights
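
A dimensional sketch of this reduction with random placeholder matrices (real applications would use B-spline bases; identity bases recover the pointwise case):

```{r, echo = TRUE, eval = FALSE}
# Sketch: reducing a P x D regret matrix with two basis matrices (placeholders)
P <- 99; D <- 24                # probability grid and hours
P_tilde <- 12; D_tilde <- 6     # reduced dimensions

phi_pr <- matrix(rnorm(P_tilde * P), P_tilde, P)      # stand-in for a B-spline basis
phi_mv <- matrix(rnorm(D * D_tilde), D, D_tilde)      # stand-in for a B-spline basis

regret_full    <- matrix(rnorm(P * D), P, D)          # QL-gradient regret differences, P x D
regret_reduced <- phi_pr %*% regret_full %*% phi_mv   # P_tilde x D_tilde
dim(regret_reduced)
```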
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Multivariate CRPS Learning
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
**Penalized smoothing:**
|
|
|
|
Let $\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})$ and $\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})$ be two sets of bounded basis functions on $(0,1)$:
|
|
|
|
\begin{equation}
|
|
\boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{pr}}'
|
|
\end{equation}
|
|
|
|
with parameter matrix $\boldsymbol b_{t,k}$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
|
|
|
|
\begin{align}
|
|
& \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\
|
|
& \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber
|
|
\end{align}
|
|
|
|
with differential operator $\mathcal{D}_q$ of order $q$
|
|
|
|
Computation is easy since we have an analytical solution.
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "1000px"}
|
|
knitr::include_graphics("assets/mcrps_learning/algorithm.svg")
|
|
```
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Application
|
|
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
#### Data
|
|
|
|
- Day-Ahead electricity price forecasts from `r Citet(my_bib, "marcjasz2022distributional")`
|
|
- Produced using probabilistic neural networks
|
|
- 24-dimensional distributional forecasts
|
|
- Distribution assumptions: JSU and Normal
|
|
- 8 experts (4 JSU, 4 Normal)
|
|
- 27th Dec. 2018 to 31st Dec. 2020 (736 days)
|
|
- We extract 99 quantiles (percentiles)
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
#### Setup
|
|
|
|
Evaluation: Exclude first 182 observations
|
|
|
|
Extensions: Penalized smoothing | Forgetting
|
|
|
|
Tuning strategies:
|
|
|
|
- Bayesian Fix
  - Sophisticated Bayesian search algorithm
- Online
  - Dynamic, based on past performance
- Bayesian Online
  - First Bayesian Fix, then Online
|
|
|
|
Computation Time: ~30 Minutes
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
# Special Cases
|
|
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
::: {.panel-tabset}
|
|
|
|
## Constant
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
|
knitr::include_graphics("assets/mcrps_learning/constant.svg")
|
|
```
|
|
|
|
## Constant PR
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
|
knitr::include_graphics("assets/mcrps_learning/constant_pr.svg")
|
|
```
|
|
|
|
## Constant MV
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
|
knitr::include_graphics("assets/mcrps_learning/constant_mv.svg")
|
|
```
|
|
|
|
::::
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
::: {.panel-tabset}
|
|
|
|
## Pointwise
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
|
knitr::include_graphics("assets/mcrps_learning/pointwise.svg")
|
|
```
|
|
|
|
## Smooth
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
|
knitr::include_graphics("assets/mcrps_learning/smooth_best.svg")
|
|
```
|
|
|
|
::::
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Results
|
|
|
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
|
knitr::include_graphics("assets/mcrps_learning/tab_performance_sa.svg")
|
|
```
|
|
|
|
## Results
|
|
|
|
```{r, warning=FALSE, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
|
|
load("assets/mcrps_learning/pars_data.rds")
|
|
pars_data %>%
|
|
ggplot(aes(x = dates, y = value)) +
|
|
geom_rect(aes(
|
|
ymin = 0,
|
|
ymax = value * 1.2,
|
|
xmin = dates[1],
|
|
xmax = dates[182],
|
|
fill = "Burn-In"
|
|
)) +
|
|
geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) +
|
|
scale_colour_manual(
|
|
values = as.character(cols[5, c("pink", "amber", "green")])
|
|
) +
|
|
facet_grid(name ~ .,
|
|
scales = "free_y",
|
|
# switch = "both"
|
|
) +
|
|
scale_y_continuous(
|
|
trans = "log2",
|
|
labels = scaleFUN
|
|
) +
|
|
theme_minimal() +
|
|
theme(
|
|
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
|
text = element_text(size = text_size),
|
|
legend.key.width = unit(0.9, "inch"),
|
|
legend.position = "none"
|
|
) +
|
|
ylab(NULL) +
|
|
xlab("date") +
|
|
scale_fill_manual(NULL,
|
|
values = as.character(cols[3, "grey"])
|
|
)
|
|
```
|
|
|
|
## Results: Hour 16:00-17:00
|
|
|
|
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
|
|
load("assets/mcrps_learning/weights_h.rds")
|
|
weights_h %>%
|
|
ggplot(aes(date, q, fill = weight)) +
|
|
geom_raster(interpolate = TRUE) +
|
|
facet_grid(
|
|
Expert ~ . # , labeller = labeller(Mod = mod_labs)
|
|
) +
|
|
theme_minimal() +
|
|
theme(
|
|
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
|
text = element_text(size = text_size),
|
|
legend.key.height = unit(0.9, "inch")
|
|
) +
|
|
scale_x_date(expand = c(0, 0)) +
|
|
scale_fill_gradientn(
|
|
oob = scales::squish,
|
|
limits = c(0, 1),
|
|
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
|
|
colours = c(
|
|
cols[8, "red"],
|
|
cols[5, "deep-orange"],
|
|
cols[5, "amber"],
|
|
cols[5, "yellow"],
|
|
cols[5, "lime"],
|
|
cols[5, "light-green"],
|
|
cols[5, "green"],
|
|
cols[7, "green"],
|
|
cols[9, "green"],
|
|
cols[10, "green"]
|
|
),
|
|
breaks = seq(0, 1, 0.1)
|
|
) +
|
|
xlab("date") +
|
|
ylab("probability") +
|
|
scale_y_continuous(breaks = c(0.1, 0.5, 0.9))
|
|
```
|
|
|
|
## Results: Median
|
|
|
|
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
|
|
load("assets/mcrps_learning/weights_q.rds")
|
|
weights_q %>%
|
|
mutate(hour = as.numeric(hour) - 1) %>%
|
|
ggplot(aes(date, hour, fill = weight)) +
|
|
geom_raster(interpolate = TRUE) +
|
|
facet_grid(
|
|
Expert ~ . # , labeller = labeller(Mod = mod_labs)
|
|
) +
|
|
theme_minimal() +
|
|
theme(
|
|
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
|
text = element_text(size = text_size),
|
|
legend.key.height = unit(0.9, "inch")
|
|
) +
|
|
scale_x_date(expand = c(0, 0)) +
|
|
scale_fill_gradientn(
|
|
oob = scales::squish,
|
|
limits = c(0, 1),
|
|
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
|
|
colours = c(
|
|
cols[8, "red"],
|
|
cols[5, "deep-orange"],
|
|
cols[5, "amber"],
|
|
cols[5, "yellow"],
|
|
cols[5, "lime"],
|
|
cols[5, "light-green"],
|
|
cols[5, "green"],
|
|
cols[7, "green"],
|
|
cols[9, "green"],
|
|
cols[10, "green"]
|
|
),
|
|
breaks = seq(0, 1, 0.1)
|
|
) +
|
|
xlab("date") +
|
|
ylab("hour") +
|
|
scale_y_continuous(breaks = c(0, 8, 16, 24))
|
|
```
|
|
|
|
## Profoc R Package
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
### Probabilistic Forecast Combination - profoc
|
|
|
|
Available on [Github](https://github.com/BerriJ/profoc) and [CRAN](https://CRAN.R-project.org/package=profoc)
|
|
|
|
Main Function: `online()` for online learning.
|
|
- Works with multivariate and/or probabilistic data
|
|
- Implements BOA, ML-POLY, EWA (and the gradient versions)
|
|
- Implements many extensions like smoothing, forgetting, thresholding, etc.
|
|
- Various loss functions are available
|
|
- Various methods (`predict`, `update`, `plot`, etc.)
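
A minimal call, assuming `Y` holds the observations and `experts` the array of expert quantile forecasts in the layout expected by the package (everything else at its defaults):

```{r, echo = TRUE, eval = FALSE}
library(profoc)

# Y:       observations (vector, or T x D matrix in the multivariate case)
# experts: array of expert quantile forecasts matching Y and tau
mod <- online(
  y = Y,
  experts = experts,
  tau = 1:99 / 100   # probability grid
)

plot(mod)            # inspect the learned weights
# predict() and update() continue the learning as new data arrives
```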
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
### Speed
|
|
|
|
Large parts of profoc are implemented in C++.
|
|
|
|
<center>
|
|
<img src="assets/mcrps_learning/profoc_langs.png">
|
|
</center>
|
|
|
|
We use `Rcpp`, `RcppArmadillo`, and OpenMP.
|
|
|
|
We use `Rcpp` modules to expose a class to R
|
|
- Offers great flexibility for the end-user
|
|
- Requires very little knowledge of C++ code
|
|
- High-Level interface is easy to use
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Profoc - B-Spline Basis
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:
|
|
|
|
```{r, echo = TRUE, eval = FALSE, cache = FALSE}
|
|
mod <- online(
|
|
y = Y,
|
|
experts = experts,
|
|
tau = 1:99 / 100,
|
|
b_smooth_pr = list(
|
|
knots = 9,
|
|
mu = 0.3, # NEW
|
|
sigma = 1,
|
|
nonc = 0,
|
|
tailweight = 1,
|
|
deg = 3
|
|
)
|
|
)
|
|
```
|
|
|
|
Knots are distributed using the generalized beta distribution.
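
A rough illustration of how such a non-equidistant knot sequence could be generated from beta-distribution quantiles; the exact generalized-beta parametrization used by `make_basis_mats()` may differ.

```{r, echo = TRUE, eval = FALSE}
# Sketch: non-equidistant knots from beta quantiles (not the exact profoc parametrization)
n_knots <- 9
knots <- qbeta(seq(0, 1, length.out = n_knots + 2), shape1 = 1.3, shape2 = 1.3)
knots  # denser in the centre for shapes > 1, denser in the tails for shapes < 1
```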
|
|
|
|
TODO: Add actual algorithm to backup slides
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
TODO
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
## Wrap-Up
|
|
|
|
:::: {.columns}
|
|
|
|
::: {.column width="48%"}
|
|
|
|
The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Package:
|
|
|
|
Profoc is a flexible framework for online learning.
|
|
|
|
- It implements several algorithms
|
|
- It implements several loss functions
|
|
- It implements several extensions
|
|
- Its high- and low-level interfaces offer great flexibility
|
|
|
|
Profoc is fast.
|
|
|
|
- The core components are written in C++
|
|
- The core components utilize OpenMP for parallelization
|
|
|
|
:::
|
|
|
|
::: {.column width="2%"}
|
|
|
|
:::
|
|
|
|
::: {.column width="48%"}
|
|
|
|
Multivariate Extension:
|
|
|
|
- Code is available now
|
|
- [Pre-Print](https://arxiv.org/abs/2303.10019) is available now
|
|
|
|
Get these slides:
|
|
|
|
<center>
|
|
<img src="assets/mcrps_learning/web_pres.png">
|
|
</center>
|
|
[https://berrisch.biz/slides/23_06_ecmi/](https://berrisch.biz/slides/23_06_ecmi/)
|
|
|
|
:::
|
|
|
|
::::
|
|
|
|
|
|
|
|
|
|
# References
|
|
|
|
```{r refs1, echo=FALSE, results="asis"}
|
|
PrintBibliography(my_bib, .opts = list(style = "text"))
|
|
```
|
|
|