---
title: "Data Science Methods for Forecasting in Energy and Economics"
date: 2025-07-10
author:
  - name: Jonathan Berrisch
    affiliations:
      - ref: hemf
affiliations:
  - id: hemf
    name: University of Duisburg-Essen, House of Energy Markets and Finance
format:
  revealjs:
    embed-resources: true
    footer: ""
    logo: logos_combined.png
    theme: [default, clean.scss]
    smaller: true
    fig-format: svg
execute:
  daemon: false
highlight-style: github
---
## Outline

::: {.hidden}
$$
\newcommand{\A}{{\mathbb A}}
$$
:::

<br>

::: {style="font-size: 150%;"}

[{{< fa bars-staggered >}}]{style="color: #404040;"}   Introduction & Research Motivation

[{{< fa bars-staggered >}}]{style="color: #404040;"}   Overview of the Thesis

[{{< fa table >}}]{style="color: #404040;"}   Online Learning

[{{< fa circle-nodes >}}]{style="color: #404040;"}   Probabilistic Forecasting of European Carbon and Energy Prices

[{{< fa lightbulb >}}]{style="color: #404040;"}   Limitations

[{{< fa binoculars >}}]{style="color: #404040;"}   Contributions & Outlook

:::

## PhD Defence
```{r, setup, include=FALSE}
# Compile with: rmarkdown::render("crps_learning.Rmd")
library(latex2exp)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)
library(kableExtra)
knitr::opts_chunk$set(
  dev = "svglite" # Use svg figures
)
library(RefManageR)
BibOptions(
  check.entries = TRUE,
  bib.style = "authoryear",
  cite.style = "authoryear",
  style = "html",
  hyperlink = TRUE,
  dashed = FALSE
)
my_bib <- ReadBib("assets/library.bib", check = FALSE)
col_lightgray <- "#e7e7e7"
col_blue <- "#000088"
col_smooth_expost <- "#a7008b"
col_smooth <- "#187a00"
col_pointwise <- "#008790"
col_constant <- "#dd9002"
col_optimum <- "#666666"
```

```{r xaringan-panelset, echo=FALSE}
xaringanExtra::use_panelset()
```

```{r xaringanExtra-freezeframe, echo=FALSE}
xaringanExtra::use_freezeframe(responsive = TRUE)
```
# Outline

- [Motivation](#motivation)
- [The Framework of Prediction under Expert Advice](#pred_under_exp_advice)
- [The Continuous Ranked Probability Score](#crps)
- [Optimality of (Pointwise) CRPS-Learning](#crps_optim)
- [A Simple Probabilistic Example](#simple_example)
- [The Proposed CRPS-Learning Algorithm](#proposed_algorithm)
- [Simulation Results](#simulation)
- [Possible Extensions](#extensions)
- [Application Study](#application)
- [Wrap-Up](#conclusion)
- [References](#references)

# Motivation {#motivation}
## Motivation

:::: {.columns}

::: {.column width="48%"}

The idea:

- Combine multiple forecasts instead of choosing one
- Combination weights may vary over **time**, over the **distribution**, or **both**

Two popular options for combining distributions:

- Combining across quantiles (this paper)
  - Horizontal aggregation, Vincentization
- Combining across probabilities
  - Vertical aggregation

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

::: {.panel-tabset}

## Time
```{r, echo = FALSE, fig.height=6}
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
set.seed(1)
# Data: three expert forecasts over 15 time steps
X <- matrix(ncol = 3, nrow = 15)
X[, 1] <- seq(from = 8, to = 12, length.out = 15) + 0.25 * rnorm(15)
X[, 2] <- 10 + 0.25 * rnorm(15)
X[, 3] <- seq(from = 12, to = 8, length.out = 15) + 0.25 * rnorm(15)
# Time-varying weights, normalized to sum to one
w <- matrix(ncol = 3, nrow = 15)
w[, 1] <- sin(0.1 * 1:15)
w[, 2] <- cos(0.1 * 1:15)
w[, 3] <- seq(from = -2, to = 0.25, length.out = 15)^2
w <- w / rowSums(w)
# Visualization
plot(X[, 1],
  lwd = 4, type = "l", ylim = c(8, 12),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
)
plot(w[, 1],
  lwd = 4, type = "l", ylim = c(0, 1),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
)
text(6, 0.5, TeX("$w_1(t)$"), cex = 2, col = "#2050f0")
arrows(13, 0.25, 15, 0.0, lwd = 4)
plot.new()
plot(X[, 2],
  lwd = 4, type = "l", ylim = c(8, 12),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
plot(w[, 2],
  lwd = 4, type = "l", ylim = c(0, 1),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
text(6, 0.6, TeX("$w_2(t)$"), cex = 2, col = "purple")
arrows(13, 0.5, 15, 0.5, lwd = 4)
plot(rowSums(X * w),
  lwd = 4, type = "l",
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#298829"
)
plot(X[, 3],
  lwd = 4, type = "l", ylim = c(8, 12),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
plot(w[, 3],
  lwd = 4, type = "l", ylim = c(0, 1),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
text(6, 0.25, TeX("$w_3(t)$"), cex = 2, col = "#e423b4")
arrows(13, 0.75, 15, 1, lwd = 4)
```
## Distribution

```{r, echo = FALSE, fig.height=6}
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
set.seed(1)
# Data: three predictive densities on a common grid
X <- matrix(ncol = 3, nrow = 31)
X[, 1] <- dchisq(0:30, df = 10)
X[, 2] <- dnorm(0:30, mean = 15, sd = 5)
X[, 3] <- dexp(0:30, 0.2)
# Weights varying over the distribution, normalized to sum to one
w <- matrix(ncol = 3, nrow = 31)
w[, 1] <- sin(0.05 * 0:30)
w[, 2] <- cos(0.05 * 0:30)
w[, 3] <- seq(from = -2, to = 0.25, length.out = 31)^2
w <- w / rowSums(w)
# Visualization
plot(X[, 1],
  lwd = 4, type = "l",
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
)
plot(X[, 2],
  lwd = 4, type = "l",
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
plot(X[, 3],
  lwd = 4, type = "l",
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
plot(w[, 1],
  lwd = 4, type = "l", ylim = c(0, 1),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#2050f0"
)
text(12, 0.5, TeX("$w_1(x)$"), cex = 2, col = "#2050f0")
arrows(26, 0.25, 31, 0.0, lwd = 4)
plot(w[, 2],
  lwd = 4, type = "l", ylim = c(0, 1),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "purple"
)
text(15, 0.5, TeX("$w_2(x)$"), cex = 2, col = "purple")
arrows(15, 0.25, 15, 0, lwd = 4)
plot(w[, 3],
  lwd = 4, type = "l", ylim = c(0, 1),
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#e423b4"
)
text(20, 0.5, TeX("$w_3(x)$"), cex = 2, col = "#e423b4")
arrows(5, 0.25, 0, 0, lwd = 4)
plot.new()
plot(rowSums(X * w),
  lwd = 4, type = "l",
  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#298829"
)
```

:::

:::

::::

# The Framework of Prediction under Expert Advice {#pred_under_exp_advice}
## The Framework of Prediction under Expert Advice

### The sequential framework

:::: {.columns}

::: {.column width="48%"}

Each day, $t = 1, 2, \ldots, T$:

- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
- The **forecaster** calculates her prediction:
\begin{equation}
\widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}.
\label{eq_forecast_def}
\end{equation}
- The realization $Y_t$ for day $t$ is observed

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

- The experts can be institutions, persons, or models
- The forecasts can be point forecasts (e.g., the mean or median) or full predictive distributions
- We do not need any assumptions concerning the underlying data
- `r Citet(my_bib, "cesa2006prediction")`

:::

::::
## The Regret

Weights are updated sequentially according to the past performance of the $K$ experts.

That is, a loss function $\ell$ is needed. It is used to compute the **cumulative regret** $R_{t,k}$:

\begin{equation}
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)
\label{eq_regret}
\end{equation}

The cumulative regret:

- Indicates the predictive accuracy of expert $k$ up to time $t$
- Measures how much the forecaster *regrets* not having followed the expert's advice

Popular loss functions for point forecasting `r Citet(my_bib, "gneiting2011making")`:

:::: {.columns}

::: {.column width="48%"}

- $\ell_2$-loss $\ell_2(x, y) = |x - y|^2$
  - optimal for mean predictions

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

- $\ell_1$-loss $\ell_1(x, y) = |x - y|$
  - optimal for median predictions

:::

::::
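
To make the bookkeeping concrete, the following minimal R sketch (added for illustration; the data, the naive weights, and all variable names are hypothetical) tracks the cumulative regret \eqref{eq_regret} against each expert under the $\ell_1$-loss:

```{r, eval = FALSE, echo = TRUE}
# Illustration: cumulative regret against K experts under the l1-loss
set.seed(1)
tt <- 100; K <- 3
Y <- rnorm(tt, mean = 10)                                      # realizations
X_hat <- sapply(c(9, 10, 11), function(m) rnorm(tt, mean = m)) # expert forecasts
w <- rep(1 / K, K)                                             # naive weights, for illustration
loss <- function(x, y) abs(x - y)                              # l1-loss
X_tilde <- as.vector(X_hat %*% w)                              # combined forecast
R <- cumsum(loss(X_tilde, Y)) - apply(loss(X_hat, Y), 2, cumsum)
R[tt, ]                                                        # regret w.r.t. each expert at time tt
```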
## Popular Aggregation Algorithms

#### The naive combination

\begin{equation}
w_{t,k}^{\text{Naive}} = \frac{1}{K}
\end{equation}

#### The exponentially weighted average forecaster (EWA)

\begin{align}
w_{t,k}^{\text{EWA}} & = \frac{e^{\eta R_{t,k}} }{\sum_{k = 1}^K e^{\eta R_{t,k}}}
=
\frac{e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }{\sum_{k = 1}^K e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }
\label{eq_ewa_general}
\end{align}

#### The polynomially weighted aggregation (PWA)

\begin{align}
w_{t,k}^{\text{PWA}} & = \frac{ 2(R_{t,k})^{q-1}_{+} }{ \|(R_t)_{+}\|^{q-2}_q}
\label{eq_pwa_general}
\end{align}

where $q \geq 2$ and $x_{+}$ denotes the vector of positive parts of $x$.
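
For concreteness, a minimal sketch of the recursive form of the EWA update \eqref{eq_ewa_general} (illustrative code, not part of the deck; the losses and the learning rate are made up):

```{r, eval = FALSE, echo = TRUE}
# Recursive EWA update: reweight by exponentiated current losses, renormalize
ewa_update <- function(w_prev, loss_t, eta) {
  w_new <- w_prev * exp(-eta * loss_t)
  w_new / sum(w_new) # weights sum to one
}
w <- rep(1 / 3, 3)                                       # start from the naive combination
w <- ewa_update(w, loss_t = c(0.9, 0.4, 1.3), eta = 0.5) # shifts weight to expert 2
```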
## Optimality

In stochastic settings, the cumulative risk should be analyzed `r Citet(my_bib, "wintenberger2017optimal")`:

\begin{align}
\underbrace{\widetilde{\mathcal{R}}_t = \sum_{i=1}^t \mathbb{E}[\ell(\widetilde{X}_{i},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Forecaster}} \qquad\qquad\qquad \text{ and } \qquad\qquad\qquad
\underbrace{\widehat{\mathcal{R}}_{t,k} = \sum_{i=1}^t \mathbb{E}[\ell(\widehat{X}_{i,k},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Experts}}
\label{eq_def_cumrisk}
\end{align}

There are two problems that an algorithm should solve in iid settings:

:::: {.columns}

::: {.column width="48%"}

### The selection problem

\begin{equation}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \stackrel{t\to \infty}{\rightarrow} a \quad \text{with} \quad a \leq 0.
\label{eq_opt_select}
\end{equation}

The forecaster is asymptotically not worse than the best expert $\widehat{\mathcal{R}}_{t,\min}$.

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

### The convex aggregation problem

\begin{equation}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \stackrel{t\to \infty}{\rightarrow} b \quad \text{with} \quad b \leq 0 .
\label{eq_opt_conv}
\end{equation}

The forecaster is asymptotically not worse than the best convex combination $\widehat{X}_{t,\pi}$ in hindsight (**oracle**).

:::

::::
## Optimality

Satisfying the convexity property \eqref{eq_opt_conv} comes at the cost of slower possible convergence.

According to `r Citet(my_bib, "wintenberger2017optimal")`, an algorithm has optimal rates with respect to selection \eqref{eq_opt_select} and convex aggregation \eqref{eq_opt_conv} if

\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) & =
\mathcal{O}\left(\frac{\log(K)}{t}\right)\label{eq_optp_select}
\end{align}

\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) & =
\mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right)
\label{eq_optp_conv}
\end{align}

Whether an algorithm satisfies both \eqref{eq_optp_select} and \eqref{eq_optp_conv} depends on:

- The loss function
- Regularity conditions on $Y_t$ and $\widehat{X}_{t,k}$
- The weighting scheme
## Optimality

According to `r Citet(my_bib, "cesa2006prediction")`, EWA \eqref{eq_ewa_general} satisfies the optimal selection convergence \eqref{eq_optp_select} in a deterministic setting if:

- The loss $\ell$ is exp-concave
- The learning rate $\eta$ is chosen correctly

These results can be transferred to stochastic iid settings `r Citet(my_bib, "kakade2008generalization")` `r Citet(my_bib, "gaillard2014second")`.

The optimal convex aggregation convergence \eqref{eq_optp_conv} can be satisfied by applying the kernel trick, which linearizes the loss:

\begin{align}
\ell^{\nabla}(x,y) = \ell'(\widetilde{X},y) x
\end{align}

where $\ell'$ is the subgradient of $\ell$ in its first coordinate, evaluated at the forecast combination $\widetilde{X}$.

Combining probabilistic forecasts calls for a probabilistic loss function.

:::: {.notes}

We apply Bernstein Online Aggregation (BOA). It lets us weaken the exp-concavity condition while almost keeping the optimalities \eqref{eq_optp_select} and \eqref{eq_optp_conv}.

::::
## The Continuous Ranked Probability Score {#crps}

:::: {.columns}

::: {.column width="48%"}

**An appropriate choice:**

\begin{align*}
\text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx
\label{eq_crps}
\end{align*}

It is strictly proper `r Citet(my_bib, "gneiting2007strictly")`.

Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance is not uniform over all parts of the distribution?

The idea is to utilize the relation

\begin{align*}
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) \, d p
\label{eq_crps_qs}
\end{align*}

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

to combine the quantiles of the probabilistic forecasts individually using the quantile loss (QL):

\begin{align*}
\text{QL}_p(q, y) & = (\mathbb{1}\{y < q\} - p)(q - y)
\end{align*}

</br>

**But is it optimal?**

The CRPS is exp-concave `r fontawesome::fa("check", fill ="#00b02f")`

`r fontawesome::fa("arrow-right", fill ="#000000")` EWA \eqref{eq_ewa_general} with the CRPS satisfies \eqref{eq_optp_select} and \eqref{eq_optp_conv}

QL is convex, but not exp-concave `r fontawesome::fa("exclamation", fill ="#ffa600")`

`r fontawesome::fa("arrow-right", fill ="#000000")` Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition while almost keeping optimal convergence

:::

::::
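
The identity \eqref{eq_crps_qs} can be checked numerically. The sketch below (illustration only, assuming a standard normal forecast distribution) approximates the CRPS by averaging quantile losses over an equidistant probability grid and compares it with the known closed form for a Gaussian:

```{r, eval = FALSE, echo = TRUE}
# CRPS(F, y) = 2 * integral over p of QL_p(F^{-1}(p), y), approximated on a grid
ql <- function(q, y, p) (as.numeric(y < q) - p) * (q - y)
y <- 0.5                                # observation
p_grid <- seq(0.005, 0.995, by = 0.005) # equidistant quantile grid
q_hat <- qnorm(p_grid)                  # forecast quantiles of F = N(0, 1)
crps_grid <- 2 * mean(ql(q_hat, y, p_grid))
# Closed form for a standard normal forecast, for comparison:
crps_exact <- y * (2 * pnorm(y) - 1) + 2 * dnorm(y) - 1 / sqrt(pi)
c(crps_grid, crps_exact)                # both approx. 0.33
```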
## CRPS-Learning Optimality {#crps_optim}

For convex losses, BOAG satisfies that there exists a $C>0$ such that for all $x>0$

\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq
1-e^{-x}
\label{eq_boa_opt_conv}
\end{equation}

`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t. *convex aggregation* \eqref{eq_optp_conv} `r Citet(my_bib, "wintenberger2017optimal")`.

The same algorithm satisfies that there exists a $C>0$ such that for all $x>0$

\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq
C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq
1-e^{-x}
\label{eq_boa_opt_select}
\end{equation}

if $Y_t$ is bounded and the considered loss $\ell$ is convex, $G$-Lipschitz, and weakly exp-concave in its first coordinate. This is the case for losses that satisfy **A1** and **A2**.
## CRPS-Learning Optimality

:::: {.columns}

::: {.column width="48%"}

**A1** For some $G>0$ it holds for all $x_1,x_2\in \mathbb{R}$ and $t>0$ that

$$ | \ell(x_1, Y_t)-\ell(x_2, Y_t) | \leq G |x_1-x_2|$$

**A2** For some $\alpha>0$, $\beta\in[0,1]$ it holds for all $x_1,x_2 \in \mathbb{R}$ and $t>0$ that

\begin{align*}
\mathbb{E}[
& \ell(x_1, Y_t)-\ell(x_2, Y_t) | \mathcal{F}_{t-1}] \leq \\
& \mathbb{E}[ \ell'(x_1, Y_t)(x_1 - x_2) |\mathcal{F}_{t-1}] \\
& +
\mathbb{E}\left[ \left. \left( \alpha(\ell'(x_1, Y_t)(x_1 - x_2))^{2}\right)^{1/\beta} \right|\mathcal{F}_{t-1}\right]
\end{align*}

`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t. *selection* \eqref{eq_optp_select} `r Citet(my_bib, "gaillard2018efficient")`.

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

**Lemma 1**

\begin{align}
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\min}
\label{eq_risk_ql_crps_expert} \\
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\pi} .
\label{eq_risk_ql_crps_convex}
\end{align}

Pointwise procedures can outperform constant ones.

QL is convex but not exp-concave:

`r fontawesome::fa("arrow-right")` Almost optimal convergence w.r.t. *convex aggregation* \eqref{eq_boa_opt_conv} `r fontawesome::fa("check", fill ="#00b02f")` </br>

For almost optimal convergence w.r.t. *selection* \eqref{eq_boa_opt_select} we need to check **A1** and **A2**:

QL is Lipschitz continuous:

`r fontawesome::fa("arrow-right")` **A1** holds `r fontawesome::fa("check", fill ="#ffa600")` </br>

:::

::::
## CRPS-Learning Optimality

:::: {.columns}

::: {.column width="48%"}

Conditional quantile risk: $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y_t) | \mathcal{F}_{t-1}]$

`r fontawesome::fa("arrow-right")` The convexity properties of $\mathcal{Q}_p$ depend on the conditional distribution $Y_t|\mathcal{F}_{t-1}$.

**Proposition 1**

Let $Y$ be a univariate random variable with (Radon-Nikodym) $\nu$-density $f$. Then, for the second subderivative of the quantile risk $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y) ]$ of $Y$, it holds for all $p\in(0,1)$ that $\mathcal{Q}_p'' = f$. Additionally, if $f$ is a continuous Lebesgue density with $f\geq\gamma>0$ for some constant $\gamma>0$ on its support $\text{spt}(f)$, then $\mathcal{Q}_p$ is $\gamma$-strongly convex.

Strong convexity with $\beta=1$ implies **A2** `r fontawesome::fa("check", fill ="#ffa600")` `r Citet(my_bib, "gaillard2018efficient")`

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}

`r fontawesome::fa("arrow-right")` **A1** and **A2** give us almost optimal convergence w.r.t. selection \eqref{eq_boa_opt_select} `r fontawesome::fa("check", fill ="#00b02f")` </br>

**Theorem 1**

The gradient-based fully adaptive Bernstein online aggregation (BOAG), applied pointwise for all $p\in(0,1)$ on $\text{QL}$, satisfies \eqref{eq_boa_opt_conv} with minimal CRPS given by

$$\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}.$$

If $Y_t|\mathcal{F}_{t-1}$ is bounded and has a pdf $f_t$ satisfying $f_t>\gamma >0$ on its support $\text{spt}(f_t)$, then \eqref{eq_boa_opt_select} holds with $\beta=1$ and

$$\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}.$$

:::

::::
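
To illustrate what applying the aggregation *pointwise* means, here is a simplified sketch (a stand-in only: it uses a plain gradient-based EWA with a fixed learning rate instead of the fully adaptive BOAG, and two made-up Gaussian experts):

```{r, eval = FALSE, echo = TRUE}
# Simplified pointwise CRPS learning: one gradient-based EWA update per quantile
p_grid <- seq(0.05, 0.95, by = 0.05)
K <- 2; eta <- 1
w <- matrix(1 / K, nrow = length(p_grid), ncol = K)      # one weight vector per p
ql_grad <- function(q, y, p) as.numeric(y < q) - p       # subgradient of QL_p in q
set.seed(1)
for (t in 1:100) {
  y <- rnorm(1)                                          # realization Y_t ~ N(0, 1)
  Q <- cbind(qnorm(p_grid, -1, 1), qnorm(p_grid, 3, 2))  # expert quantile forecasts
  q_tilde <- rowSums(w * Q)                              # combined quantiles
  lin_loss <- ql_grad(q_tilde, y, p_grid) * Q            # linearized loss per expert
  w <- w * exp(-eta * lin_loss)
  w <- w / rowSums(w)                                    # renormalize per quantile
}
round(w, 2)                                              # weights now vary over p
```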
## A Probabilistic Example {#simple_example}

:::: {.columns}

::: {.column width="48%"}

A simple example:

\begin{align}
Y_t & \sim \mathcal{N}(0,\,1) \\
\widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4)
\label{eq:dgp_sim1}
\end{align}

- True weights vary over $p$
- Figures show the ECDF and the calculated weights using $T=25$ realizations
- The pointwise solution creates rough estimates
- Pointwise is better than constant
- Smooth is better than pointwise

:::

::: {.column width="2%"}
:::

::: {.column width="48%"}
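The original figure is not reproduced here; as a placeholder, a minimal sketch that draws $T=25$ realizations from the DGP \eqref{eq:dgp_sim1} and plots the empirical CDFs:

```{r, echo = FALSE, fig.height=6}
# Placeholder sketch (not the original figure): ECDFs of T = 25 draws
# from Y and from the two experts' distributions (N(3, 4) has sd = 2)
set.seed(1)
n <- 25
plot(ecdf(rnorm(n, 0, 1)),
  lwd = 3, col = "black", main = "", xlab = "", ylab = ""
)
plot(ecdf(rnorm(n, -1, 1)), lwd = 3, col = "#2050f0", add = TRUE)
plot(ecdf(rnorm(n, 3, 2)), lwd = 3, col = "#e423b4", add = TRUE)
legend("bottomright", c("Y", "Expert 1", "Expert 2"),
  col = c("black", "#2050f0", "#e423b4"), lwd = 3, bty = "n"
)
```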
:::

::::
# References {#references}

```{r refs1, echo=FALSE, results="asis"}
PrintBibliography(my_bib, .opts = list(style = "text"))
```