Add multivariate crps learning slides

This commit is contained in:
2025-05-18 15:43:15 +02:00
parent afef3f1722
commit 9cf52e5d46
4 changed files with 651 additions and 2 deletions

View File

@@ -1,4 +1,5 @@
text_size <- 16
linesize <- 1
width <- 12
height <- 6
@@ -29,3 +30,26 @@ lamgrid <- c(-Inf, 2^(-15:25))
# Gamma grid
gammagrid <- sort(1 - sqrt(seq(0, 0.99, .05)))
material_pals <- c(
"red", "pink", "purple", "deep-purple", "indigo",
"blue", "light-blue", "cyan", "teal", "green", "light-green", "lime",
"yellow", "amber", "orange", "deep-orange", "brown", "grey", "blue-grey"
)
# Build a 10 x 19 matrix of material-design colors (rows: shades 1-10, columns: palettes)
cols <- purrr::map(material_pals, ~ ggsci::pal_material(.x)(10)) %>%
  purrr::reduce(cbind)
colnames(cols) <- material_pals
# Preview tile chart: every palette (rows) across its 10 shades (columns)
cols %>%
  as_tibble() %>%
  mutate(idx = as.factor(1:10)) %>%
  pivot_longer(-idx, names_to = "var", values_to = "val") %>%
  mutate(var = factor(var, levels = material_pals[19:1])) %>%
  ggplot() +
  xlab(NULL) +
  ylab(NULL) +
  geom_tile(aes(x = idx, y = var, fill = val)) +
  scale_fill_identity() +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme_minimal() -> plot_cols

View File

@@ -14,6 +14,16 @@
booktitle = {Oxford Research Encyclopedia of Economics and Finance},
year = {2019}
}
@article{marcjasz2022distributional,
title = {Distributional neural networks for electricity price forecasting},
author = {Marcjasz, Grzegorz and Narajewski, Micha{\l} and Weron, Rafa{\l} and Ziel, Florian},
journal = {Energy Economics},
volume = {125},
pages = {106843},
year = {2023},
doi = {10.1016/j.eneco.2023.106843},
publisher = {Elsevier}
}
@article{atiya2020does,
title = {Why does forecast combination work so well?},
author = {Atiya, Amir F},

View File

@@ -1428,10 +1428,12 @@ weights %>%
::: {.column width="48%"}
Potential Downsides:
- Pointwise optimization can induce quantile crossing
- Can be solved by sorting the predictions
Upsides:
- Pointwise learning outperforms the Naive solution significantly
- Online learning is much faster than batch methods
- Smoothing further improves the predictive performance
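A toy illustration of the sorting fix (base R only; made-up numbers):

```{r, echo = TRUE, eval = FALSE}
# Rearranging (sorting) a crossed quantile prediction restores monotonicity
q_hat <- c(0.9, 1.2, 1.1, 1.8) # quantile predictions; 1.2 > 1.1 crosses
sort(q_hat)                    # 0.9 1.1 1.2 1.8, a valid quantile curve
```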
@@ -1464,7 +1466,7 @@ The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Packa
::::
:::: {.notes}
<!-- :::: {.notes}
Execution Times:
@@ -1478,7 +1480,619 @@ Boa > 212 ms
Profoc:
Ml-Poly > 17
BOA > 16
BOA > 16 -->
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices
---
## Outline
```{r, include=FALSE}
col_lightgray <- "#e7e7e7"
col_blue <- "#000088"
col_smooth_expost <- "#a7008b"
col_smooth <- "#187a00"
col_pointwise <- "#008790"
col_constant <- "#dd9002"
col_optimum <- "#666666"
col_green <- "#61B94C"
col_orange <- "#ffa600"
col_yellow <- "#FCE135"
```
<br/>
**Multivariate CRPS Learning**
- Introduction
- Smoothing procedures
- Application to multivariate electricity price forecasts
**The `profoc` R package**
- Package overview
- Implementation details
- Illustrative examples
## The Framework of Prediction under Expert Advice
### The sequential framework
:::: {.columns}
::: {.column width="48%"}
Each day $t = 1, 2, \ldots, T$:

- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
- The **forecaster** computes her prediction (toy sketch below):
$$\widetilde{X}_{t}=\sum_{k=1}^K w_{t,k}\widehat{X}_{t,k}$$
- The realization for $t$ is observed
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
- The experts can be institutions, persons, or models
- The forecasts can be point forecasts (e.g., mean or median) or full predictive distributions
- We do not need any assumptions concerning the underlying data
- `r Citet(my_bib, "cesa2006prediction")`
:::
::::
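A toy illustration of the combination step (made-up numbers, base R only):

```{r, echo = TRUE, eval = FALSE}
# Combining K = 3 expert forecasts at time t with given weights
w_t   <- c(0.5, 0.3, 0.2)    # forecaster's weights, summing to one
x_hat <- c(41.2, 39.8, 44.0) # expert predictions for time t
sum(w_t * x_hat)             # combined prediction: 41.34
```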
## The Regret
Weights are updated sequentially according to the past performance of the $K$ experts.
`r fontawesome::fa("arrow-right", fill ="#000000")` A loss function $\ell$ is needed (to compute the **cumulative regret** $R_{t,k}$)
\begin{equation}
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \left[ \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i) \right]
\label{eq_regret}
\end{equation}
The cumulative regret:
- Indicates the predictive accuracy of expert $k$ until time $t$.
- Measures how much the forecaster *regrets* not having followed the expert's advice
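A minimal sketch of the regret computation (generic helper names, not from any package):

```{r, echo = TRUE, eval = FALSE}
# Cumulative regret of expert k under a generic loss (toy data)
l1 <- function(x, y) abs(x - y)               # example loss
regret <- function(loss, x_tilde, x_hat_k, y) {
  cumsum(loss(x_tilde, y) - loss(x_hat_k, y)) # R_{t,k} for t = 1, ..., T
}
# negative values: the forecaster did better than expert k
regret(l1, x_tilde = c(1.0, 2.0), x_hat_k = c(1.5, 2.5), y = c(1.2, 2.2))
```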
Popular loss functions for point forecasting `r Citet(my_bib, "gneiting2011making")`:
:::: {.columns}
::: {.column width="48%"}
- $\ell_2$-loss: $\ell_2(x, y) = |x - y|^2$
  - optimal for mean predictions
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
- $\ell_1$-loss: $\ell_1(x, y) = |x - y|$
  - optimal for median predictions
:::
::::
---
:::: {.columns}
::: {.column width="48%"}
### Probabilistic Setting
An appropriate loss:
\begin{align*}
\text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx
\label{eq_crps}
\end{align*}
It's strictly proper `r Citet(my_bib, "gneiting2007strictly")`.
Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance varies in parts of the distribution?
`r fontawesome::fa("lightbulb", fill = col_yellow)` Utilize this relation:
\begin{align*}
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) \, d p.
\label{eq_crps_qs}
\end{align*}
... to combine the quantiles of the probabilistic forecasts individually using the quantile loss QL (numerical sketch below).
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
### Optimal Convergence
<br/>
`r fontawesome::fa("exclamation", fill = col_orange)` exp-concavity of the loss is required for *selection* and *convex aggregation* properties
`r fontawesome::fa("exclamation", fill = col_orange)` QL is convex, but not exp-concave
`r fontawesome::fa("arrow-right", fill ="#000000")` The Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition.
Convergence rates of BOA are:
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *selection* `r Citet(my_bib, "gaillard2018efficient")`.
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *convex aggregation* `r Citet(my_bib, "wintenberger2017optimal")`.
:::
::::
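The CRPS–QL relation above can be checked numerically. A sketch with a standard normal predictive distribution (toy values, base R only):

```{r, echo = TRUE, eval = FALSE}
# Numerical check of CRPS(F, y) = 2 * integral of QL_p(F^{-1}(p), y) dp
p  <- 1:99 / 100                         # probability grid
q  <- qnorm(p)                           # predictive quantiles, F = N(0, 1)
y  <- 0.5                                # observed realization
ql <- (as.numeric(y < q) - p) * (q - y)  # pinball loss QL_p(F^{-1}(p), y)
2 * mean(ql)                             # Riemann approximation of the CRPS
```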
## Multivariate CRPS Learning
:::: {.columns}
::: {.column width="48%"}
Additionally, we extend the **B-Smooth** and **P-Smooth** procedures to the multivariate setting:

- Basis matrices for reducing
  - the probabilistic dimension from $P$ to $\widetilde P$
  - the multivariate dimension from $D$ to $\widetilde D$
- Hat matrices for
  - penalized smoothing across the $P$ and $D$ dimensions

We utilize the mean pinball score over the entire space for hyperparameter optimization (e.g., $\lambda$)
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
*Basis Smoothing*
Represent weights as linear combinations of bounded basis functions:
\begin{equation}
\underbrace{\boldsymbol w_{t,k}}_{D \times P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D \times \widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \times P} \nonumber
\end{equation}
A popular choice: B-Splines
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:

$\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = {\boldsymbol \varphi^{\text{pr}}}' \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{P \times D}\boldsymbol \varphi^{\text{mv}}$

If $\widetilde P = P$, we can choose $\boldsymbol \varphi^{\text{pr}} = \boldsymbol{I}$ (pointwise weights)

For $\widetilde P = 1$ we obtain constant weights (see the sketch below)
:::
::::
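A rough base-R sketch of this representation with illustrative sizes (using `splines::bs()`; the paper's basis construction differs in detail):

```{r, echo = TRUE, eval = FALSE}
# Weights as a low-dimensional basis expansion: w = phi_mv %*% B %*% t(phi_pr)
library(splines)
D <- 24; P <- 99             # hours x quantiles
D_tilde <- 6; P_tilde <- 9   # reduced dimensions
phi_mv <- bs(seq(0, 1, length.out = D), df = D_tilde)     # D x D_tilde
phi_pr <- bs(1:P / (P + 1), df = P_tilde)                 # P x P_tilde
beta <- matrix(1 / (D_tilde * P_tilde), D_tilde, P_tilde) # toy coefficients
w <- phi_mv %*% beta %*% t(phi_pr)                        # D x P weight matrix
dim(w) # 24 x 99
```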
## Multivariate CRPS Learning
:::: {.columns}
::: {.column width="48%"}
**Penalized smoothing:**
Let $\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})$ and $\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})$ be two sets of bounded basis functions on $(0,1)$:
\begin{equation}
\boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{\text{pr}}}'
\end{equation}
with parameter matrix $\boldsymbol b_{t,k}$. The latter is estimated by penalized $L_2$-smoothing, i.e., by minimizing
\begin{align}
& \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\
& \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber
\end{align}
with differential operator $\mathcal{D}_q$ of order $q$.

Computation is cheap since the penalized least-squares problem has an analytical solution (see the sketch below).
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
```{r, fig.align="center", echo=FALSE, out.width = "1000px"}
knitr::include_graphics("assets/mcrps_learning/algorithm.svg")
```
:::
::::
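A hedged sketch of that analytical solution for a single weight curve (a generic P-spline smoother; the operator $\mathcal{D}_q$ is approximated here by discrete differences via `diff()`):

```{r, echo = TRUE, eval = FALSE}
# Penalized L2-smoothing of one weight curve across P quantiles
library(splines)
P <- 99; lambda <- 10                        # illustrative penalty weight
psi <- bs(1:P / (P + 1), df = 15)            # P x 15 basis Psi
Dq <- diff(diag(ncol(psi)), differences = 2) # discrete 2nd-order penalty
w_raw <- runif(P)                            # toy unsmoothed weights
b_hat <- solve(crossprod(psi) + lambda * crossprod(Dq),
               crossprod(psi, w_raw))        # closed-form solution
w_smooth <- psi %*% b_hat                    # smoothed weight curve
```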
## Application
:::: {.columns}
::: {.column width="48%"}
#### Data
- Day-Ahead electricity price forecasts from `r Citet(my_bib, "marcjasz2022distributional")`
- Produced using probabilistic neural networks
- 24-dimensional distributional forecasts
- Distribution assumptions: JSU and Normal
- 8 experts (4 JSU, 4 Normal)
- 27th Dec. 2018 to 31st Dec. 2020 (736 days)
- We extract 99 quantiles (percentiles)
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
#### Setup
Evaluation: Exclude the first 182 observations (burn-in)

Extensions: Penalized smoothing | Forgetting

Tuning strategies:

- Bayesian Fix
  - Sophisticated Bayesian search algorithm
- Online
  - Dynamic, based on past performance
- Bayesian Online
  - First Bayesian Fix, then Online

Computation time: ~30 minutes
:::
::::
# Special Cases
:::: {.columns}
::: {.column width="48%"}
::: {.panel-tabset}
## Constant
```{r, fig.align="center", echo=FALSE, out.width = "400"}
knitr::include_graphics("assets/mcrps_learning/constant.svg")
```
## Constant PR
```{r, fig.align="center", echo=FALSE, out.width = "400"}
knitr::include_graphics("assets/mcrps_learning/constant_pr.svg")
```
## Constant MV
```{r, fig.align="center", echo=FALSE, out.width = "400"}
knitr::include_graphics("assets/mcrps_learning/constant_mv.svg")
```
:::
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## Pointwise
```{r, fig.align="center", echo=FALSE, out.width = "400"}
knitr::include_graphics("assets/mcrps_learning/pointwise.svg")
```
## Smooth
```{r, fig.align="center", echo=FALSE, out.width = "400"}
knitr::include_graphics("assets/mcrps_learning/smooth_best.svg")
```
:::
:::
::::
## Results
```{r, fig.align="center", echo=FALSE, out.width = "400"}
knitr::include_graphics("assets/mcrps_learning/tab_performance_sa.svg")
```
## Results
```{r, warning=FALSE, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
load("assets/mcrps_learning/pars_data.rds")
pars_data %>%
ggplot(aes(x = dates, y = value)) +
geom_rect(aes(
ymin = 0,
ymax = value * 1.2,
xmin = dates[1],
xmax = dates[182],
fill = "Burn-In"
)) +
geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) +
scale_colour_manual(
values = as.character(cols[5, c("pink", "amber", "green")])
) +
facet_grid(name ~ .,
scales = "free_y",
# switch = "both"
) +
scale_y_continuous(
trans = "log2",
labels = scaleFUN
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.width = unit(0.9, "inch"),
legend.position = "none"
) +
ylab(NULL) +
xlab("date") +
scale_fill_manual(NULL,
values = as.character(cols[3, "grey"])
)
```
## Results: Hour 16:00-17:00
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
load("assets/mcrps_learning/weights_h.rds")
weights_h %>%
ggplot(aes(date, q, fill = weight)) +
geom_raster(interpolate = TRUE) +
facet_grid(
Expert ~ . # , labeller = labeller(Mod = mod_labs)
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
scale_x_date(expand = c(0, 0)) +
scale_fill_gradientn(
oob = scales::squish,
limits = c(0, 1),
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
colours = c(
cols[8, "red"],
cols[5, "deep-orange"],
cols[5, "amber"],
cols[5, "yellow"],
cols[5, "lime"],
cols[5, "light-green"],
cols[5, "green"],
cols[7, "green"],
cols[9, "green"],
cols[10, "green"]
),
breaks = seq(0, 1, 0.1)
) +
xlab("date") +
ylab("probability") +
scale_y_continuous(breaks = c(0.1, 0.5, 0.9))
```
## Results: Median
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
load("assets/mcrps_learning/weights_q.rds")
weights_q %>%
mutate(hour = as.numeric(hour) - 1) %>%
ggplot(aes(date, hour, fill = weight)) +
geom_raster(interpolate = TRUE) +
facet_grid(
Expert ~ . # , labeller = labeller(Mod = mod_labs)
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
scale_x_date(expand = c(0, 0)) +
scale_fill_gradientn(
oob = scales::squish,
limits = c(0, 1),
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
colours = c(
cols[8, "red"],
cols[5, "deep-orange"],
cols[5, "amber"],
cols[5, "yellow"],
cols[5, "lime"],
cols[5, "light-green"],
cols[5, "green"],
cols[7, "green"],
cols[9, "green"],
cols[10, "green"]
),
breaks = seq(0, 1, 0.1)
) +
xlab("date") +
ylab("hour") +
scale_y_continuous(breaks = c(0, 8, 16, 24))
```
## Profoc R Package
:::: {.columns}
::: {.column width="48%"}
### Probabilistic Forecast Combination - profoc
Available on [GitHub](https://github.com/BerriJ/profoc) and [CRAN](https://CRAN.R-project.org/package=profoc)
Main Function: `online()` for online learning.
- Works with multivariate and/or probabilistic data
- Implements BOA, ML-POLY, EWA (and the gradient versions)
- Implements many extensions like smoothing, forgetting, thresholding, etc.
- Various loss functions are available
- Various methods (`predict`, `update`, `plot`, etc.)
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
### Speed
Large parts of profoc are implemented in C++.
<center>
<img src="assets/mcrps_learning/profoc_langs.png">
</center>
We use `Rcpp`, `RcppArmadillo`, and OpenMP.
We use `Rcpp` modules to expose a C++ class to R:
- Offers great flexibility for the end-user
- Requires very little knowledge of C++ code
- The high-level interface is easy to use (minimal example below)
:::
::::
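A minimal end-to-end sketch with simulated univariate data (array dimensions and defaults as we read the documentation; check `?online` before relying on this):

```{r, echo = TRUE, eval = FALSE}
# Toy example: two quantile experts, one biased low, one biased high
library(profoc)
set.seed(1)
n <- 50
tau <- 1:99 / 100
y <- matrix(rnorm(n))                    # realizations
experts <- array(NA, dim = c(n, 99, 2))  # T x P x K
experts[, , 1] <- matrix(qnorm(tau, -1), n, 99, byrow = TRUE)
experts[, , 2] <- matrix(qnorm(tau, 1), n, 99, byrow = TRUE)
mod <- online(y = y, experts = experts, tau = tau) # fit with defaults
plot(mod)                                # inspect the learned weights
```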
## Profoc - B-Spline Basis
:::: {.columns}
::: {.column width="48%"}
Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:
```{r, echo = TRUE, eval = FALSE, cache = FALSE}
mod <- online(
y = Y,
experts = experts,
tau = 1:99 / 100,
b_smooth_pr = list(
knots = 9,
mu = 0.3, # NEW
sigma = 1,
nonc = 0,
tailweight = 1,
deg = 3
)
)
```
Knots are distributed using the generalized beta distribution.
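For intuition only, plain beta quantiles already yield non-equidistant knots (the package's generalized beta adds the `nonc` and `tailweight` parameters shown above):

```{r, echo = TRUE, eval = FALSE}
# Non-equidistant knot placement from beta quantiles (intuition only)
p_knots <- seq(0, 1, length.out = 9)
round(qbeta(p_knots, shape1 = 2, shape2 = 2), 3) # knots cluster around 0.5
```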
TODO: Add actual algorithm to backup slides
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
TODO
:::
::::
## Wrap-Up
:::: {.columns}
::: {.column width="48%"}
The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Package:
Profoc is a flexible framework for online learning.
- It implements several algorithms (BOA, ML-POLY, EWA)
- It implements several loss functions
- It implements several extensions (smoothing, forgetting, thresholding)
- Its high- and low-level interfaces offer great flexibility
Profoc is fast.
- The core components are written in C++
- The core components utilize OpenMP for parallelization
:::
::: {.column width="2%"}
:::
::: {.column width="48%"}
Multivariate Extension:
- Code is available now
- [Pre-Print](https://arxiv.org/abs/2303.10019) is available now
Get these slides:
<center>
<img src="assets/mcrps_learning/web_pres.png">
</center>
[https://berrisch.biz/slides/23_06_ecmi/](https://berrisch.biz/slides/23_06_ecmi/)
:::
::::
## Columns Template

View File

@@ -79,6 +79,7 @@
rPackages.gganimate
rPackages.cowplot
rPackages.xtable
rPackages.ggsci
rPackages.profoc
];