Add multivariate crps learning slides
This commit is contained in:
@@ -1,4 +1,5 @@
|
|||||||
text_size <- 16
|
text_size <- 16
|
||||||
|
linesize <- 1
|
||||||
width <- 12
|
width <- 12
|
||||||
height <- 6
|
height <- 6
|
||||||
|
|
||||||
@@ -29,3 +30,26 @@ lamgrid <- c(-Inf, 2^(-15:25))
|
|||||||
|
|
||||||
# Gamma grid
|
# Gamma grid
|
||||||
gammagrid <- sort(1 - sqrt(seq(0, 0.99, .05)))
|
gammagrid <- sort(1 - sqrt(seq(0, 0.99, .05)))
|
||||||
|
|
||||||
|
material_pals <- c(
|
||||||
|
"red", "pink", "purple", "deep-purple", "indigo",
|
||||||
|
"blue", "light-blue", "cyan", "teal", "green", "light-green", "lime",
|
||||||
|
"yellow", "amber", "orange", "deep-orange", "brown", "grey", "blue-grey"
|
||||||
|
)
|
||||||
|
cols <- purrr::map(material_pals, ~ ggsci::pal_material(.x)(10)) %>%
|
||||||
|
purrr::reduce(cbind)
|
||||||
|
colnames(cols) <- material_pals
|
||||||
|
|
||||||
|
cols %>%
|
||||||
|
as_tibble() %>%
|
||||||
|
mutate(idx = as.factor(1:10)) %>%
|
||||||
|
pivot_longer(-idx, names_to = "var", values_to = "val") %>%
|
||||||
|
mutate(var = factor(var, levels = material_pals[19:1])) %>%
|
||||||
|
ggplot() +
|
||||||
|
xlab(NULL) +
|
||||||
|
ylab(NULL) +
|
||||||
|
geom_tile(aes(x = idx, y = var, fill = val)) +
|
||||||
|
scale_fill_identity() +
|
||||||
|
scale_x_discrete(expand = c(0, 0)) +
|
||||||
|
scale_y_discrete(expand = c(0, 0)) +
|
||||||
|
theme_minimal() -> plot_cols
|
||||||
|
|||||||
@@ -14,6 +14,16 @@
|
|||||||
booktitle = {Oxford Research Encyclopedia of Economics and Finance},
|
booktitle = {Oxford Research Encyclopedia of Economics and Finance},
|
||||||
year = {2019}
|
year = {2019}
|
||||||
}
|
}
|
||||||
|
@article{marcjasz2022distributional,
|
||||||
|
title = {Distributional neural networks for electricity price forecasting},
|
||||||
|
author = {Marcjasz, Grzegorz and Narajewski, Micha{\l} and Weron, Rafa{\l} and Ziel, Florian},
|
||||||
|
journal = {Energy Economics},
|
||||||
|
volume = {125},
|
||||||
|
pages = {106843},
|
||||||
|
year = {2023},
|
||||||
|
doi = {10.1016/j.eneco.2023.106843},
|
||||||
|
publisher = {Elsevier}
|
||||||
|
}
|
||||||
@article{atiya2020does,
|
@article{atiya2020does,
|
||||||
title = {Why does forecast combination work so well?},
|
title = {Why does forecast combination work so well?},
|
||||||
author = {Atiya, Amir F},
|
author = {Atiya, Amir F},
|
||||||
|
|||||||
@@ -1428,10 +1428,12 @@ weights %>%
|
|||||||
::: {.column width="48%"}
|
::: {.column width="48%"}
|
||||||
|
|
||||||
Potential Downsides:
|
Potential Downsides:
|
||||||
|
|
||||||
- Pointwise optimization can induce quantile crossing
|
- Pointwise optimization can induce quantile crossing
|
||||||
- Can be solved by sorting the predictions
|
- Can be solved by sorting the predictions
|
||||||
|
|
||||||
Upsides:
|
Upsides:
|
||||||
|
|
||||||
- Pointwise learning outperforms the Naive solution significantly
|
- Pointwise learning outperforms the Naive solution significantly
|
||||||
- Online learning is much faster than batch methods
|
- Online learning is much faster than batch methods
|
||||||
- Smoothing further improves the predictive performance
|
- Smoothing further improves the predictive performance
|
||||||
@@ -1464,7 +1466,7 @@ The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Packa
|
|||||||
|
|
||||||
::::
|
::::
|
||||||
|
|
||||||
:::: {.notes}
|
<!-- :::: {.notes}
|
||||||
|
|
||||||
Execution Times:
|
Execution Times:
|
||||||
|
|
||||||
@@ -1478,7 +1480,619 @@ Boa > 212 ms
|
|||||||
Profoc:
|
Profoc:
|
||||||
|
|
||||||
Ml-Poly > 17
|
Ml-Poly > 17
|
||||||
BOA > 16
|
BOA > 16 -->
|
||||||
|
|
||||||
|
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Outline
|
||||||
|
|
||||||
|
```{r, include=FALSE}
|
||||||
|
col_lightgray <- "#e7e7e7"
|
||||||
|
col_blue <- "#000088"
|
||||||
|
col_smooth_expost <- "#a7008b"
|
||||||
|
col_smooth <- "#187a00"
|
||||||
|
col_pointwise <- "#008790"
|
||||||
|
col_constant <- "#dd9002"
|
||||||
|
col_optimum <- "#666666"
|
||||||
|
col_green <- "#61B94C"
|
||||||
|
col_orange <- "#ffa600"
|
||||||
|
col_yellow <- "#FCE135"
|
||||||
|
```
|
||||||
|
|
||||||
|
</br>
|
||||||
|
|
||||||
|
**Multivariate CRPS Learning**
|
||||||
|
|
||||||
|
- Introduction
|
||||||
|
- Smoothing procedures
|
||||||
|
- Application to multivariate electricity price forecasts
|
||||||
|
|
||||||
|
**The `profoc` R package**
|
||||||
|
|
||||||
|
- Package overview
|
||||||
|
- Implementation details
|
||||||
|
- Illustrative examples
|
||||||
|
|
||||||
|
## The Framework of Prediction under Expert Advice
|
||||||
|
|
||||||
|
### The sequential framework
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
Each day, $t = 1, 2, ... T$
|
||||||
|
|
||||||
|
- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
|
||||||
|
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
|
||||||
|
- The **forecaster** calculates her prediction:
|
||||||
|
|
||||||
|
$$\widetilde{X}_{t}=\sum_{k=1}^K w_{t,k}\widehat{X}_{t,k}$$
|
||||||
|
|
||||||
|
- The realization for $t$ is observed
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
- The experts can be institutions, persons, or models
|
||||||
|
- The forecasts can be point-forecasts (i.e., mean or median) or full predictive distributions
|
||||||
|
- We do not need any assumptions concerning the underlying data
|
||||||
|
- `r Citet(my_bib, "cesa2006prediction")`
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## The Regret
|
||||||
|
|
||||||
|
Weights are updated sequentially according to the past performance of the $K$ experts.
|
||||||
|
|
||||||
|
`r fontawesome::fa("arrow-right", fill ="#000000")` A loss function $\ell$ is needed (to compute the **cumulative regret** $R_{t,k}$)
|
||||||
|
|
||||||
|
\begin{equation}
|
||||||
|
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)
|
||||||
|
\label{eq_regret}
|
||||||
|
\end{equation}
|
||||||
|
|
||||||
|
The cumulative regret:
|
||||||
|
- Indicates the predictive accuracy of expert $k$ until time $t$.
|
||||||
|
- Measures how much the forecaster *regrets* not having followed the expert's advice
|
||||||
|
|
||||||
|
Popular loss functions for point forecasting `r Citet(my_bib, "gneiting2011making")`:
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
- $\ell_2$-loss $\ell_2(x, y) = | x - y|^2$
|
||||||
|
- optimal for mean prediction
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
- $\ell_1$-loss $\ell_1(x, y) = | x - y|$
|
||||||
|
- optimal for median predictions
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
### Probabilistic Setting
|
||||||
|
|
||||||
|
An appropriate loss:
|
||||||
|
|
||||||
|
\begin{align*}
|
||||||
|
\text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx
|
||||||
|
\label{eq_crps}
|
||||||
|
\end{align*}
|
||||||
|
|
||||||
|
It's strictly proper `r Citet(my_bib, "gneiting2007strictly")`.
|
||||||
|
|
||||||
|
Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance varies in parts of the distribution?
|
||||||
|
|
||||||
|
`r fontawesome::fa("lightbulb", fill = col_yellow)` Utilize this relation:
|
||||||
|
|
||||||
|
\begin{align*}
|
||||||
|
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) \, d p.
|
||||||
|
\label{eq_crps_qs}
|
||||||
|
\end{align*}
|
||||||
|
|
||||||
|
... to combine quantiles of the probabilistic forecasts individually using the quantile-loss QL.
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
### Optimal Convergence
|
||||||
|
|
||||||
|
</br>
|
||||||
|
|
||||||
|
`r fontawesome::fa("exclamation", fill = col_orange)` exp-concavity of the loss is required for *selection* and *convex aggregation* properties
|
||||||
|
|
||||||
|
`r fontawesome::fa("exclamation", fill = col_orange)` QL is convex, but not exp-concave
|
||||||
|
|
||||||
|
`r fontawesome::fa("arrow-right", fill ="#000000")` The Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition.
|
||||||
|
|
||||||
|
Convergence rates of BOA are:
|
||||||
|
|
||||||
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *selection* `r Citet(my_bib, "gaillard2018efficient")`.
|
||||||
|
|
||||||
|
`r fontawesome::fa("arrow-right", fill ="#000000")` Almost optimal w.r.t *convex aggregation* `r Citet(my_bib, "wintenberger2017optimal")`.
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## Multivariate CRPS Learning
|
||||||
|
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
Additionally, we extend the **B-Smooth** and **P-Smooth** procedures to the multivariate setting:
|
||||||
|
|
||||||
|
- Basis matrices for reducing
|
||||||
|
  - the probabilistic dimension from $P$ to $\widetilde P$
|
||||||
|
  - the multivariate dimension from $D$ to $\widetilde D$
|
||||||
|
|
||||||
|
|
||||||
|
- Hat matrices
|
||||||
|
  - penalized smoothing across the $P$ and $D$ dimensions
|
||||||
|
|
||||||
|
We utilize the mean Pinball Score over the entire space for hyperparameter optimization (e.g., $\lambda$)
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
*Basis Smoothing*
|
||||||
|
|
||||||
|
Represent weights as linear combinations of bounded basis functions:
|
||||||
|
|
||||||
|
\begin{equation}
|
||||||
|
\underbrace{\boldsymbol w_{t,k}}_{D \text{ x } P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D\text{ x }\widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \text{ x }P} \nonumber
|
||||||
|
\end{equation}
|
||||||
|
|
||||||
|
A popular choice: B-Splines
|
||||||
|
|
||||||
|
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:
|
||||||
|
|
||||||
|
$\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = \boldsymbol \varphi^{\text{pr}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{\text{PxD}}\boldsymbol \varphi^{\text{mv}}$
|
||||||
|
|
||||||
|
If $\widetilde P = P$ it holds that $\boldsymbol \varphi^{\text{pr}} = \boldsymbol{I}$ (pointwise)
|
||||||
|
|
||||||
|
For $\widetilde P = 1$ we receive constant weights
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## Multivariate CRPS Learning
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
**Penalized smoothing:**
|
||||||
|
|
||||||
|
Let $\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})$ and $\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})$ be two sets of bounded basis functions on $(0,1)$:
|
||||||
|
|
||||||
|
\begin{equation}
|
||||||
|
\boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{pr}}'
|
||||||
|
\end{equation}
|
||||||
|
|
||||||
|
with parameter matrix $\boldsymbol b_{t,k}$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
|
||||||
|
|
||||||
|
\begin{align}
|
||||||
|
& \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\
|
||||||
|
& \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber
|
||||||
|
\end{align}
|
||||||
|
|
||||||
|
with differential operator $\mathcal{D}_q$ of order $q$
|
||||||
|
|
||||||
|
Computation is easy since we have an analytical solution.
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "1000px"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/algorithm.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## Application
|
||||||
|
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
#### Data
|
||||||
|
|
||||||
|
- Day-Ahead electricity price forecasts from `r Citet(my_bib, "marcjasz2022distributional")`
|
||||||
|
- Produced using probabilistic neural networks
|
||||||
|
- 24-dimensional distributional forecasts
|
||||||
|
- Distribution assumptions: JSU and Normal
|
||||||
|
- 8 experts (4 JSU, 4 Normal)
|
||||||
|
- 27th Dec. 2018 to 31st Dec. 2020 (736 days)
|
||||||
|
- We extract 99 quantiles (percentiles)
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
#### Setup
|
||||||
|
|
||||||
|
Evaluation: Exclude first 182 observations
|
||||||
|
|
||||||
|
Extensions: Penalized smoothing | Forgetting
|
||||||
|
|
||||||
|
Tuning strategies:
|
||||||
|
|
||||||
|
- Bayesian Fix
|
||||||
|
- Sophisticated Bayesian search algorithm
|
||||||
|
- Online
|
||||||
|
- Dynamic based on past performance
|
||||||
|
- Bayesian Online
|
||||||
|
- First Bayesian Fix then Online
|
||||||
|
|
||||||
|
Computation Time: ~30 Minutes
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
# Special Cases
|
||||||
|
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
::: {.panel-tabset}
|
||||||
|
|
||||||
|
## Constant
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/constant.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Constant PR
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/constant_pr.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Constant MV
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/constant_mv.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
::: {.panel-tabset}
|
||||||
|
|
||||||
|
## Pointwise
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/pointwise.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Smooth
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/smooth_best.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, out.width = "400"}
|
||||||
|
knitr::include_graphics("assets/mcrps_learning/tab_performance_sa.svg")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
```{r, warning=FALSE, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
|
||||||
|
load("assets/mcrps_learning/pars_data.rds")
|
||||||
|
pars_data %>%
|
||||||
|
ggplot(aes(x = dates, y = value)) +
|
||||||
|
geom_rect(aes(
|
||||||
|
ymin = 0,
|
||||||
|
ymax = value * 1.2,
|
||||||
|
xmin = dates[1],
|
||||||
|
xmax = dates[182],
|
||||||
|
fill = "Burn-In"
|
||||||
|
)) +
|
||||||
|
geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) +
|
||||||
|
scale_colour_manual(
|
||||||
|
values = as.character(cols[5, c("pink", "amber", "green")])
|
||||||
|
) +
|
||||||
|
facet_grid(name ~ .,
|
||||||
|
scales = "free_y",
|
||||||
|
# switch = "both"
|
||||||
|
) +
|
||||||
|
scale_y_continuous(
|
||||||
|
trans = "log2",
|
||||||
|
labels = scaleFUN
|
||||||
|
) +
|
||||||
|
theme_minimal() +
|
||||||
|
theme(
|
||||||
|
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
||||||
|
text = element_text(size = text_size),
|
||||||
|
legend.key.width = unit(0.9, "inch"),
|
||||||
|
legend.position = "none"
|
||||||
|
) +
|
||||||
|
ylab(NULL) +
|
||||||
|
xlab("date") +
|
||||||
|
scale_fill_manual(NULL,
|
||||||
|
values = as.character(cols[3, "grey"])
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Results: Hour 16:00-17:00
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
|
||||||
|
load("assets/mcrps_learning/weights_h.rds")
|
||||||
|
weights_h %>%
|
||||||
|
ggplot(aes(date, q, fill = weight)) +
|
||||||
|
geom_raster(interpolate = TRUE) +
|
||||||
|
facet_grid(
|
||||||
|
Expert ~ . # , labeller = labeller(Mod = mod_labs)
|
||||||
|
) +
|
||||||
|
theme_minimal() +
|
||||||
|
theme(
|
||||||
|
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
||||||
|
text = element_text(size = text_size),
|
||||||
|
legend.key.height = unit(0.9, "inch")
|
||||||
|
) +
|
||||||
|
scale_x_date(expand = c(0, 0)) +
|
||||||
|
scale_fill_gradientn(
|
||||||
|
oob = scales::squish,
|
||||||
|
limits = c(0, 1),
|
||||||
|
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
|
||||||
|
colours = c(
|
||||||
|
cols[8, "red"],
|
||||||
|
cols[5, "deep-orange"],
|
||||||
|
cols[5, "amber"],
|
||||||
|
cols[5, "yellow"],
|
||||||
|
cols[5, "lime"],
|
||||||
|
cols[5, "light-green"],
|
||||||
|
cols[5, "green"],
|
||||||
|
cols[7, "green"],
|
||||||
|
cols[9, "green"],
|
||||||
|
cols[10, "green"]
|
||||||
|
),
|
||||||
|
breaks = seq(0, 1, 0.1)
|
||||||
|
) +
|
||||||
|
xlab("date") +
|
||||||
|
ylab("probability") +
|
||||||
|
scale_y_continuous(breaks = c(0.1, 0.5, 0.9))
|
||||||
|
```
|
||||||
|
|
||||||
|
## Results: Median
|
||||||
|
|
||||||
|
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=6}
|
||||||
|
load("assets/mcrps_learning/weights_q.rds")
|
||||||
|
weights_q %>%
|
||||||
|
mutate(hour = as.numeric(hour) - 1) %>%
|
||||||
|
ggplot(aes(date, hour, fill = weight)) +
|
||||||
|
geom_raster(interpolate = TRUE) +
|
||||||
|
facet_grid(
|
||||||
|
Expert ~ . # , labeller = labeller(Mod = mod_labs)
|
||||||
|
) +
|
||||||
|
theme_minimal() +
|
||||||
|
theme(
|
||||||
|
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
|
||||||
|
text = element_text(size = text_size),
|
||||||
|
legend.key.height = unit(0.9, "inch")
|
||||||
|
) +
|
||||||
|
scale_x_date(expand = c(0, 0)) +
|
||||||
|
scale_fill_gradientn(
|
||||||
|
oob = scales::squish,
|
||||||
|
limits = c(0, 1),
|
||||||
|
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
|
||||||
|
colours = c(
|
||||||
|
cols[8, "red"],
|
||||||
|
cols[5, "deep-orange"],
|
||||||
|
cols[5, "amber"],
|
||||||
|
cols[5, "yellow"],
|
||||||
|
cols[5, "lime"],
|
||||||
|
cols[5, "light-green"],
|
||||||
|
cols[5, "green"],
|
||||||
|
cols[7, "green"],
|
||||||
|
cols[9, "green"],
|
||||||
|
cols[10, "green"]
|
||||||
|
),
|
||||||
|
breaks = seq(0, 1, 0.1)
|
||||||
|
) +
|
||||||
|
xlab("date") +
|
||||||
|
ylab("hour") +
|
||||||
|
scale_y_continuous(breaks = c(0, 8, 16, 24))
|
||||||
|
```
|
||||||
|
|
||||||
|
## Profoc R Package
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
### Probabilistic Forecast Combination - profoc
|
||||||
|
|
||||||
|
Available on [Github](https://github.com/BerriJ/profoc) and [CRAN](https://CRAN.R-project.org/package=profoc)
|
||||||
|
|
||||||
|
Main Function: `online()` for online learning.
|
||||||
|
- Works with multivariate and/or probabilistic data
|
||||||
|
- Implements BOA, ML-POLY, EWA (and the gradient versions)
|
||||||
|
- Implements many extensions like smoothing, forgetting, thresholding, etc.
|
||||||
|
- Various loss functions are available
|
||||||
|
- Various methods (`predict`, `update`, `plot`, etc.)
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
### Speed
|
||||||
|
|
||||||
|
Large parts of profoc are implemented in C++.
|
||||||
|
|
||||||
|
<center>
|
||||||
|
<img src="assets/mcrps_learning/profoc_langs.png">
|
||||||
|
</center>
|
||||||
|
|
||||||
|
We use `Rcpp`, `RcppArmadillo`, and OpenMP.
|
||||||
|
|
||||||
|
We use `Rcpp` modules to expose a class to R
|
||||||
|
- Offers great flexibility for the end-user
|
||||||
|
- Requires very little knowledge of C++ code
|
||||||
|
- High-Level interface is easy to use
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## Profoc - B-Spline Basis
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:
|
||||||
|
|
||||||
|
```{r, echo = TRUE, eval = FALSE, cache = FALSE}
|
||||||
|
mod <- online(
|
||||||
|
y = Y,
|
||||||
|
experts = experts,
|
||||||
|
tau = 1:99 / 100,
|
||||||
|
b_smooth_pr = list(
|
||||||
|
knots = 9,
|
||||||
|
mu = 0.3, # NEW
|
||||||
|
sigma = 1,
|
||||||
|
nonc = 0,
|
||||||
|
tailweight = 1,
|
||||||
|
deg = 3
|
||||||
|
)
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
Knots are distributed using the generalized beta distribution.
|
||||||
|
|
||||||
|
TODO: Add actual algorithm to backup slides
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
## Wrap-Up
|
||||||
|
|
||||||
|
:::: {.columns}
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
The [`r fontawesome::fa("github")` profoc](https://profoc.berrisch.biz/) R Package:
|
||||||
|
|
||||||
|
Profoc is a flexible framework for online learning.
|
||||||
|
|
||||||
|
- It implements several algorithms
|
||||||
|
- It implements several loss functions
|
||||||
|
- It implements several extensions
|
||||||
|
- Its high- and low-level interfaces offer great flexibility
|
||||||
|
|
||||||
|
Profoc is fast.
|
||||||
|
|
||||||
|
- The core components are written in C++
|
||||||
|
- The core components utilize OpenMP for parallelization
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="2%"}
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: {.column width="48%"}
|
||||||
|
|
||||||
|
Multivariate Extension:
|
||||||
|
|
||||||
|
- Code is available now
|
||||||
|
- [Pre-Print](https://arxiv.org/abs/2303.10019) is available now
|
||||||
|
|
||||||
|
Get these slides:
|
||||||
|
|
||||||
|
<center>
|
||||||
|
<img src="assets/mcrps_learning/web_pres.png">
|
||||||
|
</center>
|
||||||
|
[https://berrisch.biz/slides/23_06_ecmi/](https://berrisch.biz/slides/23_06_ecmi/)
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
|
||||||
## Columns Template
|
## Columns Template
|
||||||
|
|||||||
Reference in New Issue
Block a user