+##

:::: {.columns}

@@ -1151,8 +1149,6 @@ Strictly proper for *median* predictions
### Popular Aggregation Algorithms

-
-

#### The naive combination

@@ -1178,15 +1174,18 @@ w_{t,k}^{\text{Naive}} = \frac{1}{K}\label{eq:naive_combination}
::: {.column width="48%"}

-### Optimality
+### Risk

In stochastic settings, the cumulative Risk should be analyzed @wintenberger2017optimal:
+
\begin{align}
&\underbrace{\widetilde{\mathcal{R}}_t = \sum_{i=1}^t \mathbb{E}[\ell(\widetilde{X}_{i},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Forecaster}} \\
&\underbrace{\widehat{\mathcal{R}}_{t,k} = \sum_{i=1}^t \mathbb{E}[\ell(\widehat{X}_{i,k},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Experts}} \label{eq_def_cumrisk}
\end{align}
+
+ \eqref{eq_def_cumrisk}: the expected loss of the algorithm (lower = better)
+
:::

::::
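Not part of the patch — a minimal R sketch of what the two slides above define: the naive combination \eqref{eq:naive_combination} and the empirical counterparts of the cumulative risks in \eqref{eq_def_cumrisk}. Data, loss, and all names are invented for illustration.

```r
# Naive combination of K experts (w_{t,k} = 1/K for all t) and the
# cumulative losses whose conditional expectations define the risks.
set.seed(1)
t_max <- 200
K <- 3
y <- rnorm(t_max)                                                # observations Y_t
experts <- sapply(1:K, function(k) y + rnorm(t_max, sd = k / 2)) # forecasts X_{t,k}
x_tilde <- experts %*% rep(1 / K, K)                             # naive forecaster
loss <- function(x, y) (x - y)^2                                 # illustrative loss
R_forecaster <- cumsum(loss(x_tilde, y))                         # forecaster
R_experts <- apply(loss(experts, y), 2, cumsum)                  # one column per expert
c(forecaster = tail(R_forecaster, 1), expert = R_experts[t_max, ])
```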
@@ -1256,12 +1255,15 @@ Algorithms can satisfy both \eqref{eq_optp_select} and \eqref{eq_optp_conv} dep
+#### Requirements:
+
EWA satisfies optimal selection convergence \eqref{eq_optp_select} in a deterministic setting if:

-- Loss $\ell$ is exp-concave
-- Learning-rate $\eta$ is chosen correctly
+ Loss $\ell$ is exp-concave

-Those results can be converted to any stochastic setting @wintenberger2017optimal.
+ Learning-rate $\eta$ is chosen correctly
+
+These results extend to *any* stochastic setting @wintenberger2017optimal.

Optimal convex aggregation convergence \eqref{eq_optp_conv} can be satisfied by applying the kernel-trick:
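A reviewer aside, since the slide states the EWA update only in words: exponentially weighted averaging assigns each expert a weight proportional to $\exp(-\eta \cdot \text{cumulative loss})$. A minimal sketch with hypothetical names and numbers, not the deck's implementation:

```r
# EWA weights: w_{t,k} proportional to exp(-eta * cumulative loss of expert k).
# The learning rate eta controls how aggressively past losses are exploited.
ewa_weights <- function(cum_loss, eta) {
  w <- exp(-eta * (cum_loss - min(cum_loss))) # shift guards against underflow
  w / sum(w)
}
ewa_weights(cum_loss = c(10.2, 11.0, 14.5), eta = 0.5) # best expert gets most weight
```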
@@ -1283,7 +1285,7 @@ $\ell'$ is the subgradient of $\ell$ at forecast combination $\widetilde{X}$.
-**An appropriate choice:**
+#### An appropriate choice:

\begin{equation*}
\text{CRPS}(F, y) = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx \label{eq:crps}
@@ -1293,7 +1295,7 @@ It's strictly proper [@gneiting2007strictly].

Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance varies in parts of the distribution?

- Utilize this relation:
+ Utilize this relation:

\begin{equation*}
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) dp.\label{eq_crps_qs}
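A quick numerical check of \eqref{eq_crps_qs} (my own sketch, not from the deck): approximate the CRPS by averaging pinball losses $\text{QL}_p$ over a probability grid, and compare with the known closed form of the CRPS of a standard normal. Up to sampling and grid-truncation error, the two agree.

```r
# CRPS(F, y) = 2 * integral over p of QL_p(F^{-1}(p), y), approximated on a grid.
ql <- function(q, y, p) (as.numeric(y < q) - p) * (q - y) # pinball loss QL_p
crps_via_ql <- function(x_sample, y, grid = seq(0.01, 0.99, by = 0.01)) {
  q_hat <- quantile(x_sample, probs = grid, names = FALSE) # empirical F^{-1}(p)
  2 * mean(ql(q_hat, y, grid))
}
set.seed(1)
z <- rnorm(1e5) # sample from the predictive distribution F = N(0, 1)
y <- 0.3
crps_exact <- y * (2 * pnorm(y) - 1) + 2 * dnorm(y) - 1 / sqrt(pi)
c(via_ql = crps_via_ql(z, y), closed_form = crps_exact)
```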
@@ -1311,9 +1313,9 @@ Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what

## Almost Optimal Convergence

-:::: {style="font-size: 90%;"}
+:::: {style="font-size: 85%;"}

- QL is convex, but not exp-concave
+ QL is convex, but not exp-concave

-Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exist a $C>0$ such that for $x>0$ it holds that
+Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exists a $C>0$ such that for $x>0$ it holds that

\label{eq_boa_opt_conv}
\end{equation}

+if the loss function is convex.
+
Almost optimal w.r.t. *convex aggregation* \eqref{eq_optp_conv} @wintenberger2017optimal.

-The same algorithm satisfies that there exist a $C>0$ such that for $x>0$ it holds that
+The same algorithm satisfies that there exists a $C>0$ such that for $x>0$ it holds that

\label{eq_boa_opt_select}
\end{equation}

-if $Y_t$ is bounded, the considered loss $\ell$ is convex, $G$-Lipschitz, and weak exp-concave in its first coordinate.
+if the loss $\ell$ is $G$-Lipschitz and weak exp-concave in its first coordinate.

Almost optimal w.r.t. *selection* \eqref{eq_optp_select} @gaillard2018efficient.

:::

-## Conditions + Lemma
+## Proposition + Conditions

:::: {.columns}

::: {.column width="48%"}

-**Lemma 1**
+**Proposition 1: The Power of Flexibility**

\begin{align}
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}
@@ -1361,15 +1365,17 @@ if $Y_t$ is bounded, the considered loss $\ell$ is convex, $G$-Lipschitz, and we

Pointwise can outperform constant procedures

-QL is convex but not exp-concave:
+ $\text{QL}$ is convex: almost optimal convergence w.r.t. *convex aggregation* \eqref{eq_boa_opt_conv}

- Almost optimal convergence w.r.t. *convex aggregation* \eqref{eq_boa_opt_conv}
+For almost optimal convergence w.r.t. *selection* \eqref{eq_boa_opt_select} we need:

-For almost optimal congerence w.r.t. *selection* \eqref{eq_boa_opt_select} we need to check **A1** and **A2**:
+**A1: Lipschitz Continuity**

-QL is Lipschitz continuous:
+**A2: Weak Exp-Concavity**

- **A1** holds
+QL is Lipschitz continuous with $G=\max(p, 1-p)$:
+
+ **A1** holds

:::

@@ -1379,14 +1385,16 @@ QL is Lipschitz continuous:

::: {.column width="48%"}

-**A1**
+**A1: Lipschitz Continuity**

For some $G>0$ it holds for all $x_1,x_2\in \mathbb{R}$ and $t>0$ that

$$ | \ell(x_1, Y_t)-\ell(x_2, Y_t) | \leq G |x_1-x_2|$$

-**A2** For some $\alpha>0$, $\beta\in[0,1]$ it holds
+**A2: Weak Exp-Concavity**
+
+For some $\alpha>0$, $\beta\in[0,1]$ it holds

for all $x_1,x_2 \in \mathbb{R}$ and $t>0$ that

\begin{align*}
@@ -1397,7 +1405,7 @@ for all $x_1,x_2 \in \mathbb{R}$ and $t>0$ that

\mathbb{E}\left[ \left. \left( \alpha(\ell'(x_1, Y_t)(x_1 - x_2))^{2}\right)^{1/\beta} \right|\mathcal{F}_{t-1}\right]
\end{align*}

- Almost optimal w.r.t. *selection* \eqref{eq_optp_select} @gaillard2018efficient.
+If $\beta=1$ we get strong convexity, which implies weak exp-concavity.

:::

@@ -1414,16 +1422,15 @@ Conditional quantile risk: $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y_t) |

convexity properties of $\mathcal{Q}_p$ depend on the conditional distribution $Y_t|\mathcal{F}_{t-1}$.

-**Proposition 1**
+**Proposition 2**

Let $Y$ be a univariate random variable with (Radon-Nikodym) $\nu$-density $f$, then for the second subderivative of the quantile risk $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y) ]$ of $Y$ it holds for all $p\in(0,1)$ that

$\mathcal{Q}_p'' = f.$

-Additionally, if $f$ is a continuous Lebesgue-density with $f\geq\gamma>0$ for some constant $\gamma>0$ on its support $\text{spt}(f)$ then
-is $\mathcal{Q}_p$ is $\gamma$-strongly convex.
+Additionally, if $f$ is a continuous Lebesgue density with $f\geq\gamma>0$ for some constant $\gamma>0$ on its support $\text{spt}(f)$ then $\mathcal{Q}_p$ is $\gamma$-strongly convex, which implies condition

-Strong convexity with $\beta=1$ implies weak exp-concavity **A2** @gaillard2018efficient
+**A2** with $\beta=1$ @gaillard2018efficient

:::
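To make Proposition 2 concrete (my own check, not part of the patch): for $Y \sim N(0,1)$ the quantile risk has the closed form $\mathcal{Q}_p(x) = x\Phi(x) + \varphi(x) - px$, and its second derivative equals the density $\varphi(x)$, whatever the value of $p$.

```r
# Numeric check of Proposition 2 for Y ~ N(0, 1): the central second
# difference of Q_p matches the density dnorm(x) for every p.
Qp <- function(x, p) x * pnorm(x) + dnorm(x) - p * x # closed form for N(0, 1)
x <- 0.7
h <- 1e-4
for (p in c(0.1, 0.5, 0.9)) {
  d2 <- (Qp(x + h, p) - 2 * Qp(x, p) + Qp(x - h, p)) / h^2
  print(c(p = p, second_derivative = d2, density = dnorm(x))) # d2 ~ dnorm(x)
}
```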
@@ -1433,8 +1440,6 @@ **A1** and **A2** give us almost optimal convergence w.r.t. selection \eqref{eq_boa_opt_select}

-

**Theorem 1**

The gradient based fully adaptive Bernstein online aggregation (BOAG) applied pointwise for all $p\in(0,1)$ on $\text{QL}$ satisfies

$$\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}$$

-If $Y_t|\mathcal{F}_{t-1}$ is bounded and has a pdf $f_t$ satifying $f_t>\gamma >0$ on its
+If $Y_t|\mathcal{F}_{t-1}$ is bounded and has a pdf $f_t$ satisfying $f_t>\gamma >0$ on its

-support $\text{spt}(f_t)$ then \ref{eq_boa_opt_select} holds with $\beta=1$ and
+support $\text{spt}(f_t)$ then \eqref{eq_boa_opt_select} holds with $\beta=1$ and

-$$\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}$$.
+$$\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}$$
+
+ BOAG with $\text{QL}$ satisfies \eqref{eq_boa_opt_conv} and \eqref{eq_boa_opt_select}

:::

::::

-:::: {.notes}
-
-We apply Bernstein Online Aggregation (BOA). It lets us weaken the exp-concavity condition while almost keeping the optimalities \ref{eq_optp_select} and \ref{eq_optp_conv}.
-
-::::

## A Probabilistic Example

@@ -1922,7 +1923,7 @@ load("assets/crps_learning/overview_data.rds")

data %>%
  ggplot(aes(x = Date, y = value)) +
-  geom_line(size = 1, col = cols[9,"blue"]) +
+  geom_line(size = 1, col = cols[9, "blue"]) +
  theme_minimal() +
  ylab("Value") +
  facet_wrap(. ~ name, scales = "free", ncol = 1) +
@@ -1934,12 +1935,12 @@

data %>%
  ggplot(aes(x = value)) +
-  geom_histogram(aes(y = ..density..), size = 1, fill = cols[9,"blue"], bins = 50) +
+  geom_histogram(aes(y = ..density..), size = 1, fill = cols[9, "blue"], bins = 50) +
  ylab("Density") +
  xlab("Value") +
  theme_minimal() +
  theme(
-    strip.background = element_rect(fill = cols[3,"grey"], colour = cols[3,"grey"]),
+    strip.background = element_rect(fill = cols[3, "grey"], colour = cols[3, "grey"]),
    text = element_text(size = text_size)
  ) +
  facet_wrap(. 
~ name, scales = "free", ncol = 1, strip.position = "right") -> p2 @@ -2054,15 +2055,16 @@ for (i.p in 1:MO) { table_col[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1) table[, i.p] <- paste0( table[, i.p], - '(', Pallout[i.p], ")" + "(", Pallout[i.p], ")" ) } table_out <- kbl( - table, - align = rep("c", ncol(table)), + table, + align = rep("c", ncol(table)), bootstrap_options = c("condensed"), - escape = FALSE) %>% + escape = FALSE +) %>% kable_paper(full_width = TRUE) %>% row_spec(0:nrow(table), color = cols[9, "grey"]) @@ -2187,7 +2189,7 @@ t(RQL) %>% ) + xlab("Probability p") + scale_color_manual(NULL, values = tCOL) + - guides(colour = guide_legend(nrow = 2, byrow = TRUE)) + guides(colour = guide_legend(nrow = 2, byrow = TRUE)) ``` ## Cumulative Loss Difference @@ -2407,38 +2409,38 @@ Computation Time: ~30 Minutes load("assets/mcrps_learning/naive_table_df.rds") table_naive <- naive_table_df %>% - as_tibble() %>% - head(1) %>% - mutate_all(round, 4) %>% - mutate_all(sprintf, fmt = "%#.3f") %>% - kbl( - bootstrap_options = "condensed", - escape = FALSE, - format = "html", - booktabs = FALSE, - align = c("c", rep("c", ncol(naive_table_df) - 1)) - ) %>% - kable_paper(full_width = TRUE) %>% - row_spec(0:1, color = cols[9, "grey"]) %>% - kable_styling(font_size = 16) + as_tibble() %>% + head(1) %>% + mutate_all(round, 4) %>% + mutate_all(sprintf, fmt = "%#.3f") %>% + kbl( + bootstrap_options = "condensed", + escape = FALSE, + format = "html", + booktabs = FALSE, + align = c("c", rep("c", ncol(naive_table_df) - 1)) + ) %>% + kable_paper(full_width = TRUE) %>% + row_spec(0:1, color = cols[9, "grey"]) %>% + kable_styling(font_size = 16) for (i in 1:ncol(naive_table_df)) { - table_naive <- table_naive %>% - column_spec(i, - background = ifelse( - is.na(naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)]), - cols[5, "grey"], - col_scale2( - naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)], - rng_t - ) - ), - bold = i == which.min(naive_table_df["loss", ]) + table_naive <- table_naive %>% + column_spec(i, + background = ifelse( + is.na(naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)]), + cols[5, "grey"], + col_scale2( + naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)], + rng_t ) + ), + bold = i == which.min(naive_table_df["loss", ]) + ) } -table_naive +table_naive load("assets/mcrps_learning/performance_data.rds") i <- 1 @@ -2447,7 +2449,7 @@ for (j in 1:3) { for (i in seq_len(nrow(performance_loss_tibble))) { if (loss_and_dm[i, j, "p.val"] < 0.001) { performance_loss_tibble[i, 2 + j] <- paste0( - performance_loss_tibble[i, 2 + j], + performance_loss_tibble[i, 2 + j], '***' ) } else if (loss_and_dm[i, j, "p.val"] < 0.01) { @@ -2457,7 +2459,7 @@ for (j in 1:3) { ) } else if (loss_and_dm[i, j, "p.val"] < 0.05) { performance_loss_tibble[i, 2 + j] <- paste0( - performance_loss_tibble[i, 2 + j], + performance_loss_tibble[i, 2 + j], '*' ) } else if (loss_and_dm[i, j, "p.val"] < 0.1) { @@ -2475,19 +2477,19 @@ for (j in 1:3) { table_performance <- performance_loss_tibble %>% kbl( - padding=-1L, + padding = -1L, col.names = c( - 'Description', - 'Parameter Tuning', - 'BOA', - 'ML-Poly', - 'EWA' + "Description", + "Parameter Tuning", + "BOA", + "ML-Poly", + "EWA" ), bootstrap_options = "condensed", # Dont replace any string, dataframe has to be valid latex code ... 
escape = FALSE, format = "html", - align = c("l", "l", rep("c", ncol(performance_loss_tibble)-2)) + align = c("l", "l", rep("c", ncol(performance_loss_tibble) - 2)) ) %>% kable_paper(full_width = TRUE) %>% row_spec(0:nrow(performance_loss_tibble), color = cols[9, "grey"]) @@ -2497,7 +2499,7 @@ table_performance <- performance_loss_tibble %>% for (i in 3:ncol(performance_loss_tibble)) { bold_cells <- rep(FALSE, times = nrow(performance_loss_tibble)) - loss <- loss_and_dm[, i - 2, "loss"] + loss <- loss_and_dm[, i - 2, "loss"] table_performance <- table_performance %>% column_spec(i, @@ -2588,38 +2590,38 @@ knitr::include_graphics("assets/mcrps_learning/smooth_best.svg") ```{r, warning=FALSE, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE} load("assets/mcrps_learning/pars_data.rds") pars_data %>% - ggplot(aes(x = dates, y = value)) + - geom_rect(aes( - ymin = 0, - ymax = value * 1.2, - xmin = dates[1], - xmax = dates[182], - fill = "Burn-In" - )) + - geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) + - scale_colour_manual( - values = as.character(cols[5, c("pink", "amber", "green")]) - ) + - facet_grid(name ~ ., - scales = "free_y", - # switch = "both" - ) + - scale_y_continuous( - trans = "log2", - labels = scaleFUN - ) + - theme_minimal() + - theme( - # plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"), - text = element_text(size = text_size), - legend.key.width = unit(0.9, "inch"), - legend.position = "none" - ) + - ylab(NULL) + - xlab("date") + - scale_fill_manual(NULL, - values = as.character(cols[3, "grey"]) - ) + ggplot(aes(x = dates, y = value)) + + geom_rect(aes( + ymin = 0, + ymax = value * 1.2, + xmin = dates[1], + xmax = dates[182], + fill = "Burn-In" + )) + + geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) + + scale_colour_manual( + values = as.character(cols[5, c("pink", "amber", "green")]) + ) + + facet_grid(name ~ ., + scales = "free_y", + # switch = "both" + ) + + scale_y_continuous( + trans = "log2", + labels = scaleFUN + ) + + theme_minimal() + + theme( + # plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"), + text = element_text(size = text_size), + legend.key.width = unit(0.9, "inch"), + legend.position = "none" + ) + + ylab(NULL) + + xlab("date") + + scale_fill_manual(NULL, + values = as.character(cols[3, "grey"]) + ) ``` ## Weights: Hour 16:00-17:00 @@ -2627,39 +2629,39 @@ pars_data %>% ```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE} load("assets/mcrps_learning/weights_h.rds") weights_h %>% - ggplot(aes(date, q, fill = weight)) + - geom_raster(interpolate = TRUE) + - facet_grid( - Expert ~ . # , labeller = labeller(Mod = mod_labs) - ) + - theme_minimal() + - theme( - # plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"), - text = element_text(size = text_size), - legend.key.height = unit(0.9, "inch") - ) + - scale_x_date(expand = c(0, 0)) + - scale_fill_gradientn( - oob = scales::squish, - limits = c(0, 1), - values = c(seq(0, 0.4, length.out = 8), 0.65, 1), - colours = c( - cols[8, "red"], - cols[5, "deep-orange"], - cols[5, "amber"], - cols[5, "yellow"], - cols[5, "lime"], - cols[5, "light-green"], - cols[5, "green"], - cols[7, "green"], - cols[9, "green"], - cols[10, "green"] - ), - breaks = seq(0, 1, 0.1) - ) + - xlab("date") + - ylab("probability") + - scale_y_continuous(breaks = c(0.1, 0.5, 0.9)) + ggplot(aes(date, q, fill = weight)) + + geom_raster(interpolate = TRUE) + + facet_grid( + Expert ~ . 
# , labeller = labeller(Mod = mod_labs) + ) + + theme_minimal() + + theme( + # plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"), + text = element_text(size = text_size), + legend.key.height = unit(0.9, "inch") + ) + + scale_x_date(expand = c(0, 0)) + + scale_fill_gradientn( + oob = scales::squish, + limits = c(0, 1), + values = c(seq(0, 0.4, length.out = 8), 0.65, 1), + colours = c( + cols[8, "red"], + cols[5, "deep-orange"], + cols[5, "amber"], + cols[5, "yellow"], + cols[5, "lime"], + cols[5, "light-green"], + cols[5, "green"], + cols[7, "green"], + cols[9, "green"], + cols[10, "green"] + ), + breaks = seq(0, 1, 0.1) + ) + + xlab("date") + + ylab("probability") + + scale_y_continuous(breaks = c(0.1, 0.5, 0.9)) ``` ## Weights: Median @@ -2667,40 +2669,40 @@ weights_h %>% ```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE} load("assets/mcrps_learning/weights_q.rds") weights_q %>% - mutate(hour = as.numeric(hour) - 1) %>% - ggplot(aes(date, hour, fill = weight)) + - geom_raster(interpolate = TRUE) + - facet_grid( - Expert ~ . # , labeller = labeller(Mod = mod_labs) - ) + - theme_minimal() + - theme( - # plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"), - text = element_text(size = text_size), - legend.key.height = unit(0.9, "inch") - ) + - scale_x_date(expand = c(0, 0)) + - scale_fill_gradientn( - oob = scales::squish, - limits = c(0, 1), - values = c(seq(0, 0.4, length.out = 8), 0.65, 1), - colours = c( - cols[8, "red"], - cols[5, "deep-orange"], - cols[5, "amber"], - cols[5, "yellow"], - cols[5, "lime"], - cols[5, "light-green"], - cols[5, "green"], - cols[7, "green"], - cols[9, "green"], - cols[10, "green"] - ), - breaks = seq(0, 1, 0.1) - ) + - xlab("date") + - ylab("hour") + - scale_y_continuous(breaks = c(0, 8, 16, 24)) + mutate(hour = as.numeric(hour) - 1) %>% + ggplot(aes(date, hour, fill = weight)) + + geom_raster(interpolate = TRUE) + + facet_grid( + Expert ~ . # , labeller = labeller(Mod = mod_labs) + ) + + theme_minimal() + + theme( + # plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"), + text = element_text(size = text_size), + legend.key.height = unit(0.9, "inch") + ) + + scale_x_date(expand = c(0, 0)) + + scale_fill_gradientn( + oob = scales::squish, + limits = c(0, 1), + values = c(seq(0, 0.4, length.out = 8), 0.65, 1), + colours = c( + cols[8, "red"], + cols[5, "deep-orange"], + cols[5, "amber"], + cols[5, "yellow"], + cols[5, "lime"], + cols[5, "light-green"], + cols[5, "green"], + cols[7, "green"], + cols[9, "green"], + cols[10, "green"] + ), + breaks = seq(0, 1, 0.1) + ) + + xlab("date") + + ylab("hour") + + scale_y_continuous(breaks = c(0, 8, 16, 24)) ``` ::::
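A closing illustration (invented numbers, not from the study) of what the weight heatmaps above display: with pointwise weights $w_{t,k}(p)$, each probability level mixes the experts differently, so one expert can dominate the tails while another dominates the center of the distribution.

```r
# Pointwise combination on a small probability grid. Each column of w holds
# the expert weights for one p (columns sum to 1); the combined quantile is
# the weighted mix of the expert quantiles at that p.
p_grid <- c(0.1, 0.5, 0.9)
expert_q <- rbind(e1 = c(1.0, 2.0, 3.5), e2 = c(0.5, 2.2, 4.5)) # F_k^{-1}(p)
w <- rbind(e1 = c(0.8, 0.5, 0.2), e2 = c(0.2, 0.5, 0.8))
combined <- colSums(w * expert_q) # pointwise combined quantiles
setNames(combined, paste0("p = ", p_grid))
```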