---
title: "Data Science Methods for Forecasting in Energy and Economics"
date: 30 June 2025
author:
- name: Jonathan Berrisch
affiliations:
- ref: hemf
affiliations:
- id: hemf
name: University of Duisburg-Essen, House of Energy, Climate, and Finance
format:
revealjs:
embed-resources: true
self-contained-math: true
width: 1280
height: 720
# footer: defined in assets/revealjs/custom.html
# logo: defined in assets/revealjs/custom.html
theme: [default, assets/revealjs/sydney.scss, assets/revealjs/custom.scss]
smaller: true
fig-format: svg
slide-number: true
crossrefs-hover: true
pagetitle: "De-Fence"
html-math-method: mathjax
include-in-header:
- file: assets/revealjs/custom.html
execute:
daemon: false
highlight-style: github
bibliography: assets/library.bib
csl: assets/revealjs/apa-old-doi-prefix.csl
# Find nice plugins: https://m.canouil.dev/quarto-extensions/
revealjs-plugins:
- pointer
# - drop
---
# High-Level View {.center visibility="uncounted"}
::: {.hidden}
$$
\newcommand{\A}{{\mathbb A}}
$$
:::
```{r, setup, include=FALSE}
# Compile with: rmarkdown::render("crps_learning.Rmd")
library(latex2exp)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)
library(kableExtra)
library(RefManageR)
knitr::opts_chunk$set(
dev = "svglite" # Use svg figures
)
BibOptions(
check.entries = TRUE,
bib.style = "authoryear",
cite.style = "authoryear",
style = "html",
hyperlink = TRUE,
dashed = FALSE
)
source("assets/01_common.R")
col_lightgray <- "#e7e7e7"
col_blue <- "#000088"
col_smooth_expost <- "#a7008b"
col_constant <- "#dd9002"
col_optimum <- "#666666"
col_smooth <- "#187a00"
col_pointwise <- "#008790"
col_green <- "#61B94C"
col_orange <- "#ffa600"
col_yellow <- "#FCE135"
```
## The beginning: June 2020
![](assets/allisonhorst/the_beginning_cropped.png)
## Motivation
:::: {.columns}
::: {.column width="53%"}
|
|
Energy market liberalization created complex, interconnected trading systems
|
|
|
Renewable energy transition introduces uncertainty and volatility from weather-dependent generation
|
|
|
Traditional point forecasts are inadequate for modern energy markets with increasing uncertainty
|
|
|
Risk inherently *is* a probabilistic concept
|
|
|
**Probabilistic forecasting** essential for risk management, planning and decision making in volatile energy environments
|
|
|
**Online learning** methods needed for fast-updating models with streaming energy data
|
:::
::: {.column width="4%"}
:::
::: {.column width="43%"}

:::
::::
## Overview of the Thesis {#sec-overview}
::: {.r-stack}
::: {.fragment .fade-in-then-out}
|
|
Berrisch, J., & Ziel, F. [-@BERRISCH2023105221]. CRPS learning. Journal of Econometrics, 237(2), 105221.
|
|
|
Berrisch, J., & Ziel, F. [-@BERRISCH20241568]. Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. International Journal of Forecasting, 40(4), 1568–1586.
|
|
|
Hirsch, S., Berrisch, J., & Ziel, F. [-@hirsch2024online]. Online Distributional Regression. arXiv preprint arXiv:2407.08750.
|
|
|
Berrisch, J., & Ziel, F. [-@berrisch2022distributional]. Distributional modeling and forecasting of natural gas prices. Journal of Forecasting, 41(6), 1065–1086.
|
|
|
Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. [-@berrisch2023modeling]. Modeling volatility and dependence of European carbon and energy prices. Finance Research Letters, 52, 103503.
|
|
|
Berrisch, J., Narajewski, M., & Ziel, F. [-@BERRISCH2023100236]. High-resolution peak demand estimation using generalized additive models and deep neural networks. Energy and AI, 13, 100236.
|
|
|
Berrisch, J. [-@berrisch2025rcpptimer]. rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support. arXiv preprint arXiv:2501.15856.
|
:::
:::
## Overview
:::: {.columns}
::: {.column width="48%"}
#### Online Distributional Regression
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
#### Distributional Modeling and Forecasting of Natural Gas Prices
:::
::::
:::: {.columns}
::: {.column width="48%"}
Reduces estimation time by 2-3 orders of magnitude
Maintains competitive forecasting accuracy
Real-World Validation in Energy Markets
Python Package *ondil* on [Github](https://github.com/simon-hirsch/ondil) and [PyPi](https://pypi.org/project/ondil/)
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
Forecasting of Day-Ahead and Month-Ahead prices
Capturing the full distribution of future prices
Extensive data analysis from 2011-2020
State-Space models, skewed Student's *t* distribution
Python Package *sstudentt* on [Github](https://github.com/BerriJ/sstudentt) and [PyPi](https://pypi.org/project/sstudentt/)
```{r, echo=FALSE, fig.width = 12, fig.height = 6, fig.align="left"}
load("assets/ngas/residuals.RData")
clr_day_ahead <- cols[5, "green"]
clr_month_ahead <- cols[5, "blue"]
line_da <- da_data %>%
ggplot(aes(y = sresids, x = obs)) +
geom_point(col = clr_day_ahead) +
geom_line(col = clr_day_ahead) +
theme_minimal() +
xlab(NULL) +
ylab(expression(z[t])) +
theme(text = element_text(size = text_size))
acf_da <- forecast::ggAcf(da_data$sresids, size = 1, col = clr_day_ahead) +
theme_minimal() +
ggtitle("") +
theme(text = element_text(size = text_size))
# Density Plot DA
den_da <- da_data %>%
ggplot(aes(x = sresids)) +
geom_histogram(aes(y = after_stat(density)), fill = clr_day_ahead) +
geom_density(aes(y = after_stat(density)), size = 1) +
theme_minimal() +
ylab("Density") +
xlab(NULL) +
theme(text = element_text(size = text_size))
line_ma <- ma_data %>%
ggplot(aes(y = sresids, x = obs)) +
geom_point(col = clr_month_ahead) +
geom_line(col = clr_month_ahead) +
theme_minimal() +
xlab(NULL) +
ylab(expression(z[t])) +
theme(text = element_text(size = text_size))
acf_ma <- forecast::ggAcf(ma_data$sresids, size = 1, col = clr_month_ahead) +
theme_minimal() +
ggtitle("") +
theme(text = element_text(size = text_size))
den_ma <- ma_data %>%
ggplot(aes(x = sresids)) +
geom_histogram(aes(y = after_stat(density)), fill = clr_month_ahead) +
geom_density(aes(y = after_stat(density)), size = 1) +
theme_minimal() +
ylab("Density") +
xlab(NULL) +
theme(text = element_text(size = text_size))
plots_da <- cowplot::align_plots(line_da, acf_da, line_ma, acf_ma, align = "v", axis = "l")
bottom_row_da <- cowplot::plot_grid(plots_da[[2]], den_da, align = "hv")
plots_ma <- cowplot::align_plots(line_ma, acf_ma, line_ma, acf_ma, align = "v", axis = "l")
bottom_row_ma <- cowplot::plot_grid(plots_ma[[2]], den_ma, align = "hv")
cols_ngas <- c(
clr_day_ahead, clr_day_ahead, clr_day_ahead, clr_day_ahead,
clr_month_ahead, clr_month_ahead, clr_month_ahead, clr_month_ahead
)
pacfs <- map2(pacfs, cols_ngas, .f = ~ .x %>%
mutate(Col = .y)) %>%
purrr::reduce(.f = rbind)
var_labs <- c(
"Residuals", "Absolute", "Positive", "Negative",
"Residuals", "Absolute", "Positive", "Negative"
)
lvls <- unique(pacfs$Var)
names(var_labs) <- lvls
pacfs <- transform(pacfs,
Var = factor(Var,
levels = lvls,
labels = c(
"paste(Residuals, ':',~ z[t])",
"paste(Absolute, ':',~ '|', z[t] ,'|')",
"paste(Positive, ':',~ '(', z[t],')'^'+')",
"paste(Negative, ':',~ '(', z[t],')'^'-')"
)
)
)
acfs <- pacfs %>%
dplyr::filter(Lag != 0) %>%
ggplot(aes(x = Lag)) +
geom_linerange(aes(ymin = ymin, ymax = ymax, color = Col), size = 1) +
geom_line(aes(y = upper), linetype = "longdash", alpha = 0.5) +
geom_line(aes(y = lower), linetype = "longdash", alpha = 0.5) +
scale_color_identity() +
ylab("ACF") +
theme_minimal() +
facet_grid(Product ~ Var, labeller = label_parsed) +
theme(
plot.margin = unit(c(0, 0, 0, 0.2), "cm"),
text = element_text(size = text_size),
strip.background = element_rect(fill = "grey95", colour = "grey95")
)
acfs_da <- pacfs %>%
dplyr::filter(Lag != 0 & Product == "Day-Ahead") %>%
ggplot(aes(x = Lag)) +
geom_linerange(aes(ymin = ymin, ymax = ymax, color = Col), size = 1) +
geom_line(aes(y = upper), linetype = "longdash", alpha = 0.5) +
geom_line(aes(y = lower), linetype = "longdash", alpha = 0.5) +
scale_color_identity() +
ylab("PACF") +
theme_minimal() +
facet_grid(. ~ Var, labeller = label_parsed) +
theme(
plot.margin = unit(c(0, 0, 0, 0.2), "cm"),
text = element_text(size = text_size),
strip.background = element_rect(
fill = "grey95", colour = "grey95"
),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
acfs_ma <- pacfs %>%
dplyr::filter(Lag != 0 & Product == "Month-Ahead") %>%
ggplot(aes(x = Lag)) +
geom_linerange(aes(ymin = ymin, ymax = ymax, color = Col), size = 1) +
geom_line(aes(y = upper), linetype = "longdash", alpha = 0.5) +
geom_line(aes(y = lower), linetype = "longdash", alpha = 0.5) +
scale_color_identity() +
ylab("PACF") +
theme_minimal() +
scale_x_continuous(breaks = c(1, 10, 20, 30, 40)) +
facet_grid(. ~ Var, labeller = label_parsed) +
theme(
plot.margin = unit(c(0, 0, 0, 0.2), "cm"),
text = element_text(size = text_size),
strip.background = element_blank(), strip.text = element_blank()
)
plots_da <- cowplot::align_plots(acfs_da, line_da, line_ma, acf_ma, align = "hv", axis = "l")
bottom_row_da <- cowplot::plot_grid(
plots_da[[2]], den_da,
align = "hv",
rel_widths = c(0.66, 0.33)
)
da_plots <- cowplot::plot_grid(plots_da[[1]], bottom_row_da, nrow = 2)
plots_ma <- cowplot::align_plots(acfs_ma, line_ma, line_ma, acf_ma, align = "hv", axis = "l")
bottom_row_ma <- cowplot::plot_grid(
plots_ma[[2]],
den_ma,
align = "hv",
rel_widths = c(0.66, 0.33)
)
cowplot::plot_grid(
bottom_row_da,
bottom_row_ma,
acfs_da,
acfs_ma,
ncol = 1,
rel_heights = c(0.2, 0.2, 0.3, 0.3)
)
```
:::
::::
## Overview
:::: {.columns}
::: {.column width="48%"}
#### High-Resolution Peak Demand Estimation Using Generalized Additive Models and Deep Neural Networks
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
#### rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support
:::
::::
:::: {.columns}
::: {.column width="48%"}
Predict high-resolution electricity peaks using only low-resolution data
Combine GAMs and DNNs for superior accuracy
Won Western Power Distribution Competition
Won Best-Student-Presentation Award
```{r, echo=FALSE, fig.width = 12, fig.height = 6, fig.align="center"}
load("assets/minmaxload/plot_overview.rds")
# linesize <- 1.2
ggplot() +
geom_line(
data = plot_df[plot_df$var == "1high_res_load", ],
aes(
x = time,
y = value,
colour = var
),
linewidth = linesize
) +
geom_line(
data = plot_df[plot_df$var == "2low_res", ],
aes(
x = time,
y = value,
colour = var
),
linewidth = linesize
) +
geom_line(
data = plot_df[plot_df$var == "4max", ],
aes(
x = time,
y = value,
colour = var
),
linewidth = linesize
) +
geom_line(
data = plot_df[plot_df$var == "3min", ],
aes(
x = time,
y = value,
colour = var
),
linewidth = linesize
) +
scale_color_manual(
values = as.character(c(
cols[4, col_load],
cols[9, col_load],
cols[8, col_min],
cols[8, col_max]
)),
labels = c(
"High-Resolution Load",
"Low-Resolution Load",
"Minimum",
"Maximum"
)
) +
guides(color = guide_legend(
override.aes = list(size = 2)
)) +
theme_minimal() +
ylab("Load [MW]") +
xlab("Time") +
theme(
zoom.x = element_rect(fill = cols[4, "grey"], colour = NA),
legend.position = "bottom",
legend.title = element_blank(),
text = element_text(
size = text_size + 4,
),
validate = FALSE
) +
scale_y_continuous(breaks = seq(-1.5, 1.0, 0.5)) +
ggforce::facet_zoom(
xlim = c(
as.POSIXct("2021-07-24 12:00:00", tz = "UTC"),
as.POSIXct("2021-07-24 19:00:00", tz = "UTC")
),
zoom.size = 2,
ylim = c(-.9, 1),
split = FALSE,
horizontal = FALSE,
show.area = TRUE
)
```
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
R Package *rcpptimer* on [Github](https://github.com/BerriJ/rcpptimer) and [CRAN](https://cran.r-project.org/web/packages/rcpptimer/)
Provides Rcpp bindings for [cpptimer](https://github.com/BerriJ/rcpptimer)
tic-toc class for timing C++ code
Supports nested, overlapping and scoped timers
Supports OpenMP parallelism
```c++
//[[Rcpp::depends(rcpptimer)]]
#include <rcpptimer.h>

//[[Rcpp::export]]
void timed_example()
{
  Rcpp::Timer timer;
  timer.tic();
  // Code to be timed
  timer.toc();
}
```
:::
::::
# CRPS Learning {#sec-crps-learning visibility="uncounted"}
Berrisch, J., & Ziel, F. [-@BERRISCH2023105221]. *Journal of Econometrics*, 237(2), 105221.
## Introduction
:::: {.columns}
::: {.column width="48%"}
### The Idea:
- Combine multiple forecasts instead of choosing one
- Combination weights may vary over **time**, over the **distribution** or **both**
Two popular options for combining distributions:
- Combining across quantiles (this paper)
- Horizontal aggregation, vincentization
- Combining across probabilities
- Vertical aggregation
- Combining at an angle
- @taylor2023angular
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## Time
```{r, echo = FALSE, fig.height=8, cache = TRUE}
par(mfrow = c(3, 3), mar = c(2, 2, 2, 2))
set.seed(1)
# Data
X <- matrix(ncol = 3, nrow = 15)
X[, 1] <- seq(from = 8, to = 12, length.out = 15) + 0.25 * rnorm(15)
X[, 2] <- 10 + 0.25 * rnorm(15)
X[, 3] <- seq(from = 12, to = 8, length.out = 15) + 0.25 * rnorm(15)
# Weights
w <- matrix(ncol = 3, nrow = 15)
w[, 1] <- sin(0.1 * 1:15)
w[, 2] <- cos(0.1 * 1:15)
w[, 3] <- seq(from = -2, 0.25, length.out = 15)^2
w <- (w / rowSums(w))
# Vis
plot(X[, 1],
lwd = 4,
type = "l",
ylim = c(8, 12),
xlab = "",
ylab = "",
xaxt = "n",
yaxt = "n",
bty = "n",
col = "#80C684FF"
)
plot(w[, 1],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#80C684FF"
)
text(6, 0.5, TeX("$w_1(t)$"), cex = 2, col = "#80C684FF")
arrows(13, 0.25, 15, 0.0, , lwd = 4, bty = "n", col = "#414141FF")
plot.new()
plot.new()
plot.new()
text(6, 0.6, TeX("$w_2(t)$"), cex = 2, col = "#FFD44EFF")
arrows(13, 0.5, 15, 0.5, , lwd = 4, bty = "n", col = "#414141FF")
plot(rowSums(X * w), lwd = 4, type = "l", xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#D81A5FFF")
plot(X[, 3],
lwd = 4,
type = "l", ylim = c(8, 12),
xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#FFD44EFF"
)
plot(w[, 3],
lwd = 4, type = "l",
ylim = c(0, 1),
xlab = "",
ylab = "", xaxt = "n", yaxt = "n", bty = "n", col = "#FFD44EFF"
)
text(6, 0.25, TeX("$w_2(t)$"), cex = 2, col = "#FFD44EFF")
arrows(13, 0.75, 15, 1, , lwd = 4, bty = "n", col = "#414141FF")
```
## Distribution
```{ojs}
d3 = require("d3@7")
```
```{ojs}
cdf_data = FileAttachment("assets/crps_learning/weights_plot/cdf_data.csv").csv({ typed: true })
```
```{ojs}
function updateChartInner(g, x, y, linesGroup, color, line, data) {
// Update axes with transitions
x.domain(d3.extent(data, d => d.x));
g.select(".x-axis").transition().duration(1500).call(d3.axisBottom(x).ticks(10));
y.domain([0, d3.max(data, d => d.y)]);
g.select(".y-axis").transition().duration(1500).call(d3.axisLeft(y).ticks(5));
// Group data by basis function
const dataByFunction = Array.from(d3.group(data, d => d.b));
const keyFn = d => d[0];
// Update basis function lines
const u = linesGroup.selectAll("path").data(dataByFunction, keyFn);
u.join(
enter => enter.append("path").attr("fill","none").attr("stroke-width",3)
.attr("stroke", (_, i) => color(i)).attr("d", d => line(d[1].map(pt => ({x: pt.x, y: 0}))))
.style("opacity",0),
update => update,
exit => exit.transition().duration(1000).style("opacity",0).remove()
)
.transition().duration(1000)
.attr("d", d => line(d[1]))
.attr("stroke", (_, i) => color(i))
.style("opacity",1);
}
chart = {
// State variable for selected mu parameter
let selectedMu = 1;
const filteredData = () => cdf_data.filter(d =>
Math.abs(selectedMu - d.mu) < 0.001
);
const container = d3.create("div")
.style("max-width", "none")
.style("width", "100%");
const controlsContainer = container.append("div")
.style("display", "flex")
.style("gap", "20px")
.style("align-items", "center");
// Single slider control for mu
const sliderContainer = controlsContainer.append('div')
.style('display','flex')
.style('align-items','center')
.style('gap','10px')
.style('flex','1');
sliderContainer.append('label')
.text('Naive:')
.style('font-size','20px');
const muSlider = sliderContainer.append('input')
.attr('type','range')
.attr('min', 0)
.attr('max', 1)
.attr('step', 0.1)
.property('value', selectedMu)
.on('input', function(event) {
selectedMu = +this.value;
muDisplay.text(selectedMu.toFixed(1));
updateChart(filteredData());
})
.style('width', '100%')
//.style('-webkit-appearance', 'none')
.style('appearance', 'none')
.style('height', '8px')
.style('background', '#BDBDBDFF');
const muDisplay = sliderContainer.append('span')
.text(selectedMu.toFixed(1))
.style('font-size','20px');
// Add Reset button
controlsContainer.append('button')
.text('Reset')
.style('font-size', '20px')
.style('align-self', 'center')
.style('margin-left', 'auto')
.on('click', () => {
selectedMu = 1;
muSlider.property('value', selectedMu);
muDisplay.text(selectedMu.toFixed(1));
updateChart(filteredData());
});
// Build SVG
const width = 600;
const height = 450;
const margin = {top: 40, right: 20, bottom: 40, left: 40};
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Set controls container width to match SVG plot width
controlsContainer.style("max-width", "none").style("width", "100%");
// Distribute each control evenly and make sliders full-width
controlsContainer.selectAll("div").style("flex", "1").style("min-width", "0px");
controlsContainer.selectAll("input").style("width", "100%").style("box-sizing", "border-box");
// Create scales
const x = d3.scaleLinear()
.range([0, innerWidth]);
const y = d3.scaleLinear()
.range([innerHeight, 0]);
// Create a color scale for the basis functions
const color = d3.scaleOrdinal(["#80C684FF", "#FFD44EFF", "#D81A5FFF"]);
// Create SVG
const svg = d3.create("svg")
.attr("width", "100%")
.attr("height", "auto")
.attr("viewBox", [0, 0, width, height])
.attr("preserveAspectRatio", "xMidYMid meet")
.attr("style", "max-width: 100%; height: auto;");
// Create the chart group
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Add axes
const xAxis = g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.attr("class", "x-axis")
.call(d3.axisBottom(x).ticks(10))
.style("font-size", "20px");
const yAxis = g.append("g")
.attr("class", "y-axis")
.call(d3.axisLeft(y).ticks(5))
.style("font-size", "20px");
// Add a horizontal line at y = 0
g.append("line")
.attr("x1", 0)
.attr("x2", innerWidth)
.attr("y1", y(0))
.attr("y2", y(0))
.attr("stroke", "#000")
.attr("stroke-opacity", 0.2);
// Add gridlines
g.append("g")
.attr("class", "grid-lines")
.selectAll("line")
.data(y.ticks(5))
.join("line")
.attr("x1", 0)
.attr("x2", innerWidth)
.attr("y1", d => y(d))
.attr("y2", d => y(d))
.attr("stroke", "#ccc")
.attr("stroke-opacity", 0.5);
// Create a line generator
const line = d3.line()
.x(d => x(d.x))
.y(d => y(d.y))
.curve(d3.curveBasis);
// Group to contain the basis function lines
const linesGroup = g.append("g")
.attr("class", "basis-functions");
// Store the current basis functions for transition
let currentBasisFunctions = new Map();
// Function to update the chart with new data
function updateChart(data) {
updateChartInner(g, x, y, linesGroup, color, line, data);
}
// Store the update function
svg.node().update = updateChart;
// Initial render
updateChart(filteredData());
container.node().appendChild(svg.node());
return container.node();
}
```
:::
:::
::::
## The Framework of Prediction under Expert Advice
###
:::: {.columns}
::: {.column width="48%"}
Each day, $t = 1, 2, ... T$
- The **forecaster** receives predictions $\widehat{X}_{t,k}$ from $K$ **experts**
- The **forecaster** assigns weights $w_{t,k}$ to each **expert**
- The **forecaster** calculates her prediction:
\begin{equation}
\widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}.
\label{eq_forecast_def}
\end{equation}
- The realization for $t$ is observed
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
- The experts can be institutions, persons, or models
- The forecasts can be point forecasts (e.g., mean or median) or full predictive distributions
- We do not need a distributional assumption concerning the underlying data
- @cesa2006prediction
:::
::::
---
## The Regret
Weights are updated sequentially according to the past performance of the $K$ experts.
For this, a loss function $\ell$ is needed; it is used to compute the **cumulative regret** $R_{t,k}$
\begin{equation}
R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i)\label{eq:regret}
\end{equation}
The cumulative regret:
- Indicates the predictive accuracy of the expert $k$ until time $t$.
- Measures how much the forecaster *regrets* not having followed the expert's advice
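A minimal R sketch of this bookkeeping (all data and the $\ell_1$ loss below are made up for illustration):
```r
# Cumulative regret of K experts under a generic loss (here: l1 loss)
set.seed(1)
T <- 100; K <- 3
X_hat <- matrix(rnorm(T * K, mean = 10), nrow = T)  # expert predictions
w     <- matrix(1 / K, nrow = T, ncol = K)          # combination weights (naive)
X_tld <- rowSums(w * X_hat)                         # forecaster's combination
y     <- rnorm(T, mean = 10)                        # realizations
loss  <- function(x, y) abs(x - y)                  # l1 loss
# R[t, k]: cumulative loss of the forecaster minus cumulative loss of expert k
R <- apply(loss(X_hat, y), 2, function(l_k) cumsum(loss(X_tld, y) - l_k))
```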
Popular loss functions for point forecasting @gneiting2011making:
:::: {.columns}
::: {.column width="48%"}
$\ell_2$ loss:
\begin{equation}
\ell_2(x, y) = | x -y|^2 \label{eq:elltwo}
\end{equation}
Strictly proper for *mean* prediction
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
$\ell_1$ loss:
\begin{equation}
\ell_1(x, y) = | x -y| \label{eq:ellone}
\end{equation}
Strictly proper for *median* predictions
:::
::::
## Popular Algorithms and the Risk
:::: {.columns}
::: {.column width="48%"}
### Popular Aggregation Algorithms
#### The naive combination
\begin{equation}
w_{t,k}^{\text{Naive}} = \frac{1}{K}\label{eq:naive_combination}
\end{equation}
#### The exponentially weighted average forecaster (EWA)
\begin{equation}
\begin{aligned}
w_{t,k}^{\text{EWA}} & = \frac{e^{\eta R_{t,k}} }{\sum_{k = 1}^K e^{\eta R_{t,k}}}\\
& =
\frac{e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }{\sum_{k = 1}^K e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k}}
\end{aligned}\label{eq:exp_combination}
\end{equation}
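A minimal R sketch of the EWA update (made-up cumulative regrets; weights are proportional to $e^{\eta R_{t,k}}$):
```r
# EWA weights from cumulative regrets of K = 3 experts (illustrative values)
eta <- 0.5
R_t <- c(0.8, -0.2, 0.1)
w_ewa <- exp(eta * R_t) / sum(exp(eta * R_t))
round(w_ewa, 3)
```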
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
### Optimality
In stochastic settings, the cumulative risk should be analyzed @wintenberger2017optimal:
\begin{align}
&\underbrace{\widetilde{\mathcal{R}}_t = \sum_{i=1}^t \mathbb{E}[\ell(\widetilde{X}_{i},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Forecaster}} \\
&\underbrace{\widehat{\mathcal{R}}_{t,k} = \sum_{i=1}^t \mathbb{E}[\ell(\widehat{X}_{i,k},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Experts}}
\label{eq_def_cumrisk}
\end{align}
:::
::::
## Optimal Convergence
:::: {.columns}
::: {.column width="48%"}
### The selection problem
\begin{equation}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \stackrel{t\to \infty}{\rightarrow} a \quad \text{with} \quad a \leq 0.
\label{eq_opt_select}
\end{equation}
The forecaster is asymptotically not worse than the best expert.
### The convex aggregation problem
\begin{equation}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \stackrel{t\to \infty}{\rightarrow} b \quad \text{with} \quad b \leq 0 .
\label{eq_opt_conv}
\end{equation}
The forecaster is asymptotically not worse than the best convex combination in hindsight (**oracle**).
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
Optimal rates with respect to selection \eqref{eq_opt_select} and convex aggregation \eqref{eq_opt_conv} @wintenberger2017optimal:
\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) & =
\mathcal{O}\left(\frac{\log(K)}{t}\right)\label{eq_optp_select}
\end{align}
\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) & =
\mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right)
\label{eq_optp_conv}
\end{align}
Algorithms can satisfy both \eqref{eq_optp_select} and \eqref{eq_optp_conv} depending on:
- The loss function
- Regularity conditions on $Y_t$ and $\widehat{X}_{t,k}$
- The weighting scheme
:::
::::
##
:::: {.columns}
::: {.column width="48%"}
### Optimal Convergence
EWA satisfies optimal selection convergence \eqref{eq_optp_select} in a deterministic setting if:
- Loss $\ell$ is exp-concave
- Learning-rate $\eta$ is chosen correctly
These results can be transferred to the stochastic setting @wintenberger2017optimal.
Optimal convex aggregation convergence \eqref{eq_optp_conv} can be satisfied by applying the gradient trick:
\begin{align}
\ell^{\nabla}(x,y) = \ell'(\widetilde{X},y) x
\end{align}
$\ell'$ is the subgradient of $\ell$ at forecast combination $\widetilde{X}$.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
### Probabilistic Setting
**An appropriate choice:**
\begin{equation*}
\text{CRPS}(F, y) = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx \label{eq:crps}
\end{equation*}
It's strictly proper [@gneiting2007strictly].
Using the CRPS, we can calculate time-adaptive weights $w_{t,k}$. However, what if the experts' performance varies in parts of the distribution?
Utilize this relation:
\begin{equation*}
\text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) dp.\label{eq_crps_qs}
\end{equation*}
... to combine quantiles of the probabilistic forecasts individually using the quantile-loss QL.
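A quick R sketch of this identity (standard normal forecast, made-up observation): averaging $2\,\text{QL}_p$ over a fine probability grid recovers the CRPS.
```r
# CRPS of a standard normal forecast via its quantile-loss decomposition
y  <- 0.7
p  <- 1:999 / 1000                      # fine probability grid
q  <- qnorm(p)                          # forecast quantiles F^{-1}(p)
ql <- (p - (y < q)) * (y - q)           # QL_p(F^{-1}(p), y)
crps_ql <- mean(2 * ql)                 # 2 * integral of QL over p
crps_cf <- y * (2 * pnorm(y) - 1) + 2 * dnorm(y) - 1 / sqrt(pi)  # closed form
c(crps_ql = crps_ql, crps_cf = crps_cf) # both approx. 0.42
```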
:::
::::
## CRPS Learning Optimality
::: {.panel-tabset}
## Almost Optimal Convergence
:::: {style="font-size: 90%;"}
QL is convex, but not exp-concave
Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exists a $C>0$ such that for $x>0$ it holds that
\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq
1-e^{-x}
\label{eq_boa_opt_conv}
\end{equation}
Almost optimal w.r.t. *convex aggregation* \eqref{eq_optp_conv} @wintenberger2017optimal.
The same algorithm satisfies that there exists a $C>0$ such that for $x>0$ it holds that
\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq
C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq
1-2e^{-x}
\label{eq_boa_opt_select}
\end{equation}
if $Y_t$ is bounded, the considered loss $\ell$ is convex, $G$-Lipschitz, and weak exp-concave in its first coordinate.
Almost optimal w.r.t. *selection* \eqref{eq_optp_select} @gaillard2018efficient.
We show that this holds for QL under feasible conditions.
:::
## Conditions + Lemma
:::: {.columns}
::: {.column width="48%"}
**Lemma 1**
\begin{align}
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\min}
\label{eq_risk_ql_crps_expert} \\
2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}
& \leq \widehat{\mathcal{R}}^{\text{CRPS}}_{t,\pi} .
\label{eq_risk_ql_crps_convex}
\end{align}
Pointwise can outperform constant procedures
QL is convex but not exp-concave:
Almost optimal convergence w.r.t. *convex aggregation* \eqref{eq_boa_opt_conv}
For almost optimal convergence w.r.t. *selection* \eqref{eq_boa_opt_select} we need to check **A1** and **A2**:
QL is Lipschitz continuous:
**A1** holds
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
**A1**
For some $G>0$ it holds
for all $x_1,x_2\in \mathbb{R}$ and $t>0$ that
$$ | \ell(x_1, Y_t)-\ell(x_2, Y_t) | \leq G |x_1-x_2|$$
**A2** For some $\alpha>0$, $\beta\in[0,1]$ it holds
for all $x_1,x_2 \in \mathbb{R}$ and $t>0$ that
\begin{align*}
\mathbb{E}[
& \ell(x_1, Y_t)-\ell(x_2, Y_t) | \mathcal{F}_{t-1}] \leq \\
& \mathbb{E}[ \ell'(x_1, Y_t)(x_1 - x_2) |\mathcal{F}_{t-1}] \\
& +
\mathbb{E}\left[ \left. \left( \alpha(\ell'(x_1, Y_t)(x_1 - x_2))^{2}\right)^{1/\beta} \right|\mathcal{F}_{t-1}\right]
\end{align*}
Almost optimal w.r.t. *selection* \eqref{eq_optp_select} @gaillard2018efficient.
:::
::::
## Proposition + Theorem
:::: {.columns}
::: {.column width="48%"}
Conditional quantile risk: $\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y_t) | \mathcal{F}_{t-1}]$.
The convexity properties of $\mathcal{Q}_p$ depend on the
conditional distribution $Y_t|\mathcal{F}_{t-1}$.
**Proposition 1**
Let $Y$ be a univariate random variable with (Radon-Nikodym) $\nu$-density $f$, then for the second subderivative of the quantile risk
$\mathcal{Q}_p(x) = \mathbb{E}[ \text{QL}_p(x, Y) ]$
of $Y$ it holds for all $p\in(0,1)$ that
$\mathcal{Q}_p'' = f.$
Additionally, if $f$ is a continuous Lebesgue-density with $f\geq\gamma>0$ for some constant $\gamma>0$ on its support $\text{spt}(f)$, then
$\mathcal{Q}_p$ is $\gamma$-strongly convex.
Strong convexity with $\beta=1$ implies weak exp-concavity **A2** @gaillard2018efficient
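A quick numerical check of Proposition 1 for $Y \sim \mathcal{N}(0,1)$ (the values of $x$, $p$, and $h$ are arbitrary): the second difference of $\mathcal{Q}_p$ matches the density.
```r
# Q_p(x) = E[QL_p(x, Y)] for Y ~ N(0,1); its second derivative should equal dnorm(x)
Qp <- function(x, p) {
  lo <- integrate(function(y) (1 - p) * (x - y) * dnorm(y), -Inf, x, rel.tol = 1e-10)$value
  hi <- integrate(function(y) p * (y - x) * dnorm(y), x, Inf, rel.tol = 1e-10)$value
  lo + hi
}
x <- 0.3; p <- 0.7; h <- 1e-2
d2 <- (Qp(x + h, p) - 2 * Qp(x, p) + Qp(x - h, p)) / h^2  # second difference
c(second_difference = d2, density = dnorm(x))             # both approx. 0.381
```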
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
**A1** and **A2** give us almost optimal convergence w.r.t. selection \eqref{eq_boa_opt_select}
**Theorem 1**
The gradient based fully adaptive Bernstein online aggregation (BOAG) applied pointwise for all $p\in(0,1)$ on $\text{QL}$ satisfies
\eqref{eq_boa_opt_conv} with minimal CRPS given by
$$\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}.$$
If $Y_t|\mathcal{F}_{t-1}$ is bounded
and has a pdf $f_t$ satisfying $f_t>\gamma >0$ on its
support $\text{spt}(f_t)$, then \eqref{eq_boa_opt_select} holds with $\beta=1$ and
$$\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}.$$
:::
::::
::::
:::: {.notes}
We apply Bernstein Online Aggregation (BOA). It lets us weaken the exp-concavity condition while almost keeping the optimalities \ref{eq_optp_select} and \ref{eq_optp_conv}.
::::
## A Probabilistic Example
:::: {.columns}
::: {.column width="48%"}
Simple Example:
\begin{align}
Y_t & \sim \mathcal{N}(0,\,1) \\
\widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4)
\label{eq:dgp_sim1}
\end{align}
- True weights vary over $p$
- Figures show the ECDF and calculated weights using $T=25$ realizations
- Pointwise solution creates rough estimates
- Pointwise is better than constant
- Smooth solution is better than pointwise
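The optimal pointwise weights can be computed directly (small sketch; $\mathcal{N}(3,4)$ read as variance 4): at each $p$, the best attainable combination matches the true quantile as closely as possible.
```r
# Optimal pointwise weight on expert 1: w*q1(p) + (1-w)*q2(p) should equal qY(p)
p  <- 1:99 / 100
q1 <- qnorm(p, mean = -1, sd = 1)  # expert 1
q2 <- qnorm(p, mean =  3, sd = 2)  # expert 2 (variance 4)
qY <- qnorm(p, mean =  0, sd = 1)  # DGP
w1 <- pmin(pmax((q2 - qY) / (q2 - q1), 0), 1)  # clipped to [0, 1]
plot(p, w1, type = "l", xlab = "Probability p", ylab = "Weight on expert 1")
```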
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## CDFs
```{r, echo = FALSE, fig.width=7, fig.height=6, fig.align='center', cache = TRUE}
source("assets/01_common.R")
load("assets/crps_learning/01_motivation_01.RData")
ggplot(df, aes(x = x, y = y, xend = xend, yend = yend)) +
stat_function(
fun = pnorm, n = 10000,
args = list(mean = dev[2], sd = experts_sd[2]),
aes(col = "Expert 2"), size = 1.5
) +
stat_function(
fun = pnorm, n = 10000,
args = list(mean = dev[1], sd = experts_sd[1]),
aes(col = "Expert 1"), size = 1.5
) +
stat_function(
fun = pnorm,
n = 10000,
size = 1.5, aes(col = "DGP") # , linetype = "dashed"
) +
geom_point(aes(col = "ECDF"), size = 1.5, show.legend = FALSE) +
geom_segment(aes(col = "ECDF")) +
geom_segment(data = tibble(
x_ = -5,
xend_ = min(y),
y_ = 0,
yend_ = 0
), aes(x = x_, xend = xend_, y = y_, yend = yend_)) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom",
legend.key.width = unit(1.5, "cm")
) +
ylab("Probability p") +
xlab("Value") +
scale_colour_manual(NULL, values = c("#969696", "#252525", col_auto, col_blue)) +
guides(color = guide_legend(
nrow = 2,
byrow = FALSE # ,
# override.aes = list(
# size = c(1.5, 1.5, 1.5, 1.5)
# )
)) +
scale_x_continuous(limits = c(-5, 7.5))
```
## Weights
```{r, echo = FALSE, fig.width=7, fig.height=6, fig.align='center', cache = TRUE}
source("assets/01_common.R")
load("assets/crps_learning/01_motivation_02.RData")
ggplot() +
geom_line(data = weights[weights$var != "1Optimum", ], size = 1.5, aes(x = prob, y = val, col = var)) +
geom_line(
data = weights[weights$var == "1Optimum", ], size = 1.5, aes(x = prob, y = val, col = var) # , linetype = "dashed"
) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom",
legend.key.width = unit(1.5, "cm")
) +
xlab("Probability p") +
ylab("Weight w") +
scale_colour_manual(
NULL,
values = c("#969696", col_pointwise, col_p_constant, col_p_smooth),
labels = modnames[-c(3, 5)]
) +
guides(color = guide_legend(
ncol = 3,
byrow = FALSE,
title.hjust = 5,
# override.aes = list(
# linetype = c(rep("solid", 5), "dashed")
# )
)) +
ylim(c(0, 1))
```
::::
:::
:::
## The Smoothing Procedures
::: {.panel-tabset}
## Penalized Smoothing
:::: {.columns}
::: {.column width="48%"}
Penalized cubic B-Splines for smoothing weights:
Let $\varphi=(\varphi_1,\ldots, \varphi_L)$ be bounded basis functions on $(0,1)$. Then we approximate $w_{t,k}$ by
\begin{align}
w_{t,k}^{\text{smooth}} = \sum_{l=1}^L \beta_l \varphi_l = \beta'\varphi
\end{align}
with parameter vector $\beta$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
\begin{equation}
\| w_{t,k} - \beta' \varphi \|^2_2 + \lambda \| \mathcal{D}^{d} (\beta' \varphi) \|^2_2
\label{eq_function_smooth}
\end{equation}
with differential operator $\mathcal{D}^{d}$ of order $d$
Computation is easy, since we have an analytical solution
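A minimal R sketch of this smoothing step (P-spline style; grid size, $L$, $\lambda$, and the raw weights are placeholders):
```r
library(splines)
p_grid <- 1:99 / 100                           # probability grid (P = 99)
L      <- 25                                   # number of basis functions
B      <- bs(p_grid, df = L, intercept = TRUE) # P x L cubic B-spline basis
D      <- diff(diag(L), differences = 2)       # discrete penalty of order d = 2
lambda <- 10
H      <- B %*% solve(crossprod(B) + lambda * crossprod(D), t(B))  # hat matrix
w_raw    <- pmin(pmax(pnorm(p_grid, 0.5, 0.2) + rnorm(99, 0, 0.05), 0), 1)
w_smooth <- drop(H %*% w_raw)                  # smoothed weights of one expert
```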
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
We obtain the constant solution for large values of $\lambda$ when setting $d=1$
:::
::::
## Basis Smoothing
:::: {.columns}
::: {.column width="48%"}
Represent weights as linear combinations of bounded basis functions:
\begin{equation}
w_{t,k} = \sum_{l=1}^L \beta_{t,k,l} \varphi_l = \boldsymbol \beta_{t,k}' \boldsymbol \varphi
\end{equation}
A popular choice is B-splines as local basis functions
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:
\begin{equation}
\underbrace{\boldsymbol r_{t}}_{\text{LxK}} = \frac{L}{P} \underbrace{\boldsymbol B'}_{\text{LxP}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{\text{PxK}}
\end{equation}
The regret is thereby reduced from dimension $P \times K$ to $L \times K$
If $L = P$ it holds that $\boldsymbol \varphi = \boldsymbol{I}$
For $L = 1$ we obtain constant weights
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
Weights converge to the constant solution if $L\rightarrow 1$
:::
::::
::::
---
## The Proposed CRPS-Learning Algorithm
::: {style="font-size: 85%;"}
:::: {.columns}
::: {.column width="43%"}
### Initialization:
Array of expert predictions: $\widehat{X}_{t,p,k}$
Vector of Prediction targets: $Y_t$
Starting Weights: $\boldsymbol w_0=(w_{0,1},\ldots, w_{0,K})$
Penalization parameter: $\lambda\geq 0$
B-spline and penalty matrices $\boldsymbol B$ and $\boldsymbol D$ on $\mathcal{P}= (p_1,\ldots,p_M)$
Hat matrix: $$\boldsymbol{\mathcal{H}} = \boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'$$
Cumulative Regret: $R_{0,k} = 0$
Range parameter: $E_{0,k}=0$
Starting pseudo-weights: $\boldsymbol \beta_0 = \boldsymbol B^{\text{pinv}}\boldsymbol w_0(\boldsymbol{\mathcal{P}})$
:::
::: {.column width="2%"}
:::
::: {.column width="55%"}
### Core:
for( t in 1:T ) {
$\widetilde{\boldsymbol X}_{t} = \text{Sort}\left( \boldsymbol w_{t-1}'(\boldsymbol P) \widehat{\boldsymbol X}_{t} \right)$ # Prediction
$\boldsymbol r_{t} = \frac{L}{M} \boldsymbol B' \left({\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)$
$\boldsymbol E_{t} = \max(\boldsymbol E_{t-1}, \boldsymbol r_{t}^+ + \boldsymbol r_{t}^-)$
$\boldsymbol V_{t} = \boldsymbol V_{t-1} + \boldsymbol r_{t}^{ \odot 2}$
$\boldsymbol \eta_{t} =\min\left( \left(-\log(\boldsymbol \beta_{0}) \odot \boldsymbol V_{t}^{\odot -1} \right)^{\odot\frac{1}{2}} , \frac{1}{2}\boldsymbol E_{t}^{\odot-1}\right)$
$\boldsymbol R_{t} = \boldsymbol R_{t-1}+ \boldsymbol r_{t} \odot \left( \boldsymbol 1 - \boldsymbol \eta_{t} \odot \boldsymbol r_{t} \right)/2 + \boldsymbol E_{t} \odot \mathbb{1}\{-2\boldsymbol \eta_{t}\odot \boldsymbol r_{t} > 1\}$
$\boldsymbol \beta_{t} = K \boldsymbol \beta_{0} \odot \boldsymbol {SoftMax}\left( - \boldsymbol \eta_{t} \odot \boldsymbol R_{t} + \log( \boldsymbol \eta_{t}) \right)$
$\boldsymbol w_{t}(\boldsymbol P) = \underbrace{\boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'}_{\boldsymbol{\mathcal{H}}} \boldsymbol B \boldsymbol \beta_{t}$
}
:::
::::
:::
## Simulation Study
::: {.panel-tabset}
## BOA
:::: {.columns}
::: {.column width="48%"}
Data Generating Process of the [simple probabilistic example](#simple_example):
\begin{align*}
Y_t &\sim \mathcal{N}(0,\,1)\\
\widehat{X}_{t,1} &\sim \widehat{F}_{1}=\mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} &\sim \widehat{F}_{2}=\mathcal{N}(3,\,4)
\end{align*}
- Constant solution $\lambda \rightarrow \infty$
- Pointwise Solution of the proposed BOAG
- Smoothed Solution of the proposed BOAG
- Weights are smoothed during learning
- Smooth weights are used to calculate Regret, adjust weights, etc.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
::: {.panel-tabset}
## QL Deviation
Deviation from best attainable QL (1000 runs).

## CRPS vs. Lambda
CRPS Values for different $\lambda$ (1000 runs)

## Knots
CRPS for different number of knots (1000 runs)

::::
:::
::::
## Comparison to EWA and ML-Poly
The same simulation carried out for different algorithms (1000 runs):
## Study Forget
:::: {.columns}
::: {.column width="38%"}
#### New DGP:
\begin{align*}
Y_t &\sim \mathcal{N}\left(\frac{\sin(0.005 \pi t )}{2},\,1\right) \\
\widehat{X}_{t,1} &\sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\
\widehat{X}_{t,2} &\sim \widehat{F}_{2} = \mathcal{N}(3,\,4)
\end{align*}
Changing optimal weights
A single-run example is shown on the right
No forgetting leads to long-term constant weights
:::
::: {.column width="4%"}
:::
::: {.column width="58%"}
###
```{r, echo = FALSE, fig.width=7, fig.height=5, fig.align='center', cache = TRUE}
load("assets/crps_learning/weights_preprocessed.rda")
mod_labs <- c("Optimum", "No Forget\nPointwise", "No Forget\nP-Smooth", "Forget\nPointwise", "Forget\nP-Smooth")
names(mod_labs) <- c("Optimum", "nf_ptw", "nf_psmth", "f_ptw", "f_psmth")
colseq <- c(grey(.99), "orange", "red", "purple", "blue", "darkblue", "black")
weights_preprocessed %>%
mutate(w = 1 - w) %>%
ggplot(aes(t, p, fill = w)) +
geom_raster(interpolate = TRUE) +
facet_grid(Mod ~ ., labeller = labeller(Mod = mod_labs)) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
text = element_text(size = 15),
legend.key.height = unit(1, "inch")
) +
scale_x_continuous(expand = c(0, 0)) +
xlab("Time t") +
scale_fill_gradientn(
limits = c(0, 1),
colours = colseq,
breaks = seq(0, 1, 0.2)
) +
ylab("Weight w")
```
:::
::::
::::
## Application Study
::: {.panel-tabset}
## Overview
:::: {.columns}
::: {.column width="29%"}
::: {style="font-size: 85%;"}
Data:
- Forecasting European emission allowances (EUA)
- Daily month-ahead prices
- Jan 13 - Dec 20 (Phase III, 2092 Obs)
- Rolling Window (length 250 ~ 1 year)
Combination methods:
- Online
- Naive, BOAG, EWAG, ML-PolyG, BMA
- Batch
- QRlin, QRconv
::::
:::
::: {.column width="2%"}
:::
::: {.column width="69%"}
Tuning parameter grids:
- Smoothing Penalty: $\Lambda= \{0\}\cup \{2^x|x\in \{-4,-3.5,\ldots,12\}\}$
- Learning Rates: $\mathcal{E}= \{2^x|x\in \{-1,-0.5,\ldots,9\}\}$
```{r, echo = FALSE, fig.width=10, fig.height=5, fig.align='center', cache = TRUE}
load("assets/crps_learning/overview_data.rds")
data %>%
ggplot(aes(x = Date, y = value)) +
geom_line(size = 1, col = cols[9,"blue"]) +
theme_minimal() +
ylab("Value") +
facet_wrap(. ~ name, scales = "free", ncol = 1) +
theme(
text = element_text(size = 15),
strip.background = element_blank(),
strip.text.x = element_blank()
) -> p1
data %>%
ggplot(aes(x = value)) +
geom_histogram(aes(y = after_stat(density)), size = 1, fill = cols[9,"blue"], bins = 50) +
ylab("Density") +
xlab("Value") +
theme_minimal() +
theme(
strip.background = element_rect(fill = cols[3,"grey"], colour = cols[3,"grey"]),
text = element_text(size = text_size)
) +
facet_wrap(. ~ name, scales = "free", ncol = 1, strip.position = "right") -> p2
overview <- cowplot::plot_grid(plotlist = list(p1, p2), align = "hv", axis = "tblr", rel_widths = c(0.65, 0.35))
overview
```
:::
::::
## Experts
::: {style="font-size: 90%;"}
Simple exponential smoothing with additive errors (**ETS-ANN**):
\begin{align*}
Y_{t} = l_{t-1} + \varepsilon_t \quad \text{with} \quad l_t = l_{t-1} + \alpha \varepsilon_t \quad \text{and} \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2)
\end{align*}
Quantile regression (**QuantReg**): For each $p \in \mathcal{P}$ we assume:
\begin{align*}
F^{-1}_{Y_t}(p) = \beta_{p,0} + \beta_{p,1} Y_{t-1} + \beta_{p,2} |Y_{t-1}-Y_{t-2}|
\end{align*}
ARIMA(1,0,1)-GARCH(1,1) with Gaussian errors (**ARMA-GARCH**):
\begin{align*}
Y_{t} = \mu + \phi(Y_{t-1}-\mu) + \theta \varepsilon_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1)
\end{align*}
ARIMA(0,1,0)-I-EGARCH(1,1) with Gaussian errors (**I-EGARCH**):
\begin{align*}
Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \log(\sigma_t^2) = \omega + \alpha Z_{t-1}+ \gamma (|Z_{t-1}|-\mathbb{E}|Z_{t-1}|) + \beta \log(\sigma_{t-1}^2) \quad \text{and} \quad Z_t \sim \mathcal{N}(0,1)
\end{align*}
ARIMA(0,1,0)-GARCH(1,1) with student-t errors (**I-GARCHt**):
\begin{align*}
Y_{t} = \mu + Y_{t-1} + \varepsilon_t \quad \text{with} \quad \varepsilon_t = \sigma_t Z, \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \quad \text{and} \quad Z_t \sim t(0,1, \nu)
\end{align*}
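For illustration, such experts can be fitted with standard R packages (a rough sketch, not the estimation code of the study; `y` is a placeholder price series):
```r
library(forecast)   # ETS
library(quantreg)   # quantile regression
library(rugarch)    # (E)GARCH models
y <- as.numeric(arima.sim(list(ar = 0.9), n = 500)) + 25        # placeholder prices
fit_ets <- ets(y, model = "ANN")                                # ETS-ANN
dat <- data.frame(y = y[-(1:2)], lag1 = y[-c(1, 500)], vol = abs(diff(y))[-499])
fit_qr <- rq(y ~ lag1 + vol, tau = 0.5, data = dat)             # QuantReg, one p
spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 1)),
  distribution.model = "norm"
)
fit_garch <- ugarchfit(spec, data = y)                          # ARMA-GARCH
```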
::::
## Results
::: {.panel-tabset}
## Significance
```{r, echo = FALSE, fig.width=7, fig.height=5.5, fig.align='center', cache = TRUE, results='asis'}
load("assets/crps_learning/bernstein_application_study_estimations+learnings_rev1.RData")
quantile_loss <- function(X, y, tau) {
t(t(y - X) * tau) * (y - X > 0) + t(t(X - y) * (1 - tau)) * (y - X < 0)
}
QL <- FCSTN * NA
for (k in 1:dim(QL)[1]) {
QL[k, , ] <- quantile_loss(FCSTN[k, , ], as.numeric(yoos), Qgrid)
}
## TABLE AREA
KK <- length(mnames)
TTinit <- 1 ## without first, as all comb. are uniform
RQL <- apply(QL[1:KK, -c(1:TTinit), ], c(1, 3), mean)
dimnames(RQL) <- list(mnames, Qgrid)
RQLm <- apply(RQL, c(1), mean, na.rm = TRUE)
##
qq <- apply(QL[1:KK, -c(1:TTinit), ], c(1, 2), mean)
library(xtable)
Pall <- numeric(KK)
for (i in 1:KK) Pall[i] <- t.test(qq[K + 1, ] - qq[i, ], alternative = "greater")$p.val
Mall <- (RQLm - RQLm[K + 1]) * 10000
Mout <- matrix(Mall[-c(1:(K + 3))], 5, 6)
dimnames(Mout) <- list(moname, mtname)
Pallout <- format(round(Pall, 3), nsmall = 3)
Pallout[Pallout == "0.000"] <- "<.001"
Pallout[Pallout == "1.000"] <- ">.999"
MO <- K
IDX <- c(1:K)
OUT <- t(Mall[IDX])
OUT.num <- OUT
class(OUT.num) <- "numeric"
xxx <- OUT.num
xxxx <- OUT
table <- round(OUT, 3)
table_col <- OUT
i.p <- 1
for (i.p in 1:MO) {
xmax <- -min(Mall) * 5 # max(Mall)
xmin <- min(Mall)
cred <- rev(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .8, .5)) # , .5,0,0,0,1,1,1) ## red
cgreen <- rev(c(.5, .5, .55, .6, .65, .7, .75, .8, .85, .9, .95, 1, 1, .9)) # , .5,0,1,1,1,0,0) ## green
cblue <- rev(c(.55, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5)) # , .5,1,1,0,0,0,1) ## blue
crange <- c(xmin, xmax) ## range
## colors in plot:
fred <- round(approxfun(seq(crange[1], crange[2], length = length(cred)), cred)(pmin(xxx[, i.p], xmax)), 3)
fgreen <- round(approxfun(seq(crange[1], crange[2], length = length(cgreen)), cgreen)(pmin(xxx[, i.p], xmax)), 3)
fblue <- round(approxfun(seq(crange[1], crange[2], length = length(cblue)), cblue)(pmin(xxx[, i.p], xmax)), 3)
tmp <- format(round(xxx[, i.p], 3), nsmall = 3)
xxxx[, i.p] <- paste("\\cellcolor[rgb]{", fred, ",", fgreen, ",", fblue, "}", tmp, " {\\footnotesize (", Pallout[IDX[i.p]], ")}", sep = "")
table_col[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1)
table[, i.p] <- paste0(
table[, i.p],
'(', Pallout[i.p], ")"
)
}
table_out <- kbl(
table,
align = rep("c", ncol(table)),
bootstrap_options = c("condensed"),
escape = FALSE) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:nrow(table), color = cols[9, "grey"])
for (j in 1:ncol(table)) {
table_out <- table_out %>%
column_spec(j, background = table_col[, j])
}
table_out
```
```{r, echo = FALSE, fig.width=7, fig.height=5.5, fig.align='center', cache = TRUE, results='asis'}
MO <- 6
OUT <- Mout
OUT.num <- OUT
class(OUT.num) <- "numeric"
xxx <- OUT.num
xxxx <- OUT
i.p <- 1
table2 <- OUT
table_col2 <- OUT
for (i.p in 1:MO) {
xmax <- -min(Mall) * 5 # max(Mall)
xmin <- min(Mall)
cred <- rev(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .8, .5)) # , .5,0,0,0,1,1,1) ## red
cgreen <- rev(c(.5, .5, .55, .6, .65, .7, .75, .8, .85, .9, .95, 1, 1, .9)) # , .5,0,1,1,1,0,0) ## green
cblue <- rev(c(.55, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5)) # , .5,1,1,0,0,0,1) ## blue
crange <- c(xmin, xmax) ## range
## colors in plot:
fred <- round(approxfun(seq(crange[1], crange[2], length = length(cred)), cred)(pmin(xxx[, i.p], xmax)), 3)
fgreen <- round(approxfun(seq(crange[1], crange[2], length = length(cgreen)), cgreen)(pmin(xxx[, i.p], xmax)), 3)
fblue <- round(approxfun(seq(crange[1], crange[2], length = length(cblue)), cblue)(pmin(xxx[, i.p], xmax)), 3)
tmp <- format(round(xxx[, i.p], 3), nsmall = 3)
xxxx[, i.p] <- paste("\\cellcolor[rgb]{", fred, ",", fgreen, ",", fblue, "}", tmp, " {\\footnotesize (", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")}", sep = "")
# table2[, i.p] <- paste0(tmp, " (", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")")
table2[, i.p] <- paste0(
tmp,
"(", Pallout[K + 3 + 5 * (i.p - 1) + 1:5], ")"
)
table_col2[, i.p] <- rgb(fred, fgreen, fblue, maxColorValue = 1)
} # i.p
rownames(table2) <- c("Pointwise", "B-Smooth", "P-Smooth", "B-Constant", "P-Constant")
rownames(table_col2) <- rownames(table2)
table2["B-Smooth", c("QRlin", "QRconv")] <- "-"
table_col2["B-Smooth", c("QRlin", "QRconv")] <- cols[2, "grey"]
idx <- c("Pointwise", "B-Constant", "P-Constant", "B-Smooth", "P-Smooth")
table2["P-Smooth", "BOAG"] <- "-0.182(0.039)"
table_out2 <- kableExtra::kbl(
table2[idx, ],
align = rep("c", ncol(table2)),
bootstrap_options = c("condensed"),
escape = FALSE
) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:nrow(table2[idx, ]), color = cols[9, "grey"])
for (j in seq_len(ncol(table2[idx, ]))) {
table_out2 <- table_out2 %>%
column_spec(1 + j,
background = table_col2[idx, j]
)
}
table_out2 %>%
column_spec(1, bold = T)
```
CRPS difference to Naive. Negative values correspond to better performance (the best value is bold).
Additionally, we show the p-value of the DM-test, testing against Naive. The cells are colored with respect to their values (the greener, the better).
## QL
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
##### Performance across probabilities
M <- length(mnames)
Msel <- c(1:K, K + 1, K + 1 + 2 + 1:4 * 5 - 2) ## experts + naive + smooth
modnames <- mnames[Msel]
tCOL <- c(
"#E6CC00", "#CC6600", "#E61A1A", "#99004D", "#F233BF",
"#666666", "#0000CC", "#1A80E6", "#1AE680", "#00CC00"
)
t(RQL) %>%
as_tibble() %>%
select(Naive) %>%
mutate(Naive = 0) %>%
mutate(p = 1:99 / 100) %>%
pivot_longer(-p, values_to = "Loss differences") -> dummy
t(RQL) %>%
as_tibble() %>%
select(mnames[Msel]) %>%
mutate(p = 1:99 / 100) %>%
pivot_longer(!p & !Naive) %>%
mutate(`Loss differences` = value - Naive) %>%
select(-value, -Naive) %>%
rbind(dummy) %>%
mutate(
p = as.numeric(p),
name = stringr::str_replace(name, "-P-smooth", ""),
name = factor(name, levels = stringr::str_replace(mnames[Msel], "-P-smooth", ""), ordered = T),
`Loss differences` = `Loss differences` * 1000
) %>%
ggplot(aes(x = p, y = `Loss differences`, colour = name)) +
geom_line(linewidth = 1) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom"
) +
xlab("Probability p") +
scale_color_manual(NULL, values = tCOL) +
guides(colour = guide_legend(nrow = 2, byrow = TRUE))
```
## Cumulative Loss Difference
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
DQL <- t(apply(apply(QL[1:KK, -c(1:TTinit), ], c(1, 2), mean), 1, cumsum))
rownames(DQL) <- mnames
t(DQL) %>%
as_tibble() %>%
select(Naive) %>%
mutate(
`Difference of cumulative loss` = 0,
Date = ytime[-c(1:(TT + TTinit + 1))],
name = "Naive"
) %>%
select(-Naive) -> dummy
data <- t(DQL) %>%
as_tibble() %>%
select(mnames[Msel]) %>%
mutate(Date = ytime[-c(1:(TT + TTinit + 1))]) %>%
pivot_longer(!Date & !Naive) %>%
mutate(`Difference of cumulative loss` = value - Naive) %>%
select(-value, -Naive) %>%
rbind(dummy) %>%
mutate(
name = stringr::str_replace(name, "-P-smooth", ""),
name = factor(name, levels = stringr::str_replace(mnames[Msel], "-P-smooth", ""))
)
data %>%
ggplot(aes(x = Date, y = `Difference of cumulative loss`, colour = name)) +
geom_line(size = 1) +
theme_minimal() +
theme(
text = element_text(size = text_size),
legend.position = "bottom"
) +
scale_color_manual(NULL, values = tCOL) +
guides(colour = guide_legend(nrow = 2, byrow = TRUE))
```
## Weights (BOAG P-Smooth)
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
load("assets/crps_learning/weights_data.RData")
weights_data %>%
ggplot(aes(Date, p, fill = w)) +
geom_raster(interpolate = TRUE) +
facet_grid(Mod ~ .) +
theme_minimal() +
theme(
plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
ylab("p") +
scale_fill_gradientn(
limits = c(0, 1),
colours = colseq,
breaks = seq(0, 1, 0.2)
) +
scale_x_date(expand = c(0, 0))
```
## Weights (Last)
```{r, echo = FALSE, fig.width=13, fig.height=5.5, fig.align='center', cache = TRUE}
load("assets/crps_learning/weights_example.RData")
weights %>%
ggplot(aes(x = p, y = weights, col = Model)) +
geom_line(size = 1.5) +
theme_minimal() +
theme(
plot.margin = unit(c(0.2, 0.3, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.position = "bottom",
legend.title = element_blank(),
panel.spacing = unit(1.5, "lines")
) +
scale_color_manual(NULL, values = tCOL[1:K]) +
facet_grid(. ~ K)
```
::::
::::
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices {visibility="uncounted"}
Berrisch, J., & Ziel, F. (2024). *International Journal of Forecasting*, 40(4), 1568-1586.
## Multivariate CRPS Learning
:::: {.columns}
::: {.column width="45%"}
We extend the **B-Smooth** and **P-Smooth** procedures to the multivariate setting:
::: {.panel-tabset}
## Penalized Smoothing
Let $\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})$ and $\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})$ be two sets of bounded basis functions on $(0,1)$:
\begin{equation*}
\boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{pr}}'
\end{equation*}
with parameter matrix $\boldsymbol b_{t,k}$. The latter is estimated by penalized $L_2$-smoothing, which minimizes
\begin{align}
& \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\
& \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber
\end{align}
with differential operator $\mathcal{D}_q$ of order $q$
[{{< fa calculator >}}]{style="color:var(--col_green_10);"} We have an analytical solution.
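A minimal sketch of the two-dimensional smoothing step (P-spline style hat matrices; dimensions, penalties, and the raw weights are placeholders):
```r
library(splines)
make_hat <- function(grid, df, lambda, d = 2) {
  B <- bs(grid, df = df, intercept = TRUE)   # basis on the grid
  D <- diff(diag(df), differences = d)       # discrete penalty of order d
  B %*% solve(crossprod(B) + lambda * crossprod(D), t(B))
}
H_mv <- make_hat(1:24 / 24,  df = 8,  lambda = 2)   # 24 x 24: hours
H_pr <- make_hat(1:99 / 100, df = 20, lambda = 10)  # 99 x 99: probabilities
W_raw    <- matrix(runif(24 * 99), nrow = 24)       # raw D x P weights of one expert
W_smooth <- H_mv %*% W_raw %*% H_pr                 # smoothed over both margins
```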
## Basis Smoothing
Linear combinations of bounded basis functions:
\begin{equation}
\underbrace{\boldsymbol w_{t,k}}_{D \text{ x } P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D\text{ x }\widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \text{ x }P} \nonumber
\end{equation}
A popular choice: B-Splines
$\boldsymbol \beta_{t,k}$ is calculated using a reduced regret matrix:
$\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = \boldsymbol \varphi^{\text{pr}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{\text{PxD}}\boldsymbol \varphi^{\text{mv}}$
If $\widetilde P = P$ it holds that $\boldsymbol \varphi^{pr} = \boldsymbol{I}$ (pointwise)
For $\widetilde P = 1$ we obtain constant weights
::::
:::
::: {.column width="2%"}
:::
::: {.column width="53%"}
```{r, fig.align="center", echo=FALSE, out.width = "1000px", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/algorithm.svg")
```
:::
::::
## Application
:::: {.columns}
::: {.column width="48%"}
#### Data
- Day-Ahead electricity price forecasts from @marcjasz2022distributional
- Produced using probabilistic neural networks
- 24-dimensional distributional forecasts
- Distribution assumptions: JSU and Normal
- 8 experts (4 JSU, 4 Normal)
- 27th Dec. 2018 to 31st Dec. 2020 (736 days)
- We extract 99 quantiles (percentiles)
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
#### Setup
Evaluation: Exclude first 182 observations
Extensions: Penalized smoothing | Forgetting
Tuning strategies:
- Bayesian Fix
- Sophisticated Bayesian search algorithm
- Online
- Dynamic based on past performance
- Bayesian Online
- First Bayesian Fix then Online
Computation Time: ~30 Minutes
:::
::::
## Results
:::: {.columns}
::: {.column width="55%"}
```{r, cache = TRUE}
load("assets/mcrps_learning/naive_table_df.rds")
table_naive <- naive_table_df %>%
as_tibble() %>%
head(1) %>%
mutate_all(round, 4) %>%
mutate_all(sprintf, fmt = "%#.3f") %>%
kbl(
bootstrap_options = "condensed",
escape = FALSE,
format = "html",
booktabs = FALSE,
align = c("c", rep("c", ncol(naive_table_df) - 1))
) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:1, color = cols[9, "grey"]) %>%
kable_styling(font_size = 16)
for (i in 1:ncol(naive_table_df)) {
table_naive <- table_naive %>%
column_spec(i,
background = ifelse(
is.na(naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)]),
cols[5, "grey"],
col_scale2(
naive_table_df["stat", i, drop = TRUE][-ncol(naive_table_df)],
rng_t
)
),
bold = i == which.min(naive_table_df["loss", ])
)
}
table_naive
load("assets/mcrps_learning/performance_data.rds")
i <- 1
j <- 1
for (j in 1:3) {
for (i in seq_len(nrow(performance_loss_tibble))) {
if (loss_and_dm[i, j, "p.val"] < 0.001) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'***'
)
} else if (loss_and_dm[i, j, "p.val"] < 0.01) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'**'
)
} else if (loss_and_dm[i, j, "p.val"] < 0.05) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'*'
)
} else if (loss_and_dm[i, j, "p.val"] < 0.1) {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j],
'.'
)
} else {
performance_loss_tibble[i, 2 + j] <- paste0(
performance_loss_tibble[i, 2 + j]
)
}
}
}
table_performance <- performance_loss_tibble %>%
kbl(
padding=-1L,
col.names = c(
'Description',
'Parameter Tuning',
'BOA',
'ML-Poly',
'EWA'
),
bootstrap_options = "condensed",
# Dont replace any string, dataframe has to be valid latex code ...
escape = FALSE,
format = "html",
align = c("l", "l", rep("c", ncol(performance_loss_tibble)-2))
) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:nrow(performance_loss_tibble), color = cols[9, "grey"])
# %%
for (i in 3:ncol(performance_loss_tibble)) {
bold_cells <- rep(FALSE, times = nrow(performance_loss_tibble))
loss <- loss_and_dm[, i - 2, "loss"]
table_performance <- table_performance %>%
column_spec(i,
background = c(
col_scale2(
loss_and_dm[, i - 2, "stat"],
rng_t
)
),
bold = loss == min(loss),
)
}
table_performance %>%
kable_styling(font_size = 16)
```
```{=html}
Coloring w.r.t. test statistic: color scale from &lt;-5 to &gt;5.
Significance denoted by: . p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001;
```
:::
::: {.column width="45%"}
::: {.panel-tabset}
## Constant
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/constant.svg")
```
## Pointwise
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/pointwise.svg")
```
## B Constant PR
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/constant_pr.svg")
```
## B Constant MV
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/constant_mv.svg")
```
## Smooth.Forget
```{r, fig.align="center", echo=FALSE, out.width = "400", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/smooth_best.svg")
```
::::
:::
::::
## Results
::: {.panel-tabset}
## Chosen Parameters
```{r, warning=FALSE, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE}
load("assets/mcrps_learning/pars_data.rds")
pars_data %>%
ggplot(aes(x = dates, y = value)) +
geom_rect(aes(
ymin = 0,
ymax = value * 1.2,
xmin = dates[1],
xmax = dates[182],
fill = "Burn-In"
)) +
geom_line(aes(color = name), linewidth = linesize, show.legend = FALSE) +
scale_colour_manual(
values = as.character(cols[5, c("pink", "amber", "green")])
) +
facet_grid(name ~ .,
scales = "free_y",
# switch = "both"
) +
scale_y_continuous(
trans = "log2",
labels = scaleFUN
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.width = unit(0.9, "inch"),
legend.position = "none"
) +
ylab(NULL) +
xlab("date") +
scale_fill_manual(NULL,
values = as.character(cols[3, "grey"])
)
```
## Weights: Hour 16:00-17:00
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE}
load("assets/mcrps_learning/weights_h.rds")
weights_h %>%
ggplot(aes(date, q, fill = weight)) +
geom_raster(interpolate = TRUE) +
facet_grid(
Expert ~ . # , labeller = labeller(Mod = mod_labs)
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
scale_x_date(expand = c(0, 0)) +
scale_fill_gradientn(
oob = scales::squish,
limits = c(0, 1),
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
colours = c(
cols[8, "red"],
cols[5, "deep-orange"],
cols[5, "amber"],
cols[5, "yellow"],
cols[5, "lime"],
cols[5, "light-green"],
cols[5, "green"],
cols[7, "green"],
cols[9, "green"],
cols[10, "green"]
),
breaks = seq(0, 1, 0.1)
) +
xlab("date") +
ylab("probability") +
scale_y_continuous(breaks = c(0.1, 0.5, 0.9))
```
## Weights: Median
```{r, fig.align="center", echo=FALSE, fig.width=12, fig.height=5.5, cache = TRUE}
load("assets/mcrps_learning/weights_q.rds")
weights_q %>%
mutate(hour = as.numeric(hour) - 1) %>%
ggplot(aes(date, hour, fill = weight)) +
geom_raster(interpolate = TRUE) +
facet_grid(
Expert ~ . # , labeller = labeller(Mod = mod_labs)
) +
theme_minimal() +
theme(
# plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"),
text = element_text(size = text_size),
legend.key.height = unit(0.9, "inch")
) +
scale_x_date(expand = c(0, 0)) +
scale_fill_gradientn(
oob = scales::squish,
limits = c(0, 1),
values = c(seq(0, 0.4, length.out = 8), 0.65, 1),
colours = c(
cols[8, "red"],
cols[5, "deep-orange"],
cols[5, "amber"],
cols[5, "yellow"],
cols[5, "lime"],
cols[5, "light-green"],
cols[5, "green"],
cols[7, "green"],
cols[9, "green"],
cols[10, "green"]
),
breaks = seq(0, 1, 0.1)
) +
xlab("date") +
ylab("hour") +
scale_y_continuous(breaks = c(0, 8, 16, 24))
```
::::
## Non-Equidistant Knots
::: {.panel-tabset}
## Knot Placement Illustration
```{ojs}
bsplineData = FileAttachment("assets/mcrps_learning/basis_functions.csv").csv({ typed: true })
```
```{ojs}
// Defined above
// function updateChartInner(g, x, y, linesGroup, color, line, data) {
// // Update axes with transitions
// x.domain([0, d3.max(data, d => d.x)]);
// g.select(".x-axis").transition().duration(1500).call(d3.axisBottom(x).ticks(10));
// y.domain([0, d3.max(data, d => d.y)]);
// g.select(".y-axis").transition().duration(1500).call(d3.axisLeft(y).ticks(5));
// // Group data by basis function
// const dataByFunction = Array.from(d3.group(data, d => d.b));
// const keyFn = d => d[0];
// // Update basis function lines
// const u = linesGroup.selectAll("path").data(dataByFunction, keyFn);
// u.join(
// enter => enter.append("path").attr("fill","none").attr("stroke-width",3)
// .attr("stroke", (_, i) => color(i)).attr("d", d => line(d[1].map(pt => ({x: pt.x, y: 0}))))
// .style("opacity",0),
// update => update,
// exit => exit.transition().duration(1000).style("opacity",0).remove()
// )
// .transition().duration(1000)
// .attr("d", d => line(d[1]))
// .attr("stroke", (_, i) => color(i))
// .style("opacity",1);
// }
chart0 = {
// State variables for selected parameters
let selectedMu = 0.5;
let selectedSig = 1;
let selectedNonc = 0;
let selectedTailw = 1;
const filteredData = () => bsplineData.filter(d =>
Math.abs(selectedMu - d.mu) < 0.001 &&
d.sig === selectedSig &&
d.nonc === selectedNonc &&
d.tailw === selectedTailw
);
const container = d3.create("div")
.style("max-width", "none")
.style("width", "100%");;
const controlsContainer = container.append("div")
.style("display", "flex")
.style("gap", "20px");
// slider controls
const sliders = [
{ label: 'Mu', get: () => selectedMu, set: v => selectedMu = v, min: 0.1, max: 0.9, step: 0.2 },
{ label: 'Sigma', get: () => Math.log2(selectedSig), set: v => selectedSig = 2 ** v, min: -2, max: 2, step: 1 },
{ label: 'Noncentrality', get: () => selectedNonc, set: v => selectedNonc = v, min: -4, max: 4, step: 2 },
{ label: 'Tailweight', get: () => Math.log2(selectedTailw), set: v => selectedTailw = 2 ** v, min: -2, max: 2, step: 1 }
];
// Build slider controls with D3 data join
const sliderCont = controlsContainer.selectAll('div').data(sliders).join('div')
.style('display','flex').style('align-items','center').style('gap','10px')
.style('flex','1').style('min-width','0px');
sliderCont.append('label').text(d => d.label + ':').style('font-size','20px');
sliderCont.append('input')
.attr('type','range').attr('min', d => d.min).attr('max', d => d.max).attr('step', d => d.step)
.property('value', d => d.get())
.on('input', function(event, d) {
const val = +this.value; d.set(val);
d3.select(this.parentNode).select('span').text(d.label.match(/Sigma|Tailweight/) ? 2**val : val);
updateChart(filteredData());
})
.style('width', '100%');
sliderCont.append('span').text(d => (d.label.match(/Sigma|Tailweight/) ? 2 ** d.get() : d.get()))
.style('font-size','20px');
// Add Reset button to clear all sliders to their defaults
controlsContainer.append('button')
.text('Reset')
.style('font-size', '20px')
.style('align-self', 'center')
.style('margin-left', 'auto')
.on('click', () => {
// reset state vars
selectedMu = 0.5;
selectedSig = 1;
selectedNonc = 0;
selectedTailw = 1;
// update input positions
sliderCont.selectAll('input').property('value', d => d.get());
// update displayed labels
sliderCont.selectAll('span')
.text(d => d.label.match(/Sigma|Tailweight/) ? (2**d.get()) : d.get());
// redraw chart
updateChart(filteredData());
});
// Build SVG
const width = 1200;
const height = 450;
const margin = {top: 40, right: 20, bottom: 40, left: 40};
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Set controls container width to match SVG plot width
controlsContainer.style("max-width", "none").style("width", "100%");
// Distribute each control evenly and make sliders full-width
controlsContainer.selectAll("div").style("flex", "1").style("min-width", "0px");
controlsContainer.selectAll("input").style("width", "100%").style("box-sizing", "border-box");
// Create scales
const x = d3.scaleLinear()
.domain([0, 1])
.range([0, innerWidth]);
const y = d3.scaleLinear()
.domain([0, 1])
.range([innerHeight, 0]);
// Create a color scale for the basis functions
const color = d3.scaleOrdinal(d3.schemeCategory10);
// Create SVG
const svg = d3.create("svg")
.attr("width", "100%")
.attr("height", "auto")
.attr("viewBox", [0, 0, width, height])
.attr("preserveAspectRatio", "xMidYMid meet")
.attr("style", "max-width: 100%; height: auto;");
// Create the chart group
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Add axes
const xAxis = g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.attr("class", "x-axis")
.call(d3.axisBottom(x).ticks(10))
.style("font-size", "20px");
const yAxis = g.append("g")
.attr("class", "y-axis")
.call(d3.axisLeft(y).ticks(5))
.style("font-size", "20px");
// Add a horizontal line at y = 0
g.append("line")
.attr("x1", 0)
.attr("x2", innerWidth)
.attr("y1", y(0))
.attr("y2", y(0))
.attr("stroke", "#000")
.attr("stroke-opacity", 0.2);
// Add gridlines
g.append("g")
.attr("class", "grid-lines")
.selectAll("line")
.data(y.ticks(5))
.join("line")
.attr("x1", 0)
.attr("x2", innerWidth)
.attr("y1", d => y(d))
.attr("y2", d => y(d))
.attr("stroke", "#ccc")
.attr("stroke-opacity", 0.5);
// Create a line generator
const line = d3.line()
.x(d => x(d.x))
.y(d => y(d.y))
.curve(d3.curveBasis);
// Group to contain the basis function lines
const linesGroup = g.append("g")
.attr("class", "basis-functions");
// Store the current basis functions for transition
let currentBasisFunctions = new Map();
// Function to update the chart with new data
function updateChart(data) {
updateChartInner(g, x, y, linesGroup, color, line, data);
}
// Store the update function
svg.node().update = updateChart;
// Initial render
updateChart(filteredData());
container.node().appendChild(svg.node());
return container.node();
}
```
## Knot Placement Details
:::: {.columns}
::: {.column width="48%"}
Non-central beta distribution [@johnson1995continuous]:
::: {style="font-size: 70%;"}
\begin{equation*}
\mathcal{B}(x, a, b, c) = \sum_{j=0}^{\infty} e^{-c/2} \frac{\left( \frac{c}{2} \right)^j}{j!} I_x \left( a + j , b \right)
\end{equation*}
::::
```{r, fig.align="center", echo=FALSE, out.width = "1000px", cache = TRUE}
knitr::include_graphics("assets/mcrps_learning/knot_placement.svg")
```
Penalty and $\lambda$ need to be adjusted accordingly [@li2022general]
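A hedged sketch of the idea: place interior knots at quantiles of a (non-central) beta distribution instead of equidistantly. The mapping of `mu`, `sigma`, `nonc`, and `tailweight` to the `qbeta()` arguments below is purely illustrative.
```{r, echo = TRUE, eval = FALSE}
# Illustration only: knots concentrate where the beta density puts mass.
n_knots <- 9
p <- seq(0, 1, length.out = n_knots)
knots_equidistant <- p
knots_beta <- qbeta(p, shape1 = 2, shape2 = 2, ncp = 1)  # assumed parameters
rbind(equidistant = knots_equidistant, beta = round(knots_beta, 3))
```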
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
Using non-equidistant knots in `profoc` is straightforward:
```{r, echo = TRUE, eval = FALSE, cache = TRUE}
mod <- online(
y = Y,
experts = experts,
tau = 1:99 / 100,
b_smooth_pr = list(
knots = 9,
mu = 0.3,
sigma = 1,
nonc = 0,
tailweight = 1,
deg = 3
)
)
```
Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`.
`profoc` adjusts the penalty and $\lambda$ accordingly.
:::
::::
::::
## Wrap-Up
:::: {.columns}
::: {.column width="48%"}
[{{< fa triangle-exclamation >}}]{style="color:var(--col_red_9);"} Potential Downsides:
- Pointwise optimization can induce quantile crossing
- Can be solved by sorting the predictions (see the sketch below)
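A minimal sketch of this re-arrangement fix; the matrix layout (rows = observations, columns = quantile levels) is assumed:
```{r, echo = TRUE, eval = FALSE}
# Enforce non-crossing quantiles by sorting each prediction across the grid.
pred <- matrix(rnorm(5 * 99), nrow = 5, ncol = 99)  # placeholder predictions
pred_sorted <- t(apply(pred, 1, sort))              # monotone in the quantile level
all(diff(pred_sorted[1, ]) >= 0)
```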
[{{< fa magnifying-glass >}}]{style="color:var(--col_orange_9);"} Important:
- The choice of the learning rate is crucial
- The loss function has to meet certain criteria
[{{< fa rocket >}}]{style="color:var(--col_green_9);"} Upsides:
- Pointwise learning outperforms the Naive solution significantly
- Online learning is much faster than batch methods
- Smoothing further improves the predictive performance
- Asymptotically not worse than the best convex combination
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
The [profoc](https://profoc.berrisch.biz/) R package:
- Implements all algorithms discussed above
- Is written using RcppArmadillo, so it is fast
- Accepts vectors for most parameters
- The best parameter combination is chosen online
- Implements
  - Forgetting, Fixed Share
  - Different loss functions + gradients

Publications:
Berrisch, J., & Ziel, F. [-@BERRISCH2023105221]. CRPS learning. *Journal of Econometrics*, 237(2), 105221.
Berrisch, J., & Ziel, F. [-@BERRISCH20241568]. Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. *International Journal of Forecasting*, 40(4), 1568-1586.
:::
::::
# Modeling Volatility and Dependence of European Carbon and Energy Prices {#sec-voldep visibility="uncounted"}
Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. (2023). *Finance Research Letters*, 52, 103503.
---
:::: {.columns}
::: {.column width="48%"}
### Motivation
Understanding European Union Allowance (EUA) dynamics is important for several fields:
- Portfolio & Risk Management
- Sustainability Planning
- Political decisions
EUA prices are connected to energy markets
How can the dynamics be characterized?
Several questions arise:
- Data (Pre)processing
- Modeling Approach
- Evaluation
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
### Data
Daily Observations: 03/15/2010 - 10/14/2022
EUA, Natural Gas, Brent Crude Oil, Coal
- normalized w.r.t. $\text{CO}_2$ emissions
- Adjusted for inflation by Eurostat's HICP, *excluding energy*
Emission-adjusted prices reflect one tonne of $\text{CO}_2$
Log transformation of the data to stabilize the variance
ADF Test: All series are stationary in first differences
Johansen’s likelihood ratio trace test suggests two cointegrating relationships (only in levels)
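A hedged sketch of these preprocessing checks using the `urca` package; random-walk placeholder data stands in for the adjusted log prices, and the lag orders are illustrative:
```{r, echo = TRUE, eval = FALSE}
library(urca)
set.seed(1)
prices <- as.data.frame(apply(matrix(rnorm(500 * 4), 500, 4), 2, cumsum))
names(prices) <- c("EUA", "Oil", "NGas", "Coal")
summary(ur.df(diff(prices$EUA), type = "drift", selectlags = "AIC"))  # ADF on differences
summary(ca.jo(prices, type = "trace", ecdet = "const", K = 2))        # Johansen trace test
```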
:::
::::
## Data
```{r, echo=FALSE, fig.width = 12, fig.height = 6, fig.align="center", cache = TRUE}
readr::read_csv("assets/voldep/2022_10_14_eur_ref_co2_adj_hvpi_ex_nrg.csv") %>%
select(-EUR_USD, -hvpi_x_nrg) %>%
pivot_longer(-Date) %>%
mutate(name = factor(name,
levels = c("EUA", "Oil", "NGas", "Coal")
)) %>%
ggplot(aes(x = Date, y = value, color = name)) +
geom_line(
linewidth = linesize
) + # The ggplot2 call got a little simpler now :)
geom_vline(
# Beginning of the war
xintercept = as.Date("2022-02-24"),
col = as.character(cols[8, "grey"]),
linetype = "dashed"
) +
ylab("Prices EUR") +
xlab("Date") +
scale_color_manual(
values = as.character(c(
cols[8, col_eua],
cols[8, col_oil],
cols[8, col_gas],
cols[8, col_coal]
)),
labels = c(
"EUA",
"Oil",
"NGas",
"Coal"
)
) +
guides(color = guide_legend(override.aes = list(size = 2))) +
theme_minimal() +
theme(
legend.position = "bottom",
# legend.key.size = unit(1.5, "cm"),
legend.title = element_blank(),
text = element_text(
size = text_size
)
) +
scale_x_date(
breaks = as.Date(
paste0(
as.character(seq(2010, 2022, 2)),
"-01-01"
)
),
date_labels = "%Y"
) +
xlab(NULL) +
scale_y_continuous(trans = "log2")
```
## Modeling Approach: Notation
:::: {.columns}
::: {.column width="48%"}
- Let $\boldsymbol{X}_t$ be a $K$-dimensional vector at time $t$
- The forecasting target:
- Conditional joint distribution
- $F_{\boldsymbol{X}_t|\mathcal{F}_{t-1}}$
- $\mathcal{F}_{t}$ is the sigma field generated by all information available up to and including time $t$
Sklar's theorem: decompose the target into
- marginal distributions: $F_{X_{k,t}|\mathcal{F}_{t-1}}$ for $k=1,\ldots, K$, and
- copula function: $C_{\boldsymbol{U}_{t}|\mathcal{F}_{t - 1}}$
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
Let $\boldsymbol{x}_t= (x_{1,t},\ldots, x_{K,t})^\intercal$ be the realized values
It holds that:
\begin{align}
F_{\boldsymbol{X}_t|\mathcal{F}_{t-1}}(\boldsymbol{x}_t) = C_{\boldsymbol{U}_{t}|\mathcal{F}_{t - 1}}(\boldsymbol{u}_t) \nonumber
\end{align}
with: $\boldsymbol{u}_t =(u_{1,t},\ldots, u_{K,t})^\intercal$, $u_{k,t} = F_{X_{k,t}|\mathcal{F}_{t-1}}(x_{k,t})$
For brevity, we drop the conditioning on $\mathcal{F}_{t-1}$.
The model can be specified as follows:
\begin{align}
F(\boldsymbol{x}_t) = C \left[\mathbf{F}(\boldsymbol{x}_t; \boldsymbol{\mu}_t, \boldsymbol{ \sigma }_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda}); \Xi_t, \Theta\right] \nonumber
\end{align}
$\Xi_{t}$ denotes time-varying dependence parameters
$\Theta$ denotes time-invariant dependence parameters
We take $C$ as the $t$-copula
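As an illustration of the decomposition idea, using empirical, rank-based margins instead of the parametric ones used in the paper:
```{r, echo = TRUE, eval = FALSE}
# Map each margin to (pseudo-)uniforms u_{k,t} = F_k(x_{k,t}); the copula then
# captures the dependence on [0,1]^K separately from the margins.
set.seed(1)
x <- cbind(EUA = rnorm(300), NGas = rnorm(300))           # placeholder returns
u <- apply(x, 2, function(z) rank(z) / (length(z) + 1))   # pseudo-observations
cor(u, method = "kendall")                                # rank dependence
```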
:::
::::
## Modeling Approach: The General Framework
:::: {.columns}
::: {.column width="48%"}
### Individual marginal distributions:
$$\mathbf{F} = (F_1, \ldots, F_K)^{\intercal}$$
Generalized non-central t-distributions
- Time-varying:
  - expectation: $\boldsymbol{\mu}_t = (\mu_{1,t}, \ldots, \mu_{K,t})^{\intercal}$
  - variance: $\boldsymbol{\sigma}_{t}^2 = (\sigma_{1,t}^2, \ldots, \sigma_{K,t}^2)^{\intercal}$
- Time-invariant:
  - degrees of freedom: $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_K)^{\intercal}$
  - noncentrality: $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_K)^{\intercal}$
### VECM Model
\begin{align}
\Delta \boldsymbol{\mu}_t = \Pi \boldsymbol{x}_{t-1} + \Gamma \Delta \boldsymbol{x}_{t-1} \nonumber
\end{align}
where $\Pi = \alpha \beta^{\intercal}$ is the cointegrating matrix of rank $r$, $0 \leq r\leq K$.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
### GARCH model
\begin{align}
\sigma_{i,t}^2 = & \omega_i + \alpha^+_{i} (\epsilon_{i,t-1}^+)^2 + \alpha^-_{i} (\epsilon_{i,t-1}^-)^2 + \beta_i \sigma_{i,t-1}^2 \nonumber
\end{align}
where $\epsilon_{i,t-1}^+ = \max\{\epsilon_{i,t-1}, 0\}$ and $\epsilon_{i,t-1}^- = \min\{\epsilon_{i,t-1}, 0\}$.
Positive vs. negative innovations (capture leverage effects).
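A minimal sketch of this asymmetric variance recursion; all coefficient values are illustrative:
```{r, echo = TRUE, eval = FALSE}
n <- 500
omega <- 0.05; alpha_pos <- 0.03; alpha_neg <- 0.12; beta <- 0.85
eps <- numeric(n); sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - beta)                       # crude starting value
for (t in 2:n) {
  e_pos <- max(eps[t - 1], 0)                         # positive innovation part
  e_neg <- min(eps[t - 1], 0)                         # negative innovation part
  sigma2[t] <- omega + alpha_pos * e_pos^2 + alpha_neg * e_neg^2 + beta * sigma2[t - 1]
  eps[t] <- sqrt(sigma2[t]) * rnorm(1)
}
```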
### Time-varying dependence parameters
\begin{align*}
\Xi_{t} = & \Lambda\left(\boldsymbol{\xi}_{t}\right)
\\
\xi_{ij,t} = & \eta_{0,ij} + \eta_{1,ij} \xi_{ij,t-1} + \eta_{2,ij} z_{i,t-1} z_{j,t-1},
\end{align*}
$z_{i,t}$ is the $i$-th standardized residual from time series $i$
$\Lambda(\cdot)$ is a link function:
- ensures that $\Xi_{t}$ is a valid variance-covariance matrix
- ensures that $\Xi_{t}$ stays within its support and remains positive semi-definite
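A bivariate sketch of this recursion; the `tanh` link used here to keep the implied correlation in $(-1, 1)$ is a simplification of the matrix link $\Lambda(\cdot)$, and the coefficients are illustrative:
```{r, echo = TRUE, eval = FALSE}
n <- 300
eta0 <- 0.02; eta1 <- 0.95; eta2 <- 0.03
z <- matrix(rnorm(2 * n), n, 2)        # standardized residuals of two series
xi <- numeric(n); rho <- numeric(n)
for (t in 2:n) {
  xi[t] <- eta0 + eta1 * xi[t - 1] + eta2 * z[t - 1, 1] * z[t - 1, 2]
  rho[t] <- tanh(xi[t])                # link keeping the correlation valid
}
range(rho)
```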
:::
::::
## Study Design and Evaluation
:::: {.columns}
::: {.column width="48%"}
### Rolling-window forecasting study
- 3257 observations total
- Window size: 1000 trading days (~ four years)
- We sample 250 of 2227 starting points
- We draw $2^{12}= 2048$ trajectories 30 steps ahead
### Estimation
Joint maximum likelihood estimation:
\begin{align*}
f_{\mathbf{X}_t}(\mathbf{x}_t | \mathcal{F}_{t-1}) = c\left[\mathbf{F}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu},
\boldsymbol{\lambda});\Xi_t, \Theta\right] \cdot \\ \prod_{i=1}^K f_{X_{i,t}}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda})
\end{align*}
The copula density $c$ can be derived analytically.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
### Evaluation
Our main evaluation criterion is the Energy Score (ES)
\begin{align*}
\text{ES}_t(F, \mathbf{x}_t) = \mathbb{E}_{F} \left(||\tilde{\mathbf{X}}_t - \mathbf{x}_t||_2\right) - \\ \frac{1}{2} \mathbb{E}_F \left(||\tilde{\mathbf{X}}_t - \tilde{\mathbf{X}}_t'||_2 \right)
\end{align*}
where $\mathbf{x}_t$ is the observed $K$-dimensional realization and $\tilde{\mathbf{X}}_t$ and $\tilde{\mathbf{X}}_t'$ are independent random vectors distributed according to $F$
In the univariate case, the Energy Score reduces to the Continuous Ranked Probability Score (CRPS)
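A sample-based ES estimate can be computed from simulated trajectories, e.g. via `scoringRules::es_sample()`; the dimensions below mirror the study, the data are placeholders:
```{r, echo = TRUE, eval = FALSE}
library(scoringRules)
K <- 4; m <- 2048
y <- rnorm(K)                      # observed K-dimensional realization
dat <- matrix(rnorm(K * m), K, m)  # m draws from the predictive distribution F
es_sample(y = y, dat = dat)
```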
:::
::::
## Results
::: {.panel-tabset}
## Energy Scores
:::: {.columns}
::: {.column width="55%"}
Relative improvement in ES compared to $\text{RW}^{\sigma, \rho}$
Cell color: based on the test statistic of the Diebold-Mariano test (whether the model outperforms the benchmark; greener = better).
```{r, echo=FALSE, results='asis', width = 'revert-layer', cache = TRUE}
load("assets/voldep/energy_df.Rdata")
table_energy <- energy %>%
kbl(
digits = 2,
col.names = c(
"Model",
"\\(\\text{ES}^{\\text{All}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{EUA}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{Oil}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{NGas}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{Coal}}_{1-30}\\)",
"\\(\\text{ES}^{\\text{All}}_{1}\\)",
"\\(\\text{ES}^{\\text{All}}_{5}\\)",
"\\(\\text{ES}^{\\text{All}}_{30}\\)"
),
# linesep = "",
bootstrap_options = "condensed",
# Dont replace any string, dataframe has to be valid latex code ...
escape = TRUE,
format = "html",
align = c("l", rep("r", ncol(energy) - 1))
) %>%
kable_paper(full_width = TRUE) %>%
row_spec(0:nrow(energy), color = cols[9, "grey"])
for (i in 2:ncol(energy)) {
bold_cells <- rep(FALSE, times = nrow(energy))
if (all(energy[-1, i, drop = TRUE] < 0)) {
bold_cells[1] <- TRUE
} else {
bold_cells[which.max(energy[-1, i, drop = TRUE]) + 1] <- TRUE
}
table_energy <- table_energy %>%
column_spec(i,
background = c(
cols[5, "grey"],
col_scale2(
energy_dm[, i, drop = TRUE][-1],
rng_t
)
),
bold = bold_cells
)
}
table_energy %>%
kable_styling(
bootstrap_options = c("condensed"),
font_size = 14
)
```
```{=html}
Coloring w.r.t. test statistic: color scale from &lt;-5 to &gt;5.
```
:::
::: {.column width="4%"}
:::
::: {.column width="41%"}
- Benchmarks:
- $\text{RW}^{\sigma, \rho}$: Random walk with constant volatility and correlation
- Univariate $\text{ETS}^{\sigma}$ with constant volatility
- Vector ETS $\text{VES}^{\sigma}$ with constant volatility
- Heteroscedasticity is a main driver of ES
- The VECM model without cointegration (VAR) is the best-performing model in terms of ES overall
- For EUA, the ETS benchmark is the best-performing model in terms of ES
:::
::::
## CRPS Scores
:::: {.columns}
::: {.column width="28%"}
- CRPS solely evaluates the marginal distributions
- The cross-sectional dependence is ignored
- VES models deliver poor performance in short horizons
- For Oil prices, the RW benchmark can't be outperformed 30 steps ahead
- Both VECM models generally deliver good performance
:::
::: {.column width="4%"}
:::
::: {.column width="68%"}
Relative improvement in CRPS compared to $\text{RW}^{\sigma, \rho}$
```{r, echo=FALSE, results = 'asis', cache = TRUE}
load("assets/voldep/crps_df.Rdata")
table_crps <- crps %>%
kbl(
col.names = c("Model", rep(paste0("H", h_sub), 4)),
digits = 1,
linesep = "",
# Dont replace any string, dataframe has to be valid latex code ...
escape = FALSE,
format = "html",
booktabs = FALSE,
align = c("l", rep("r", ncol(crps) - 1))
) %>%
kable_paper(full_width = FALSE)
for (i in 2:ncol(crps)) {
bold_cells <- rep(FALSE, times = nrow(crps))
if (all(crps[-1, i, drop = TRUE] < 0)) {
bold_cells[1] <- TRUE
} else {
bold_cells[which.max(crps[-1, i, drop = TRUE]) + 1] <- TRUE
}
table_crps <- table_crps %>%
column_spec(i,
background = c(cols[5, "grey"], col_scale2(
crps_dm[, i, drop = TRUE][-1], rng_t
)),
bold = bold_cells
)
}
table_crps <- table_crps %>%
add_header_above(c(" ", "EUA" = 3, "Oil" = 3, "NGas" = 3, "Coal" = 3)) %>%
row_spec(0:nrow(crps), color = cols[9, "grey"])
table_crps %>%
kable_styling(
bootstrap_options = c("condensed"),
full_width = FALSE,
font_size = 16
)
```
```{=html}
Coloring w.r.t. test statistic: color scale from &lt;-5 to &gt;5.
```
:::
::::
## RMSE
:::: {.columns}
::: {.column width="28%"}
RMSE measures forecast performance at the mean of the predictive distribution
Some models beat the benchmarks at short horizons
Conclusion: the improvements seen before must be attributed to other parts of the multivariate predictive distribution
:::
::: {.column width="4%"}
:::
::: {.column width="68%"}
Relative improvement in RMSE compared to $\text{RW}^{\sigma, \rho}$
```{r, echo=FALSE, results = 'asis', cache = TRUE}
load("assets/voldep/rmsq_df.Rdata")
table_rmsq <- rmsq %>%
kbl(
col.names = c("Model", rep(paste0("H", h_sub), 4)),
digits = 1,
# Dont replace any string, dataframe has to be valid latex code ...
escape = FALSE,
linesep = "",
format = "html",
align = c("l", rep("r", ncol(rmsq) - 1))
) %>%
kable_paper(full_width = FALSE)
for (i in 2:ncol(rmsq)) {
bold_cells <- rep(FALSE, times = nrow(rmsq))
if (all(rmsq[-1, i, drop = TRUE] < 0)) {
bold_cells[1] <- TRUE
} else {
bold_cells[which.max(rmsq[-1, i, drop = TRUE]) + 1] <- TRUE
}
table_rmsq <- table_rmsq %>%
column_spec(i,
background = c(cols[5, "grey"], col_scale2(
rmsq_dm[, i, drop = TRUE][-1], rng_t
)),
bold = bold_cells
)
}
table_rmsq <- table_rmsq %>%
add_header_above(c(" ", "EUA" = 3, "Oil" = 3, "NGas" = 3, "Coal" = 3)) %>%
row_spec(0:nrow(rmsq), color = cols[9, "grey"])
table_rmsq %>%
kable_styling(
bootstrap_options = c("condensed"),
full_width = FALSE,
font_size = 14
)
```
```{=html}
Coloring w.r.t. test statistic: color scale from &lt;-5 to &gt;5.
```
:::
::::
## Evolution of Linear Dependence $\Xi$
```{r, echo=FALSE, fig.width = 12, fig.height = 5.5, fig.align="center", cache = TRUE}
load("assets/voldep/plot_rho_df.Rdata")
ggplot() +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "5Oil-Coal", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "2EUA-NGas", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "1EUA-Oil", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "3EUA-Coal", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "6NGas-Coal", ],
aes(x = idx, y = value, col = name)
) +
geom_line(
size = linesize,
data = plot_data[plot_data$name == "4Oil-NGas", ],
aes(x = idx, y = value, col = name)
) +
geom_vline(
xintercept = as.Date("2022-02-24"), # Beginning of the war
col = as.character(cols[8, "grey"]),
linetype = "dashed"
) +
geom_vline(
xintercept = as.Date("2022-02-28"), # Central bank of russia was blocked from access to foreign reserves
col = as.character(cols[8, "grey"]),
linetype = "dotted"
) +
scale_colour_manual(
values = as.character(
cols[
6,
c("green", "purple", "blue", "red", "orange", "brown")
]
),
labels = c(
paste0(varnames[comb[, 1] + 1], "-", varnames[comb[, 2] + 1])
)
) +
theme_minimal() +
theme(
zoom.x = element_rect(fill = cols[4, "grey"], colour = NA),
legend.position = "bottom",
text = element_text(size = text_size)
) +
ylab("Correlation") +
scale_x_date(
breaks = date_breaks_fct,
labels = date_labels_fct
) +
xlab(NULL) +
guides(col = guide_legend(
title = NULL,
nrow = 1,
byrow = TRUE,
override.aes = list(size = 2)
)) +
# scale_y_continuous(breaks = seq(-1.5, 1.0, 0.5)) +
ggforce::facet_zoom(
xlim = c(
as.Date("2022-02-02"),
as.Date("2022-04-30")
),
zoom.size = 1.5,
show.area = TRUE
)
```
## Predictive Quantiles
```{r, echo=FALSE, fig.width = 12, fig.height = 5.5, fig.align="center", cache = TRUE}
load("assets/voldep/plot_quant_df.Rdata")
plot_quant_data %>% ggplot(aes(x = date, y = value)) +
geom_line(size = 1, aes(col = quant)) +
geom_vline(
xintercept = as.Date("2022-02-24"), # Beginning of the war
col = as.character(cols[8, "grey"]),
linetype = "dashed"
) +
geom_vline(
xintercept = as.Date("2022-02-28"), # Central bank of russia was blocked from access to foreign reserves
col = as.character(cols[8, "grey"]),
linetype = "dotted"
) +
scale_colour_manual(
values = as.character(
c(
cols[
5,
c(1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15)
],
cols[9, 18]
)
)
) +
facet_grid(var ~ ., scales = "free_y") +
theme_minimal() +
theme(
legend.position = "bottom",
text = element_text(size = text_size),
strip.background = element_rect(
fill = cols[1, 18],
colour = cols[1, 18]
)
) +
ylab("EUR") +
scale_x_date(
breaks = (seq(
as.Date("2022-02-15"),
as.Date("2022-03-31"),
by = "4 day"
) + 1)[-12],
date_labels = "%b %d",
limits = c(as.Date("2022-02-15"), as.Date("2022-03-31"))
) +
xlab(NULL) +
guides(col = guide_legend(
title = "Quantiles [\\%]",
nrow = 2,
byrow = TRUE,
override.aes = list(size = 2)
)) +
scale_y_continuous(
trans = "log2",
breaks = y_breaks
)
```
::::
## Conclusion
:::: {.columns}
::: {.column width="48%"}
Accounting for heteroscedasticity or stabilizing the variance via log transformation is crucial for good performance in terms of ES
- Price dynamics emerged well before the Russian invasion of Ukraine
- The linear dependence between the series reacted only right after the invasion
- Improvements in forecasting performance are mainly attributed to:
- the tails
- the dependence structure between the marginals
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
Berrisch, J., Pappert, S., Ziel, F., & Arsova, A. [-@berrisch2023modeling]. Modeling volatility and dependence of European carbon and energy prices. Finance Research Letters, 52, 103503.
:::
::::
# Final Remarks {visibility="uncounted"}
## Contributions {#sec-contributions}
:::: {.columns}
::: {.column width="48%"}
**Theoretical**
Probabilistic Online Learning:
Aggregation
Regression
**Practical**
Applications
Energy Commodities
Electricity Prices
Electricity Load
**Well received by the academic community:**
Most papers already published
104 citations since 2020 ([Google Scholar](https://scholar.google.com/citations?user=wfF1oJYAAAAJ&hl=de&oi=sra))
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
**Software**
R Packages:
[profoc](https://profoc.berrisch.biz/), [rcpptimer](https://rcpptimer.berrisch.biz/), [dccpp](https://dccpp.berrisch.biz/)
Python Packages:
[ondil](https://github.com/simon-hirsch/ondil), [sstudentt](https://github.com/BerriJ/sstudentt)
Contributions to other projects:
[RcppArmadillo](https://github.com/RcppCore/RcppArmadillo)
[gamlss](https://github.com/gamlss-dev/gamlss)
[NixOS/nixpkgs](https://github.com/NixOS/nixpkgs)
[OpenPrinting/foomatic-db](https://github.com/OpenPrinting/foomatic-db)
**Awards:**
Berrisch, J., Narajewski, M., & Ziel, F. [-@BERRISCH2023100236]:
Won Western Power Distribution Competition
Won Best-Student-Presentation Award
:::
::::
## Questions! {visibility="uncounted"}
![](assets/allisonhorst/hiding.png)
## References {visibility="uncounted"}