- The forecaster receives predictions \(\widehat{X}_{t,k}\) from \(K\) experts
- The forecaster assigns weights \(w_{t,k}\) to each expert
- The forecaster calculates her prediction:
\[\begin{equation}
\widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}.
\label{eq_forecast_def}
\end{equation}\]
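A minimal R sketch of this combination step, using naive weights \(w_{t,k} = 1/K\); all numbers are made up:

```r
# Combine K expert predictions with weights (toy values, not from the thesis)
K <- 3
x_hat <- c(1.2, 0.8, 1.1)  # expert predictions X_hat_{t,k}
w <- rep(1 / K, K)         # naive weights w_{t,k} = 1/K
x_tilde <- sum(w * x_hat)  # combined forecast X_tilde_t
x_tilde                    # about 1.033
```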
\[\begin{align}
\frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi}\right) = \mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right)
\label{eq_optp_conv}
\end{align}\]
+Algorithms can satisfy both \(\eqref{eq_optp_select}\) and \(\eqref{eq_optp_conv}\) depending on:
- The loss function
- Regularity conditions on \(Y_t\) and \(\widehat{X}_{t,k}\)
\(\text{QL}\) is convex, but not exp-concave.
+Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exists a \(C>0\) such that for \(x>0\) it holds that
\[\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq 1-e^{-x}
\label{eq_boa_opt_conv}
\end{equation}\]
if the loss function is convex.
Almost optimal w.r.t. convex aggregation \(\eqref{eq_optp_conv}\) (Wintenberger 2017).
The same algorithm satisfies that there exists a \(C>0\) such that for \(x>0\) it holds that
\[\begin{equation}
P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq 1-2e^{-x}
\label{eq_boa_opt_select}
\end{equation}\]
Pointwise procedures can outperform constant procedures.
\(\text{QL}\) is convex: almost optimal convergence w.r.t. convex aggregation \(\eqref{eq_boa_opt_conv}\)
+For almost optimal convergence w.r.t. selection \(\eqref{eq_boa_opt_select}\) we need:
A1: Lipschitz Continuity
A2: Weak Exp-Concavity
QL is Lipschitz continuous with \(G=\max(p, 1-p)\):
\[|\text{QL}_p(x, y) - \text{QL}_p(x', y)| \leq \max(p, 1-p)\,|x - x'|\]
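A quick numerical check of this bound in R; the pinball form \((\mathbb{1}\{y \leq x\} - p)(x - y)\) of \(\text{QL}\) used below is an assumption (the standard quantile-loss form), and the values are illustrative:

```r
# Quantile (pinball) loss QL_p and its Lipschitz bound G = max(p, 1 - p)
ql <- function(x, y, p) (as.numeric(y <= x) - p) * (x - y)

p <- 0.9
x1 <- 1.0; x2 <- 1.5; y <- 1.2
# |QL_p(x1, y) - QL_p(x2, y)| <= G * |x1 - x2| holds:
abs(ql(x1, y, p) - ql(x2, y, p)) <= max(p, 1 - p) * abs(x1 - x2)  # TRUE
```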
Theorem 1
The gradient-based fully adaptive Bernstein Online Aggregation (BOAG) applied pointwise for all \(p\in(0,1)\) on \(\text{QL}\) satisfies \(\eqref{eq_boa_opt_conv}\) with minimal CRPS given by
\[\widehat{\mathcal{R}}_{t,\pi} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\pi}.\]
+If \(Y_t|\mathcal{F}_{t-1}\) is bounded and has a pdf \(f_t\) satisfying \(f_t>\gamma >0\) on its support \(\text{spt}(f_t)\) then \(\eqref{eq_boa_opt_select}\) holds with \(\beta=1\) and
\[\widehat{\mathcal{R}}_{t,\min} = 2\overline{\widehat{\mathcal{R}}}^{\text{QL}}_{t,\min}\]
BOAG with \(\text{QL}\) satisfies \(\eqref{eq_boa_opt_conv}\) and \(\eqref{eq_boa_opt_select}\)
Initialization:
+Array of expert predictions: \(\widehat{X}_{t,p,k}\)
Vector of Prediction targets: \(Y_t\)
Starting Weights: \(\boldsymbol w_0=(w_{0,1},\ldots, w_{0,K})\)
Penalization parameter: \(\lambda\geq 0\)
for( t in 1:T ) {
\(\boldsymbol \eta_{t} =\min\left( \left(-\log(\boldsymbol \beta_{0}) \odot \boldsymbol V_{t}^{\odot -1} \right)^{\odot\frac{1}{2}} , \frac{1}{2}\boldsymbol E_{t}^{\odot-1}\right)\)
\(\boldsymbol R_{t} = \boldsymbol R_{t-1}+ \boldsymbol r_{t} \odot \left( \boldsymbol 1 - \boldsymbol \eta_{t} \odot \boldsymbol r_{t} \right)/2 + \boldsymbol E_{t} \odot \mathbb{1}\{-2\boldsymbol \eta_{t}\odot \boldsymbol r_{t} > 1\}\)
\(\boldsymbol \beta_{t} = K \boldsymbol \beta_{0} \odot \boldsymbol {SoftMax}\left( - \boldsymbol \eta_{t} \odot \boldsymbol R_{t} + \log( \boldsymbol \eta_{t}) \right)\)
\(\boldsymbol w_{t}(\boldsymbol P) = \underbrace{\boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'}_{\boldsymbol{\mathcal{H}}} \boldsymbol B \boldsymbol \beta_{t}\)
}
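A hedged R sketch of this update loop for a single quantile level \(p\) (without the smoothing step); \(\odot\) operations become plain R vector arithmetic. The \(\boldsymbol\eta_t\), \(\boldsymbol R_t\), and \(\boldsymbol\beta_t\) updates follow the displayed formulas; the regret \(\boldsymbol r_t\), the accumulators \(\boldsymbol V_t\) and \(\boldsymbol E_t\), and the final normalization are my assumptions and may differ from the thesis implementation:

```r
# Hedged sketch of the BOAG core loop for one quantile level p.
# Assumed (not shown on the slide): r_t is the gradient-based regret,
# V_t the cumulative squared regret, E_t the running range of |r_t|.
set.seed(1)
K <- 3; T <- 200; p <- 0.5
X_hat <- matrix(rnorm(T * K, mean = c(-0.2, 0, 0.2)), T, K, byrow = TRUE)
Y <- rnorm(T)
grad_ql <- function(x, y, p) as.numeric(y <= x) - p  # subgradient of QL_p in x

beta0 <- rep(1 / K, K)
R <- V <- E <- rep(0, K)
beta <- beta0
for (t in 1:T) {
  x_tilde <- sum(beta * X_hat[t, ])                        # combined forecast
  r <- grad_ql(x_tilde, Y[t], p) * (x_tilde - X_hat[t, ])  # regret (assumed form)
  E <- pmax(E, abs(r))
  V <- V + r^2
  eta <- pmin(sqrt(-log(beta0) / pmax(V, 1e-12)),          # eta_t update
              1 / (2 * pmax(E, 1e-12)))
  R <- R + r * (1 - eta * r) / 2 + E * (-2 * eta * r > 1)  # R_t update
  beta <- beta0 * eta * exp(-eta * R)                      # beta_t update; explicit
  beta <- beta / sum(beta)                                 # normalization replaces K*SoftMax
}
round(beta, 3)  # final weights
```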
Deviation from best attainable QL (1000 runs).
CRPS Values for different \(\lambda\) (1000 runs)
CRPS for different number of knots (1000 runs)
The same simulation carried out for different algorithms (1000 runs):
Tuning parameter grids:
- Smoothing Penalty: \(\Lambda= \{0\}\cup \{2^x|x\in \{-4,-3.5,\ldots,12\}\}\)
- Learning Rates: \(\mathcal{E}= \{2^x|x\in \{-1,-0.5,\ldots,9\}\}\)
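Written out in R (variable names are mine), these grids are simply:

```r
# Tuning grids from the slide
Lambda <- c(0, 2^seq(-4, 12, by = 0.5))  # smoothing penalties
Eta    <- 2^seq(-1, 9, by = 0.5)         # learning rates
```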
- We draw \(2^{12}= 2048\) trajectories 30 steps ahead (see the sketch after this list)
- The cross-sectional dependence is ignored
- VES models deliver poor performance in short horizons
- For Oil prices the RW Benchmark can’t be outperformed 30 steps ahead
- Both VECM models generally deliver good performance
- Price dynamics emerged well before the Russian invasion of Ukraine
- Linear dependence between the series reacted only right after the invasion
- Improvements in forecasting performance are mainly attributed to:
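A hedged R sketch of the trajectory step referenced in the list above: \(2^{12} = 2048\) paths of the random walk with drift \(Y_t = \mu + Y_{t-1} + \varepsilon_t\), 30 steps ahead. The drift, innovation scale, and Gaussian innovations are illustrative choices, not the thesis's specification:

```r
# Simulate 2^12 = 2048 random-walk-with-drift trajectories, 30 steps ahead
set.seed(123)
n_traj <- 2^12; h <- 30
mu <- 0.01; y0 <- 100
eps <- matrix(rnorm(n_traj * h, sd = 0.5), n_traj, h)
drift <- matrix(rep(mu * (1:h), each = n_traj), n_traj, h)
paths <- y0 + drift + t(apply(eps, 1, cumsum))  # Y_{t0+j} = y0 + mu*j + cumsum(eps)
dim(paths)                                      # 2048 x 30
```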
\[\begin{equation*} \boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{pr}}' \end{equation*}\]
with parameter matrix \(\boldsymbol b_{t,k}\). The latter is estimated with an \(L_2\)-smoothing penalty, minimizing
\[\begin{align}
& \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}} - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}} \|^2_2 + \lambda^{\text{pr}} \| \mathcal{D}_{q} (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}) \|^2_2 + \nonumber \\
& \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}} - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}} \|^2_2 + \lambda^{\text{mv}} \| \mathcal{D}_{q} (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}) \|^2_2 \nonumber
\end{align}\]
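A hedged one-dimensional R analogue of this penalized smoothing step: project raw weights onto a B-spline basis with a difference penalty. Basis size, penalty order, and \(\lambda\) are illustrative choices:

```r
# 1-D analogue of the L2-penalized smoothing above: fit B-spline coefficients
# b minimizing ||w_raw - Psi b||^2 + lambda * ||D b||^2 (toy setup)
library(splines)
set.seed(42)
p_grid <- seq(0.05, 0.95, by = 0.05)
w_raw  <- 0.5 + 0.3 * sin(4 * p_grid) + rnorm(length(p_grid), sd = 0.05)

Psi <- bs(p_grid, df = 10, intercept = TRUE)   # B-spline basis
D   <- diff(diag(ncol(Psi)), differences = 2)  # second-order differences
lambda <- 1
b <- solve(t(Psi) %*% Psi + lambda * t(D) %*% D, t(Psi) %*% w_raw)
w_smooth <- Psi %*% b                          # smoothed weight curve
```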
with: \(\boldsymbol{u}_t =(u_{1,t},\ldots, u_{K,t})^\intercal\), \(u_{k,t} = F_{X_{k,t}|\mathcal{F}_{t-1}}(x_{k,t})\). For brevity we drop the conditioning on \(\mathcal{F}_{t-1}\). The model can be specified as follows
\[\begin{align}
F(\boldsymbol{x}_t) = C \left[\mathbf{F}(\boldsymbol{x}_t; \boldsymbol{\mu}_t, \boldsymbol{ \sigma }_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda}); \Xi_t, \Theta\right] \nonumber
\end{align}\]
Joint maximum likelihood estimation:
\[\begin{align*}
f_{\mathbf{X}_t}(\mathbf{x}_t | \mathcal{F}_{t-1}) = c\left[\mathbf{F}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu},
\boldsymbol{\lambda});\Xi_t, \Theta\right] \cdot \\ \prod_{i=1}^K f_{X_{i,t}}(\mathbf{x}_t;\boldsymbol{\mu}_t, \boldsymbol{\sigma}_{t}^2, \boldsymbol{\nu}, \boldsymbol{\lambda})
\end{align*}\]
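A hedged toy version of this decomposition in R: a bivariate Gaussian copula density evaluated on the PITs, multiplied by the marginal densities. The Gaussian copula and plain normal margins are placeholders for the thesis's skew-t margins and time-varying copula:

```r
# Toy copula-density decomposition: f(x) = c[F(x)] * prod_i f_i(x_i)
library(mvtnorm)
x <- c(0.3, -0.1)
rho <- 0.5
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
u <- pnorm(x)                                             # PITs u_k = F(x_k)
z <- qnorm(u)                                             # here z == x (normal margins)
cop_dens   <- dmvnorm(z, sigma = Sigma) / prod(dnorm(z))  # Gaussian copula density c[u]
joint_dens <- cop_dens * prod(dnorm(x))                   # joint density f(x)
joint_dens
```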
Accounting for heteroscedasticity or stabilizing the variance via a log transformation is crucial for good performance in terms of ES.
Estimation