Init presentation boilerplate
25_07_phd_defense/index.qmd
---
title: "Data Science Methods for Forecasting in Energy and Economics"
date: 2025-07-10
author:
  - name: Jonathan Berrisch
    affiliations:
      - ref: hemf
affiliations:
  - id: hemf
    name: University of Duisburg-Essen, House of Energy Markets and Finance
format:
  revealjs:
    embed-resources: true
    footer: ""
    logo: logos_combined.png
    theme: [default, clean.scss]
    smaller: true
    fig-format: svg
execute:
  daemon: false
highlight-style: github
---

## Outline

::: {.hidden}
$$
\newcommand{\A}{{\mathbb A}}
$$
:::

<br>

::: {style="font-size: 150%;"}

[{{< fa bars-staggered >}}]{style="color: #404040;"}   Introduction & Research Motivation

[{{< fa bars-staggered >}}]{style="color: #404040;"}   Overview of the Thesis

[{{< fa table >}}]{style="color: #404040;"}   Online Learning

[{{< fa circle-nodes >}}]{style="color: #404040;"}   Probabilistic Forecasting of European Carbon and Energy Prices

[{{< fa lightbulb >}}]{style="color: #404040;"}   Limitations

[{{< fa binoculars >}}]{style="color: #404040;"}   Contributions & Outlook

:::

## EfeMOD

**Empirisch fundierte Elektrizitätsmarkt-Modellierung mit Open Data** (Empirically grounded electricity market modelling with open data)

:::: {.columns}

::: {.column width="65%"}

[{{< fa users-gear >}}]{style="color: #404040;"} **Project Entities:**

Chair of Prof. Dr. Christoph Weber (Management Sciences and Energy Economics)

Chair of Prof. Dr. Florian Ziel (Data Science in Energy and Environment)

[{{< fa bullseye >}}]{style="color: #404040;"}   **Project Goal:**

Use publicly available data (particularly from the ENTSO-E Transparency Platform) to estimate parameters for energy system and energy market models.

:::

::: {.column width="5%"}
<!-- empty column to create gap -->
:::

::: {.column width="30%"}

:::

::::

## EfeMOD

## Motivation and Objective

**Identification of Power Plant Operation States Using Clustering**

[{{< fa earth-europe >}}]{style="color: #404040;"} Gain Knowledge about Power Plant Characteristics:

- Operating Points
- Efficiency
- Capacity, etc.

[{{< fa display >}}]{style="color: #404040;"} This Presentation:

Identify Operation States:

- Stable Operation
- Startup
- Minimum Stable Operation (MSO), etc.

Provide these characteristics to other researchers

[{{< fa right-long >}}]{style="color: #404040;"} e.g., to estimate efficiency

## Data

[{{< fa database >}}]{style="color:#404040;"} ENTSO-E Data:

- `ActualGenerationOutputPerGenerationUnit_16.1.A`
- `UnavailabilityOfGenerationUnits_15.1.A_B`

[{{< fa fire-flame-simple >}}]{style="color:rgb(0, 200, 255);"} We focus on natural gas units:

- 63 units in the `DE_LU` bidding zone
- 299 units across all bidding zones

[{{< fa calendar-days >}}]{style="color:#404040;"} We use recent data:

- 2020-01-01 until "now"

## Data

## Data

## Data

::: {.panel-tabset}

## Lausward

:::: {.columns}

::: {.column width="42%"}

**Heizkraftwerk Lausward**

Location: Düsseldorf

Block Anton (*Block AGuD*)

Combined cycle gas turbine (CCGT)

Electrical output: 103 MW [{{< fa bolt >}}]{style="color: #ffc400;"}

75 MW of district heat can be extracted

Efficiency: 54%

Fuel Utilization Rate: 87% (with district heating)

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

:::

::::

## Emsland

:::: {.columns}

::: {.column width="42%"}

**Erdgaskraftwerk Emsland**

Location: Lingen (Ems)

*Block C*

Combined cycle gas turbine (CCGT)

Electrical output: 475 MW [{{< fa bolt >}}]{style="color: #ffc400;"}

Efficiency: 46%

Black start capable

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

:::

::::

:::

## Empirical Approach

:::: {.columns}

::: {.column width="42%"}

### Overview

Empirical identification of states

3-Step Approach:

- Prior Partitioning
  - We create preliminary clusters
  - These are used to initialize the main clustering
- Main Clustering
  - Gaussian Model-Based Clustering
- Label Assignment
  - We assign meaningful labels to the final clusters

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

:::

::::

## Empirical Approach

:::: {.columns}

::: {.column width="42%"}

### Prior Partitioning

[{{< fa arrow-up-right-dots >}}]{style="color: #202020FF;"} Divide the space into meaningful partitions:

Define the capacity: $\zeta = \max(t0)$

Define a threshold: $\gamma = \frac{\zeta}{50}$

[{{< fa circle >}}]{style="color: #2D7D32FF;"} $\pm \gamma$ around the diagonal: Stable <br>
[{{< fa circle >}}]{style="color: #202020FF;"} $t0 < 1$ & $t1 < 1$: Zero <br>
[{{< fa circle >}}]{style="color: #FA8C00FF;"} $t0 < \gamma$ & $t1 > 1$: Startup <br>
[{{< fa circle >}}]{style="color: #D81A5FFF;"} $t0 > 1$ & $t1 < \gamma$: Shutdown <br>
[{{< fa circle >}}]{style="color: #FDD834FF;"} $t1 > t0$: Ramp-Up <br>
[{{< fa circle >}}]{style="color: #8D24AAFF;"} $t1 < t0$: Ramp-Down

We project <b style="color: #2D7D32FF;">Stable</b> observations onto the diagonal, <font style = "opacity: 0.4;"> <b style="color: #FA8C00FF;">Startup</b> observations onto $t1$ and <b style="color: #D81A5FFF;">Shutdown</b> observations onto $t0$ for the next step. </font>

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

:::

::::

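## Empirical Approach {.scrollable}

### Prior Partitioning (R Sketch)

Below is a minimal base-R sketch of the partitioning rules above. The slides do not state two things, so the sketch assumes them. First, `t0` and `t1` are one unit's outputs in two consecutive periods, obtained by pairing each value of a generation series with its successor. Second, the rules are checked in the order Zero, Startup, Shutdown, Stable, Ramp. That order matters because Zero, Startup and Shutdown points may also lie within $\pm\gamma$ of the diagonal. The helper names `make_pairs()` and `prior_partition()` are hypothetical.

```r
# Hypothetical helpers illustrating the prior partitioning rules.
# Assumption: (t0, t1) are outputs (MW) of one unit in consecutive periods.
make_pairs <- function(gen) {
  n <- length(gen)
  data.frame(t0 = gen[-n], t1 = gen[-1])
}

prior_partition <- function(t0, t1) {
  zeta  <- max(t0)     # capacity: zeta = max(t0)
  gamma <- zeta / 50   # threshold: gamma = zeta / 50
  # Assumed precedence: the first matching rule wins.
  state <- ifelse(t0 < 1 & t1 < 1, "Zero",
           ifelse(t0 < gamma & t1 > 1, "Startup",
           ifelse(t0 > 1 & t1 < gamma, "Shutdown",
           ifelse(abs(t1 - t0) <= gamma, "Stable",
           ifelse(t1 > t0, "Ramp-Up", "Ramp-Down")))))
  # 1-D coordinates for the next step: (t0 + t1) / 2 is the position of the
  # orthogonal projection onto the diagonal t1 = t0 (Stable); Startup keeps
  # t1, Shutdown keeps t0; the other regions get NA.
  proj <- ifelse(state == "Stable", (t0 + t1) / 2,
          ifelse(state == "Startup", t1,
          ifelse(state == "Shutdown", t0, NA_real_)))
  data.frame(t0, t1, state, proj, stringsAsFactors = FALSE)
}

# Toy series: off, start-up, ramp-up, full load, ramp-down, shutdown, off
pairs <- make_pairs(c(0, 0, 40, 100, 101, 100, 60, 0, 0))
pp <- prior_partition(pairs$t0, pairs$t1)
pp
```

A nested `ifelse()` keeps the rules vectorized and makes the assumed precedence explicit. Reordering the branches is the place to experiment if the intended order differs.
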
## Empirical Approach

:::: {.columns}

::: {.column width="42%"}

### Prior Partitioning

Model-based clustering of the regions using `mclust::Mclust` in `R`:

- <b style="color: #2D7D32FF;">Stable</b>: 2-5 Clusters
- <b style="color: #FDD834FF;">Ramp Up</b>: 2-4 Clusters
- <b style="color: #8D24AAFF;">Ramp Down</b>: 2-4 Clusters

[{{< fa lightbulb >}}]{style="color:rgb(255, 166, 0);"} Obtain a finite mixture distribution:

$$\sum_{k=1}^{G}{\pi_k f_k (\mathbf{x}; \mathbf{\theta}_k)}$$

$f_k$: density of the $k$-th component<br>
$\pi_k$: mixture weights<br>
$\theta_k$: parameters of the $k$-th component density

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

:::

::::

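## Empirical Approach {.scrollable}

### Finite Mixture Density (R Sketch)

To make the mixture formula concrete, this base-R sketch evaluates $\sum_k \pi_k f_k(x; \theta_k)$ for univariate Gaussian components. Such components would arise for the one-dimensional Stable projections. The parameter values are made up for illustration. For a fitted `mclust::Mclust` object, the weights and means are stored in `fit$parameters$pro` and `fit$parameters$mean`. The function name `mixture_density()` is hypothetical.

```r
# Hypothetical helper: finite mixture density sum_k pi_k * f_k(x; theta_k)
# with univariate Gaussian components, theta_k = (mu_k, sigma_k).
mixture_density <- function(x, pro, mu, sigma) {
  stopifnot(length(pro) == length(mu), length(mu) == length(sigma),
            isTRUE(all.equal(sum(pro), 1)))
  # Column k holds pi_k * f_k(x); summing across columns gives f(x).
  comp <- vapply(seq_along(pro),
                 function(k) pro[k] * dnorm(x, mu[k], sigma[k]),
                 numeric(length(x)))
  rowSums(matrix(comp, nrow = length(x)))
}

# Illustrative parameters (not estimates): two stable operating points in MW
pro   <- c(0.3, 0.7)   # mixture weights pi_k
mu    <- c(45, 100)    # component means
sigma <- c(3, 2)       # component standard deviations

mixture_density(c(45, 75, 100), pro, mu, sigma)

# Because the weights sum to one, the mixture density integrates to one
# (checked with a fine Riemann sum over [0, 200]).
grid <- seq(0, 200, by = 0.01)
sum(mixture_density(grid, pro, mu, sigma)) * 0.01
```
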
## Empirical Approach

### Prior Partitioning

:::: {.columns}

::: {.column width="49%"}

$$f(\mathbf{x}; \mathbf{\Psi}) = \sum_{k=1}^{G}{\pi_k \phi (\mathbf{x}; \mathbf{\mu}_k, \mathbf{\Sigma}_k)}$$

$\phi(\cdot)$: multivariate Gaussian density<br>

Maximum likelihood estimation via the expectation-maximization (EM) algorithm

Log-likelihood for Gaussian mixture models (GMMs):

\begin{align}
\ell(\Psi) = \sum_{i=1}^n \log \left\{ \sum_{k=1}^G \pi_k \phi(x_i; \mu_k, \Sigma_k) \right\}
\end{align}

[{{< fa retweet >}}]{style="color: #404040;"} We reformulate this as a complete-data log-likelihood to apply the EM algorithm:

:::

::: {.column width="2%"}
<!-- empty column to create gap -->
:::

::: {.column width="49%"}

\begin{align}
\ell_{\mathcal{C}}(\Psi) = \sum_{i=1}^n \sum_{k=1}^G z_{ik} \left\{ \log \pi_k + \log \phi(x_i; \mu_k, \Sigma_k) \right\}
\end{align}

\begin{align}
z_{ik} =
\begin{cases}
1 & \text{if } x_i \text{ belongs to component }k \\
0 & \text{otherwise.}
\end{cases}
\end{align}

E-Step:

\begin{align}
\hat{z}_{ik} = \frac{\hat{\pi}_k \phi(x_i; \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{g=1}^{G} \hat{\pi}_g \phi(x_i; \hat{\mu}_g, \hat{\Sigma}_g)}.
\end{align}

M-Step:

\begin{align}
\hat{\pi}_k = \frac{n_k}{n}, \quad \hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{z}_{ik} x_i}{n_k}, \quad \text{where} \quad n_k = \sum_{i=1}^{n} \hat{z}_{ik}.
\end{align}

:::

::::

::: {.notes}

- The log-likelihood in (2.2) is hard to maximize directly, even numerically
- As a consequence, mixture models are usually fitted by reformulating the mixture problem as an incomplete-data problem within the EM framework.

General EM steps:

- Initialize
- E-step: estimate the latent component memberships
- M-step: obtain the updated parameter estimates
- Check the convergence criteria

:::

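## Empirical Approach {.scrollable}

### EM Algorithm (R Sketch)

The E- and M-steps above can be illustrated with a short base-R implementation. This is a sketch, not the `mclust` implementation. It assumes one-dimensional observations, component-specific variances and a user-supplied starting point instead of MBAHC. For this unconstrained univariate model, the variance update is the standard $\hat{\sigma}^2_k = \sum_i \hat{z}_{ik}(x_i - \hat{\mu}_k)^2 / n_k$. The function name `em_gmm()` is hypothetical.

```r
# Hypothetical EM sketch for a univariate Gaussian mixture with
# component-specific variances.
em_gmm <- function(x, mu, sigma2, pro, tol = 1e-8, max_iter = 500) {
  n <- length(x)
  G <- length(mu)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: z_ik = pi_k phi(x_i; mu_k, sigma2_k) / sum_g pi_g phi(x_i; mu_g, sigma2_g)
    dens <- matrix(vapply(seq_len(G),
                          function(k) pro[k] * dnorm(x, mu[k], sqrt(sigma2[k])),
                          numeric(n)),
                   nrow = n)
    total  <- rowSums(dens)
    z      <- dens / total          # row i is divided by total[i]
    loglik <- sum(log(total))       # observed-data log-likelihood l(Psi)

    # M-step: n_k = sum_i z_ik, then weighted updates of pi, mu and sigma^2
    n_k    <- colSums(z)
    pro    <- n_k / n
    mu     <- colSums(z * x) / n_k
    sigma2 <- colSums(z * outer(x, mu, "-")^2) / n_k

    # Convergence check on the change in log-likelihood
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  # z and loglik refer to the last E-step
  list(pro = pro, mu = mu, sigma2 = sigma2, z = z, loglik = loglik, iter = iter)
}

# Synthetic data: two operating points (MW), 30% / 70% of the observations
set.seed(1)
x <- c(rnorm(300, mean = 45, sd = 3), rnorm(700, mean = 100, sd = 2))
fit <- em_gmm(x, mu = c(30, 90), sigma2 = c(25, 25), pro = c(0.5, 0.5))
round(fit$mu, 1)
round(fit$pro, 2)
```

The fitted means and weights should end up close to the simulated values. `mclust` generalizes this loop to multivariate data and constrained covariance structures.
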
## Empirical Approach

:::: {.columns}

::: {.column width="42%"}

### Prior Partitioning

**Initialization**

We initialize the EM algorithm (E-Step) using the partitions obtained from model-based agglomerative hierarchical clustering (MBAHC).

**Estimation**

The Bayesian information criterion (BIC) is used for model selection.

**Prior Partitioning Results**

The right graph shows the prior clusters.

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

::: {.panel-tabset}

## Lausward

## Emsland

:::

:::

::::

::: {.notes}

MBAHC: recursively merging the two clusters that yield the maximum likelihood of a probability model over all possible merges

:::

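## Empirical Approach {.scrollable}

### Initialization and Model Selection (R Sketch)

This sketch shows MBAHC initialization and BIC-based selection with `mclust`. It runs on synthetic one-dimensional data standing in for projected Stable observations. The data and the restriction to the univariate variance models `"E"` and `"V"` are assumptions for illustration. `Mclust()` already initializes EM from `hc()` by default, so passing `initialization = list(hcPairs = ...)` only makes this explicit. `mclust` defines BIC as $2\ell - d \log n$, so larger values are better.

```r
library(mclust)

# Synthetic stand-in for projected "Stable" observations (MW), three levels
set.seed(42)
x <- c(rnorm(200, 45, 2), rnorm(300, 75, 3), rnorm(500, 100, 1.5))

# MBAHC: merge tree from model-based agglomerative hierarchical clustering
hc_tree <- hc(x, modelName = "V")

# EM for G = 2..5 components, each started from the hc partition with G groups;
# BIC chooses the number of components and the variance model.
fit <- Mclust(x, G = 2:5, modelNames = c("E", "V"),
              initialization = list(hcPairs = hc_tree))

summary(fit$BIC)   # best models by BIC
fit$G              # selected number of components
fit$modelName      # selected variance model

# mclust's BIC: 2 * log-likelihood - (number of parameters) * log(n)
2 * fit$loglik - fit$df * log(fit$n)
fit$bic
```
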
## Empirical Approach

:::: {.columns}

::: {.column width="42%"}

### Main Clustering

**MBAHC**

The prior clusters are used as the starting partition of MBAHC.

The MBAHC result is used to initialize the EM algorithm of the main Gaussian model-based clustering.

**Main Clustering Results**

The right graph shows the *maximum a posteriori (MAP) classification*.

Colour indicates the log of the mixture density, i.e., the density cumulated over all components.

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

::: {.panel-tabset}

## Lausward

## Emsland

:::

:::

::::

::: {.notes}

MBAHC: recursively merging the two clusters that yield the maximum likelihood of a probability model over all possible merges

:::

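## Empirical Approach {.scrollable}

### Main Clustering (R Sketch)

This sketch shows the main step on synthetic two-dimensional $(t0, t1)$ data. It assumes the prior clusters enter MBAHC as its starting partition through the `partition` argument of `mclust::hc()`. The data, the prior labels `prior` and the choice `G = 2:3` are illustrative assumptions. The MAP classification assigns each observation to the component with the largest posterior probability $\hat{z}_{ik}$. `mclust::dens()` returns the (log) mixture density used for colouring.

```r
library(mclust)

# Synthetic (t0, t1) pairs (MW): a stable cluster and a ramp-up cluster
set.seed(7)
X <- rbind(cbind(t0 = rnorm(400, 100, 1), t1 = rnorm(400, 100, 1)),
           cbind(t0 = rnorm(150, 50, 4),  t1 = rnorm(150, 80, 4)))

# Hypothetical prior clusters (e.g. from the prior partitioning step)
prior <- c(rep(1L, 200), rep(2L, 200), rep(3L, 150))

# MBAHC starting from the prior partition instead of singletons
hc_main <- hc(X, modelName = "VVV", partition = prior)

# Main Gaussian model-based clustering, EM initialized from the hc merges
fit <- Mclust(X, G = 2:3, initialization = list(hcPairs = hc_main))

# MAP classification: component with the largest posterior probability
map_class <- max.col(fit$z, ties.method = "first")
table(map_class, fit$classification)

# Log of the mixture density (all components combined) at each observation
log_dens <- dens(data = X, modelName = fit$modelName,
                 parameters = fit$parameters, logarithm = TRUE)
summary(log_dens)
```
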
## Empirical Approach

### Label Assignment

:::: {.columns}

::: {.column width="48%"}

We assign labels to the clusters using their mean $\mu$ and correlation $\rho$.

Multiple clusters may describe one generation state (e.g., along the diagonal).

:::

::: {.column width="4%"}
<!-- empty column to create gap -->
:::

::: {.column width="48%"}

```{r}
#| message: false
library(dplyr)

# Load the cluster summary of Block AGuD (provides the object `clusters`)
load("figures/Block AGuD/clusters.RDS")

# Mean output in t0 and t1 and correlation of each cluster
clusters %>%
  select(classification, mu_t0, mu_t1, cor) %>%
  head()
```

:::

::::

\begin{align}
\text{State} =
\begin{cases}
\color{#202020FF}{\text{Zero}} & (\mu_{t0} < 1) \land (\mu_{t1} < 1), \\
\text{MSO} & \left[ (\mu_{t0} > \zeta/10) \land (\mu_{t1} > \zeta / 10) \land (\left| \mu_{t0} - \mu_{t1} \right| > \zeta / 10) \right]\\ & \rightarrow \operatorname{argmin}(\mu_{t0} + \mu_{t1}), \\
\text{Max Capacity} & \rightarrow \operatorname{argmax}(\mu_{t0} + \mu_{t1}), \\
\text{Startup} & (\mu_{t1} \geq \zeta / 10) \land (\mu_{t0} < \gamma) \land (\rho < 0.3), \\
\text{Shutdown} & (\mu_{t0} \geq \zeta / 10) \land (\mu_{t1} < \gamma) \land (\rho < 0.3), \\
\text{Stable Operation} & \text{Remaining clusters with } \rho > 0.8, \\
\text{Ramp Up} & \text{Remaining clusters: } \mu_{t1} > \mu_{t0}, \\
\text{Ramp Down} & \text{Remaining clusters: } \mu_{t1} < \mu_{t0}.
\end{cases}
\end{align}

::: {.notes}

MBAHC: recursively merging the two clusters that yield the maximum likelihood of a probability model over all possible merges

:::

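## Empirical Approach {.scrollable}

### Label Assignment (R Sketch)

Below is a base-R sketch of the labelling rules above. It takes the cluster means `mu_t0`, `mu_t1` and the correlation $\rho$ (the `cor` column) as input. The slides do not fix the order of evaluation, so the sketch makes three assumptions: the rules are applied top to bottom, each cluster keeps the first label it receives, and MSO and Max Capacity each select a single cluster among those not yet labelled. The function name `assign_labels()` and the example clusters are hypothetical.

```r
# Hypothetical helper: label clusters from their means and correlation.
# Assumptions: rules are applied in the listed order, a cluster keeps the
# first label it receives, and "MSO" / "Max Capacity" each pick one
# not-yet-labelled cluster.
assign_labels <- function(mu_t0, mu_t1, rho, zeta, gamma = zeta / 50) {
  lab  <- rep(NA_character_, length(mu_t0))
  free <- function() is.na(lab)   # clusters without a label so far
  tot  <- mu_t0 + mu_t1

  lab[free() & mu_t0 < 1 & mu_t1 < 1] <- "Zero"

  # MSO: the three conditions as stated on the slide, then argmin(mu_t0 + mu_t1)
  cand <- which(free() & mu_t0 > zeta / 10 & mu_t1 > zeta / 10 &
                  abs(mu_t0 - mu_t1) > zeta / 10)
  if (length(cand) > 0) lab[cand[which.min(tot[cand])]] <- "MSO"

  # Max Capacity: argmax(mu_t0 + mu_t1) among the remaining clusters
  cand <- which(free())
  if (length(cand) > 0) lab[cand[which.max(tot[cand])]] <- "Max Capacity"

  lab[free() & mu_t1 >= zeta / 10 & mu_t0 < gamma & rho < 0.3] <- "Startup"
  lab[free() & mu_t0 >= zeta / 10 & mu_t1 < gamma & rho < 0.3] <- "Shutdown"
  lab[free() & rho > 0.8]     <- "Stable Operation"
  lab[free() & mu_t1 > mu_t0] <- "Ramp Up"
  lab[free() & mu_t1 < mu_t0] <- "Ramp Down"
  lab
}

# Illustrative clusters (zeta = 100 MW, so gamma = 2 and zeta / 10 = 10)
clusters <- data.frame(
  mu_t0 = c(0.1, 50,   99,   0.5, 40,  60, 30, 85),
  mu_t1 = c(0.2, 50.5, 99.5, 40,  0.5, 85, 45, 60),
  cor   = c(0.05, 0.95, 0.9, 0.1, 0.1, 0.5, 0.5, 0.5)
)
state_labels <- assign_labels(clusters$mu_t0, clusters$mu_t1, clusters$cor, zeta = 100)
state_labels
```

As written, the MSO condition requires $|\mu_{t0} - \mu_{t1}| > \zeta/10$. In this example, the off-diagonal cluster at (30, 45) is therefore the one selected as MSO.
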
## Empirical Approach

:::: {.columns}

::: {.column width="39%"}

### Label Assignment

The right graphs show the *assigned states*.

The points are coloured according to:

- MAP
- Probability (each pure colour reflects a probability of 1)

Some points below/above the diagonal are assigned to Ramp Up/Ramp Down

- Can easily be fixed for MAP
- Fixing probabilistic predictions is not as easy

:::

::: {.column width="2%"}
<!-- empty column to create gap -->
:::

::: {.column width="59%"}

::: {.panel-tabset}

## LSW

## LSW Pr

## LSW Pr

## EMS

## EMS Pr

## EMS Pr

:::

:::

::::

::: {.notes}

MBAHC: recursively merging the two clusters that yield the maximum likelihood of a probability model over all possible merges

:::

## Empirical Approach

:::: {.columns}

::: {.column width="42%"}

### Label Assignment

*Fixing assignments*

Relabeling Ramp Up and Ramp Down MAP predictions is trivial:

\begin{align}
\text{State} =
\begin{cases}
\text{Ramp Up} & x_{t1} > x_{t0}, \\
\text{Ramp Down} & x_{t1} < x_{t0}.
\end{cases}
\end{align}

Fixing the probability array is more involved:

Find observations with $x_{t1} < x_{t0}$, which cannot be "Ramp Up".

Set the probability of all Ramp Up clusters to $0$ for these observations.

Renormalize the probabilities so that they sum to 1 for each observation.

:::

::: {.column width="3%"}
<!-- empty column to create gap -->
:::

::: {.column width="55%"}

::: {.panel-tabset}

## LSW Pr

## LSW Pr

## EMS Pr

## EMS Pr

:::

:::

::::

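## Empirical Approach {.scrollable}

### Fixing Assignments (R Sketch)

This base-R sketch implements both fixes. It assumes `z` is the $n \times G$ matrix of posterior probabilities (as in `Mclust()$z`), `state` holds the label of each component, and `x_t0`, `x_t1` are the observed pairs. Three further choices are assumptions:

- The MAP relabelling is applied only to observations whose MAP state is Ramp Up or Ramp Down.
- The probability fix is applied symmetrically to Ramp Down for $x_{t1} > x_{t0}$.
- Rows that would lose all probability mass are left unchanged.

The names `fix_map()` and `fix_probs()` are also hypothetical.

```r
# Hypothetical helpers for the post-hoc fixes.
# z: n x G posterior matrix; state: label of each of the G components;
# x_t0, x_t1: observed outputs of the n observations.
fix_map <- function(map_state, x_t0, x_t1) {
  # Only observations whose MAP state is a ramp state are relabelled.
  ramp <- map_state %in% c("Ramp Up", "Ramp Down")
  map_state[ramp & x_t1 > x_t0] <- "Ramp Up"
  map_state[ramp & x_t1 < x_t0] <- "Ramp Down"
  map_state
}

fix_probs <- function(z, state, x_t0, x_t1) {
  up    <- state == "Ramp Up"
  down  <- state == "Ramp Down"
  z_new <- z
  z_new[x_t1 < x_t0, up]   <- 0   # these observations cannot be Ramp Up
  z_new[x_t1 > x_t0, down] <- 0   # assumption: symmetric fix for Ramp Down
  s  <- rowSums(z_new)
  ok <- s > 0
  # Renormalize each row; rows without remaining mass keep their original values.
  if (any(ok))  z_new[ok, ]  <- z_new[ok, , drop = FALSE] / s[ok]
  if (any(!ok)) z_new[!ok, ] <- z[!ok, , drop = FALSE]
  z_new
}

state <- c("Stable Operation", "Ramp Up", "Ramp Down")   # labels of the 3 components
z <- rbind(c(0.2, 0.6, 0.2),    # x_t1 < x_t0, but mostly "Ramp Up"
           c(0.1, 0.3, 0.6))    # x_t1 > x_t0, but mostly "Ramp Down"
x_t0 <- c(80, 40)
x_t1 <- c(70, 55)

map_state   <- state[max.col(z, ties.method = "first")]
fixed_map   <- fix_map(map_state, x_t0, x_t1)
fixed_probs <- fix_probs(z, state, x_t0, x_t1)
fixed_map
fixed_probs
```

Zeroing and renormalizing keeps each row a valid probability distribution. After the fix, however, the MAP of the adjusted probabilities can differ from the simply relabelled MAP, for example when two states end up tied.
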
## Outlook

<br>
<br>

- The approach works in general
- Conceptually simple
- Label assignment needs some more work
- Probabilistic statements may need adjustments for Ramp Up/Ramp Down predictions
- Some kind of validation would be desirable
- Results will partly be used in another research project within EfeMOD