Robust Stochastic Optimal Control via variance penalization: Application to Energy Management Systems

Paul Malisani, Adrien Spagnol, Vivien Smis-Michel

TL;DR

The paper tackles robust stochastic optimal control under convexity, addressing the optimizer's curse by introducing a variance-penalized objective and a Douglas–Rachford–based solver, the Variance-Penalized Progressive Hedging Algorithm ($\mathrm{VPPHA}$). It develops a data-driven framework consisting of scenario generation and reduction to enable tractable, scenario-based control, and proves convergence of the VPPHA to the optimal solution under convexity. The approach is instantiated in a rolling-horizon energy management system for a stationary battery, using real consumption and production data to compare against MPC and standard PHA; results show that VPPHA yields superior out-of-sample performance and greater bill reductions, especially during volatile pricing periods. The work demonstrates that variance penalization can enhance robustness without increasing computational burden, offering a practical path to robust EMS deployments through scalable, scenario-based optimization. Together, these contributions advance robust stochastic control with scalable algorithms and data-driven uncertainty representation for energy systems.

Abstract

This paper addresses a class of robust stochastic optimal control problems. Its main contribution lies in the introduction of a general optimization model with variance penalization and an associated solution algorithm that improves out-of-sample robustness while preserving numerical complexity. The proposed variance-penalized model is inspired by a well-established machine learning practice that aims to limit overfitting and extends this idea to stochastic optimal control. Using the Douglas–Rachford splitting method, the authors develop a Variance-Penalized Progressive Hedging Algorithm (VPPHA) that retains the computational complexity of the standard PHA while achieving superior out-of-sample performance. In addition, the authors propose a three-step control framework comprising (i) a random scenario generation method, (ii) a scenario reduction algorithm, and (iii) a scenario-based optimal control computation using the VPPHA. Finally, the proposed method is validated through simulations of a stationary battery Energy Management System (EMS) using ground-truth electricity consumption and production measurements from a predominantly commercial building in Solaize, France. The results demonstrate that the proposed approach outperforms a classical Model Predictive Control (MPC) strategy, which itself performs better than the standard PHA.
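To make the idea of variance penalization concrete, the sketch below scores a candidate control by its expected cost plus a weighted variance over a finite scenario set. It is only an illustration: the exact objective used in the paper is not reproduced in this summary, so the mean-plus-`alpha`-times-variance form, the function name `variance_penalized_cost`, and the toy numbers are all assumptions.

```python
from typing import Sequence


def variance_penalized_cost(
    costs: Sequence[float], probs: Sequence[float], alpha: float
) -> float:
    """Return mean + alpha * variance of scenario costs (assumed form).

    costs: cost of one candidate control under each scenario.
    probs: scenario weights (normalized here, so they need not sum to 1).
    alpha: non-negative weighting parameter; alpha = 0 gives the plain
           expected cost.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if len(costs) != len(probs) or not costs:
        raise ValueError("costs and probs must be non-empty and equal length")
    total = sum(probs)
    mean = sum(p * c for p, c in zip(probs, costs)) / total
    var = sum(p * (c - mean) ** 2 for p, c in zip(probs, costs)) / total
    return mean + alpha * var


# Hypothetical toy data: control A is cheaper on average but volatile,
# control B is slightly more expensive but nearly scenario-independent.
probs = [0.5, 0.5]
costs_a = [0.0, 10.0]   # mean 5.0, variance 25.0
costs_b = [5.5, 6.5]    # mean 6.0, variance 0.25

for alpha in (0.0, 0.1):
    score_a = variance_penalized_cost(costs_a, probs, alpha)
    score_b = variance_penalized_cost(costs_b, probs, alpha)
    best = "A" if score_a < score_b else "B"
    print(f"alpha={alpha}: A={score_a:.3f}, B={score_b:.3f}, prefer {best}")
```

With `alpha = 0` the volatile control A wins on expected cost alone; with a positive `alpha` the penalty shifts the preference to the steadier control B, which is the kind of trade-off a variance-penalized objective is meant to express.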

Paper Structure

This paper contains 21 sections, 3 theorems, 43 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let ${\boldsymbol{\lambda}}^0 \in {\mathcal{N}_\delta^\bot}$, let $r>0$, and let $\alpha \geq 0$. If $f$ is convex, proper, and lower semi-continuous, the sequence generated by the VPPHA converges weakly to a fixed point $(\bar{{\boldsymbol{x}}}, \bar{{\boldsymbol{\lambda}}}, \bar{{\boldsymbol{z}}})$ such that $\bar{{\boldsymbol{x}}}$ is an optimal solution of the general stochastic multistage problem.

Figures (6)

  • Figure 1: Quantile curves obtained for $\alpha \in \{0.01, 0.5, 0.99\}$ for electrical production.
  • Figure 2: Quantile curves obtained for $\alpha \in \{0.01, 0.5, 0.99\}$ for electrical consumption. The peaks in consumption are due to the occasional operation of the building's glass factory.
  • Figure 3: Schematic diagram of a domestic system with a stationary battery controlled by an EMS.
  • Figure 4: Influence of the weighting parameter $\alpha$ on the performance ratio $\eta(\alpha)$ with an actualization period $H = 24$ hours and a scenario tree of 225 scenarios.
  • Figure 5: Time evolution of the performance ratio $\eta(\alpha)$ from 2022-01-22 to 2024-01-22.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition 1: $\delta$-adaptation
  • Theorem 1: Variance-Penalized PHA
  • proof
  • Remark 1
  • Definition 2
  • Theorem 2
  • proof
  • Lemma 1
  • proof