Quantum Reinforcement Learning-Based Two-Stage Unit Commitment Framework for Enhanced Power Systems Robustness

Xiang Wei, Ziqing Zhu, Linghua Zhu, Ze Hu, Xian Zhang, Guibin Wang, Siqi Bu, Ka Wing Chan

TL;DR

This work addresses robust unit commitment (UC) under high renewable uncertainty by proposing a two-stage foresight-seeing UC framework that uses virtual power plants (VPPs) as flexible resources and leverages quantum reinforcement learning (QRL) to improve computational efficiency. It formulates the problem as a quantum Markov decision process (q-MDP) and solves it with parameterized quantum circuits (PQCs), employing a discrete quantum DQN (Q-DQN) for day-ahead decisions and a quantum SAC (Q-SAC) for real-time control, with states encoded as density operators and transitions modeled as quantum channels. In a case study on a modified IEEE RTS 24-bus system, QRL converges faster, incurs fewer constraint violations, and runs faster than DRL and traditional UC methods, which supports the approach’s potential for robust, scalable power-system optimization. The results suggest that quantum-enabled RL can improve the reliability and responsiveness of power grids facing increasing renewable penetration and uncertainty, with VPPs bolstering system ramping and balancing capabilities.

Abstract

Unit commitment (UC) optimizes the start-up and shutdown schedules of generating units to meet load demand while minimizing costs. However, the increasing integration of renewable energy introduces uncertainties for real-time scheduling. Existing solutions face limitations in both modeling and algorithmic design. At the modeling level, they fail to incorporate widely adopted virtual power plants (VPPs) as flexibility resources, missing the opportunity to proactively mitigate potential real-time imbalances or ramping constraints through foresight-seeing decision-making. At the algorithmic level, existing probabilistic optimization, multi-stage, and machine learning approaches face challenges in computational complexity and adaptability. To address these challenges, this study proposes a novel two-stage UC framework that incorporates foresight-seeing sequential decision-making in both day-ahead and real-time scheduling, leveraging VPPs as flexibility resources to proactively reserve capacity and ramping flexibility for upcoming renewable energy uncertainties over several hours. In particular, we develop quantum reinforcement learning (QRL) algorithms that integrate the foresight-seeing sequential decision-making and scalable computation advantages of deep reinforcement learning (DRL) with the parallel and high-efficiency search capabilities of quantum computing. Experimental results demonstrate that the proposed QRL-based approach outperforms DRL and traditional UC methods in computational efficiency, real-time responsiveness, and solution quality.
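
To make the UC problem statement concrete, the following is a minimal sketch of a deterministic toy instance solved by brute-force enumeration. It is not the paper's two-stage framework and involves no VPPs, renewables, or learning. The two units, their capacities and costs, the three-hour load profile, and all function names are hypothetical values chosen only for illustration, and the model assumes no minimum output or ramping limits.

```python
from itertools import product

# Hypothetical two-unit fleet: (capacity MW, marginal cost $/MWh, start-up cost $)
UNITS = [(100.0, 20.0, 500.0), (60.0, 35.0, 100.0)]
LOAD = [50.0, 120.0, 80.0]  # hypothetical hourly demand in MW


def dispatch_cost(on, load):
    """Cheapest-first dispatch of committed units; None if load cannot be met."""
    remaining, cost = load, 0.0
    for idx in sorted(range(len(UNITS)), key=lambda i: UNITS[i][1]):
        if on[idx]:
            output = min(UNITS[idx][0], remaining)
            cost += output * UNITS[idx][1]
            remaining -= output
    return cost if remaining <= 1e-9 else None


def schedule_cost(schedule):
    """Total cost of an hourly on/off schedule, assuming all units start offline."""
    total, prev = 0.0, (0,) * len(UNITS)
    for on, load in zip(schedule, LOAD):
        energy = dispatch_cost(on, load)
        if energy is None:
            return None  # infeasible: committed capacity is below the load
        startups = sum(UNITS[i][2] for i in range(len(UNITS)) if on[i] and not prev[i])
        total += energy + startups
        prev = on
    return total


def solve_uc():
    """Enumerate every on/off schedule and return (best_schedule, best_cost)."""
    states = list(product((0, 1), repeat=len(UNITS)))
    best = None
    for schedule in product(states, repeat=len(LOAD)):
        cost = schedule_cost(schedule)
        if cost is not None and (best is None or cost < best[1]):
            best = (schedule, cost)
    return best
```

Calling `solve_uc()` returns the cheapest feasible schedule and its cost. The search space grows as 2^(units × hours), which illustrates the computational-complexity challenge the abstract refers to.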

Paper Structure

This paper contains 18 sections, 28 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Proposed two-stage robust UC Framework. A) The quantum-based RL approach encodes observational data into a quantum circuit for optimization. Using a diagrammatic representation of the circuit, the agent selects circuit transformations to generate control actions, which are then decoded and applied to the environment. This process is repeated iteratively. B) All possible single-qubit gates are represented by hexagons, while two-qubit gates are denoted by yellow rectangles at the top. The unitary $U_x$, shown below, corresponds to the data encoding layer. The quantum circuit iteratively optimizes trainable parameters for all candidate qubit gates based on a gradient-based classical optimizer, transforming the original circuit into a more efficient configuration.
  • Figure 2: Quantum Circuit Architecture for Reinforcement Learning.
  • Figure 3: Structure of modified IEEE RTS 24-bus system.
  • Figure 4: Power generation stack of four different optimization approaches.
  • Figure 5: Comparison results of operational cost between QRL and DRL approaches.
  • ...and 3 more figures
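
The Figure 1 caption describes a data encoding layer $U_x$ followed by gates with trainable parameters tuned by a gradient-based classical optimizer. The sketch below illustrates that general pattern with a classically simulated one-qubit circuit. It is an assumption-laden toy, not the paper's architecture: the choice of RY rotations, the Pauli-Z readout, the squared-error loss, and all function names are hypothetical.

```python
import math


def ry(angle):
    """Real 2x2 matrix of the single-qubit rotation RY(angle)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def apply(gate, state):
    """Multiply a 2x2 gate by a 2-component state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def expect_z(x, theta):
    """Encode x with RY(x) on |0>, apply trainable RY(theta), return <Z>."""
    state = apply(ry(theta), apply(ry(x), [1.0, 0.0]))
    return state[0] ** 2 - state[1] ** 2


def param_shift_grad(x, theta):
    """Parameter-shift rule: exact d<Z>/d(theta) for a rotation gate."""
    return 0.5 * (expect_z(x, theta + math.pi / 2) - expect_z(x, theta - math.pi / 2))


def train(x, target, theta=0.1, lr=0.2, steps=200):
    """Plain gradient descent on the squared error between <Z> and target."""
    for _ in range(steps):
        error = expect_z(x, theta) - target
        theta -= lr * 2.0 * error * param_shift_grad(x, theta)
    return theta
```

For example, `train(0.3, 0.0)` adjusts `theta` until the measured expectation for input `0.3` approaches the target `0.0`. In a value-based agent such as a Q-DQN, expectation values of this kind would plausibly serve as Q-value estimates, though the source text here does not specify that mapping.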