Nuclear Microreactor Control with Deep Reinforcement Learning

Leo Tunkle, Kamal Abdulraheem, Linyu Lin, Majdi I. Radaideh

TL;DR

This paper addresses autonomous load-following control for Holos-Quad nuclear microreactors using deep reinforcement learning. It compares single-agent PPO-based control against PID and introduces a multi-agent RL (MARL) framework that exploits reactor symmetry to enable decentralized, symmetric drum control. RL matches or exceeds PID performance in short transients and under measurement noise, and MARL preserves drum symmetry while training efficiently. Agents trained only on short scenarios also generalize to longer transients, staying within a 1% error margin, although PID remains more accurate there. The work suggests RL can reduce staffing and training costs while maintaining safety, though validation in higher-fidelity simulations and experiments remains essential for practical deployment.

Abstract

The economic feasibility of nuclear microreactors will depend on minimizing operating costs through advancements in autonomous control, especially when these microreactors operate alongside other types of energy systems (e.g., renewable energy). This study explores the application of deep reinforcement learning (RL) to real-time drum control in microreactors, focusing on performance in load-following scenarios. Using a point kinetics model with thermal and xenon feedback, we first establish a baseline with a single-output RL agent, then compare it against a traditional proportional-integral-derivative (PID) controller. This study demonstrates that RL controllers, in both single-agent and multi-agent RL (MARL) frameworks, can achieve load-following performance similar or even superior to that of traditional PID control across a range of load-following scenarios. In short transients, the RL agent reduced the tracking error rate relative to PID. Over extended 300-minute load-following scenarios in which xenon feedback becomes a dominant factor, PID maintained better accuracy, but RL still remained within a 1% error margin despite being trained only on short-duration scenarios. This highlights RL's strong ability to generalize and extrapolate to longer, more complex transients, affording substantial reductions in training costs and reduced overfitting. Furthermore, when control was extended to multiple drums, MARL enabled independent drum control while maintaining reactor symmetry constraints without sacrificing performance, an objective that standard single-agent RL could not learn. We also found that, as increasing levels of Gaussian noise were added to the power measurements, the RL controllers maintained lower error rates than PID, and did so with less control effort.
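To make the PID baseline and the noise experiment concrete, the following is a minimal sketch of a discrete PID loop tracking a load-following profile with Gaussian noise on the power measurement. It is not the authors' setup: the plant is a hypothetical pure integrator standing in for the point kinetics model, and the gains, time step, load profile, and the names `PID`, `load_profile`, and `simulate` are all illustrative assumptions.

```python
import random


class PID:
    """Textbook discrete PID controller (gains are illustrative, not from the paper)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def load_profile(t):
    """Hypothetical target power in percent of nominal: ramp 100 -> 50, hold, ramp back."""
    if t < 50.0:
        return 100.0
    if t < 100.0:
        return 100.0 - (t - 50.0)
    if t < 150.0:
        return 50.0
    return min(100.0, 50.0 + (t - 150.0))


def simulate(noise_std=0.0, steps=2000, dt=0.1, seed=0):
    """Return (mean absolute tracking error, mean absolute control effort)."""
    rng = random.Random(seed)
    # kd = 0 makes this a PI controller; a derivative term would amplify the noise.
    pid = PID(kp=2.0, ki=0.5, kd=0.0, dt=dt)
    power = 100.0  # true plant power, percent of nominal
    abs_error = 0.0
    abs_effort = 0.0
    for k in range(steps):
        target = load_profile(k * dt)
        measured = power + rng.gauss(0.0, noise_std)  # noisy power measurement
        u = pid.update(target - measured)
        # Toy plant (assumption): power changes at a rate equal to the control signal.
        power += dt * u
        abs_error += abs(target - power)
        abs_effort += abs(u)
    return abs_error / steps, abs_effort / steps


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0):
        mae, effort = simulate(noise_std=sigma)
        print(f"noise std {sigma}: mean |error| = {mae:.3f} %, mean |u| = {effort:.3f}")
```

In this toy loop, measurement noise feeds directly into the control signal, so both tracking error and control effort grow with the noise level. That is the kind of degradation against which the abstract compares the RL controllers.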

Paper Structure

This paper contains 19 sections, 19 equations, 11 figures, 2 tables, 1 algorithm.

Figures (11)

  • Figure 1: Axial slice of the Holos-Quad design with control drums out (left) and control drums inserted (right).
  • Figure 2: RL training loop, where $P^*$ is the target reactor power, $P$ is the simulated reactor power, and $d\theta$ is the control drum rotation speed.
  • Figure 3: MARL training loop, where $p^*$ is the target reactor power, $p$ is the measured/simulated reactor power, and $d\theta^n$ is the control drum rotation speed of the $n$th drum.
  • Figure 4: Plots showing the power, error, temperature, control drum speed, and control drum angle over time, with the PID and single-RL controllers applied to the "test" profile. The training profile shows comparable performance and metrics, indicating no overfitting. Single-RL was trained to take a single action that is applied symmetrically to all drums.
  • Figure 5: Plots showing the power, error, and drum speed over time, with the PID and single-RL controllers applied to the "low-power" profile.
  • ...and 6 more figures
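The captions for Figures 2 to 4 distinguish a single action $d\theta$ applied symmetrically to all drums from per-drum actions $d\theta^n$. A minimal sketch of that difference in how drum angles are updated is given below. The function names, the drum counts, and the `asymmetry` measure are hypothetical illustrations, not taken from the paper.

```python
from typing import List


def step_single(angles: List[float], dtheta: float, dt: float) -> List[float]:
    """Single-agent case: one rotation speed is broadcast to every drum."""
    return [a + dtheta * dt for a in angles]


def step_multi(angles: List[float], dthetas: List[float], dt: float) -> List[float]:
    """Multi-agent case: drum n receives its own rotation speed dthetas[n]."""
    if len(angles) != len(dthetas):
        raise ValueError("need exactly one rotation speed per drum")
    return [a + d * dt for a, d in zip(angles, dthetas)]


def asymmetry(angles: List[float]) -> float:
    """Illustrative symmetry measure: spread between the most and least rotated drum."""
    return max(angles) - min(angles)


if __name__ == "__main__":
    start = [90.0] * 8  # drum count and starting angle are assumptions
    print(asymmetry(step_single(start, dtheta=0.5, dt=1.0)))
    print(asymmetry(step_multi(start, dthetas=[0.5, 0.25] * 4, dt=1.0)))
```

A broadcast action keeps the drums symmetric by construction, whereas per-drum actions stay symmetric only if the agents learn to coordinate.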