Reactor Optimization Benchmark by Reinforcement Learning

Deborah Schwarcz; Nadav Schneider; Gal Oren; Uri Steinitz


TL;DR

This paper introduces a novel benchmark problem within the OpenNeoMC framework, designed specifically for reinforcement learning: optimizing a unit cell of a research reactor with two varying parameters to maximize neutron flux while maintaining reactor criticality.

Abstract

Neutronic calculations for reactors are computationally demanding when performed with Monte Carlo (MC) methods. Advances in high-performance computing have made the simulation of a reactor more tractable, but design and optimization over multiple parameters remains a computational challenge. MC transport simulations coupled with machine learning techniques offer a promising route to more efficient and effective nuclear reactor optimization. This paper introduces a novel benchmark problem within the OpenNeoMC framework designed specifically for reinforcement learning. The benchmark involves optimizing a unit cell of a research reactor with two varying parameters (fuel density and water spacing) to maximize neutron flux while maintaining reactor criticality. The test case features distinct local optima, representing different physical regimes, and thus poses a challenge for learning algorithms. Through extensive simulations using evolutionary and neuroevolutionary algorithms, we demonstrate the effectiveness of reinforcement learning in navigating complex optimization landscapes with strict constraints. Furthermore, we propose acceleration techniques within the OpenNeoMC framework, including model updating and keeping cross-section data in RAM, to shorten simulation times. Our findings emphasize the importance of integrating machine learning into reactor optimization and contribute to advancing methodologies for intricate optimization problems in nuclear engineering. The sources of this work are available at our GitHub repository: https://github.com/Scientific-Computing-Lab-NRCN/RLOpenNeoMC
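
As an illustration of how "maximize neutron flux while maintaining reactor criticality" can be folded into a single scalar fitness, the following is a minimal sketch only. It is not the objective function defined in the paper: the name `penalized_fitness`, the logarithmic flux term, the tolerance band around k = 1, and the penalty weight are all assumptions chosen for illustration.

```python
import math


def penalized_fitness(flux: float, k_eff: float, k_target: float = 1.0,
                      tolerance: float = 0.01,
                      penalty_weight: float = 100.0) -> float:
    """Hypothetical fitness: reward flux, penalize departure from criticality.

    All parameter names and default values are illustrative assumptions,
    not values taken from the paper.
    """
    if flux <= 0.0:
        raise ValueError("flux must be positive")
    # Distance of the multiplication factor from the critical target.
    deviation = abs(k_eff - k_target)
    # Only the part of the deviation outside the tolerance band is penalized.
    excess = max(0.0, deviation - tolerance)
    # A log scale keeps fluxes that span orders of magnitude comparable.
    return math.log10(flux) - penalty_weight * excess
```

With this form, a configuration inside the tolerance band is ranked purely by its flux, while a configuration far from criticality is pushed down regardless of how high its flux is.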

Paper Structure (10 sections, 1 equation, 5 figures, 2 tables)


Optimization Methodology
OpenNeoMC Framework and its Speed-up
Evolutionary Algorithms and JAYA
Neuroevolution Algorithms and PPO-ES
Benchmarks
The MTR Reactor RL-Optimization Benchmark
Physical Description
Objective Function
Results
Summary and Conclusions

Figures (5)

Figure 1: OpenNeoMC's flow, presented in clockwise order. First, OpenMC is invoked with initial parameters within the permissible range of the model. The objective function receives the OpenMC results and computes the fitness value. The fitness value is then passed to the NeoRL framework, which updates the physical parameters using the chosen algorithm. OpenMC then recalculates with the updated parameters, and the process repeats.
Figure 2: RL trains an agent to interact with an environment, with rewards given by a chosen objective function. GA and its derivatives start from an initial population that evolves over generations through natural operators, guided by the fitness value given by an objective function, until a satisfactory fitness is achieved. The optimal solution is the fittest individual.
Figure 3: Top view of the system. Pink, yellow, blue, and green denote enriched uranium, aluminum, water, and cadmium, respectively. Dimensions are given in centimeters, the system is unbounded along the z-axis, and all sides have reflective boundary conditions.
Figure 4: The interpolated fitness map as sampled by the PPO-ES (left) and JAYA (right) algorithms, as a function of the uranium and water density ($U$ and $W$, respectively). The black crosses indicate sampled points, whereas red circles indicate criticality (sample points where $k \sim 1$). JAYA sampling reveals only a subspace of the parameter domain, while PPO-ES maps the full permissible extent.
Figure 5: Fast flux as a function of water and uranium density. The red circles on the base plane indicate areas where criticality is met ($k \sim 1$). The fast flux changes dramatically with water density because of neutron moderation. The two critical areas are disconnected, which prevents genetic algorithms from hopping from one to the other. As a result, the JAYA algorithm converged on the high-water-density region, even though the flux there is orders of magnitude lower than in the other critical area.
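
The evaluate, score, update, repeat loop described in Figure 1 can be sketched with a plain-Python JAYA optimizer. This is a minimal sketch under stated assumptions: it does not call OpenMC or NeoRL, `toy_fitness` is a hypothetical smooth stand-in for a transport calculation with no reactor physics in it, and the population size, generation count, and bounds are arbitrary. The update rule itself is the standard JAYA step, which moves each candidate toward the current best and away from the current worst.

```python
import random


def toy_fitness(x):
    """Hypothetical stand-in for 'run the simulation and score the result'."""
    u, w = x
    return -((u - 0.3) ** 2 + (w - 0.7) ** 2)


def jaya_step(population, fitness, bounds, rng):
    """One JAYA generation for a maximization problem."""
    scores = [fitness(x) for x in population]
    best = population[scores.index(max(scores))]
    worst = population[scores.index(min(scores))]
    new_population = []
    for x, score in zip(population, scores):
        candidate = []
        for j, (lo, hi) in enumerate(bounds):
            r1, r2 = rng.random(), rng.random()
            # Move toward the best solution and away from the worst one.
            value = (x[j] + r1 * (best[j] - abs(x[j]))
                     - r2 * (worst[j] - abs(x[j])))
            # Clip to the permissible parameter range.
            candidate.append(min(hi, max(lo, value)))
        # Greedy acceptance: keep the candidate only if it scores higher.
        new_population.append(candidate if fitness(candidate) > score else x)
    return new_population


def optimize(fitness, bounds, pop_size=10, generations=30, seed=0):
    """Repeat the evaluate/update cycle and record the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population = jaya_step(population, fitness, bounds, rng)
        history.append(max(fitness(x) for x in population))
    return max(population, key=fitness), history


best, history = optimize(toy_fitness, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Because every move is anchored to members of the current population and accepted only when it improves the score, a population that starts in one region tends to stay there, which is consistent with the behavior described for JAYA in Figures 4 and 5.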
