Constrained Optimization of Charged Particle Tracking with Multi-Agent Reinforcement Learning

Tobias Kortus; Ralf Keidel; Nicolas R. Gauger; Jan Kieseler

TL;DR

This work tackles charged particle track reconstruction in pixel detectors by framing the problem as a constrained multi-agent reinforcement learning (MARL) task. It introduces a differentiable safety layer that solves a linear sum assignment problem to enforce unique hit-to-track assignments, and it augments the training signal with a cost-margin gradient that promotes lower tracking costs. Evaluated on simulated Bergen pCT detector data, constrained MARL with cost margins outperforms the baselines and reduces predictive instabilities, especially at higher particle densities, while remaining competitive with post-hoc centralized single-agent models. The results highlight the value of assignment-constrained MARL for robust, flexible tracking and suggest avenues for adapting the framework to different detectors and more complex reward structures.

Abstract

Reinforcement learning has demonstrated immense success in modelling complex physics-driven systems, providing end-to-end trainable solutions by interacting with a simulated or real environment and maximizing a scalar reward signal. In this work, building upon previous work, we propose a multi-agent reinforcement learning approach with assignment constraints for reconstructing particle tracks in pixelated particle detectors. Our approach collaboratively optimizes a parametrized policy, which functions as a heuristic for a multidimensional assignment problem, by jointly minimizing the total amount of particle scattering over the reconstructed tracks in a readout frame. To satisfy the constraints that guarantee a unique assignment of particle hits, we propose a safety layer that solves a linear assignment problem for every joint action. Further, to enforce cost margins, which increase the distance of the local policies' predictions to the decision boundaries of the optimizer mappings, we recommend an additional component in the blackbox gradient estimation that pushes the policy toward solutions with lower total assignment costs. We show empirically, on simulated data generated for a particle detector developed for proton imaging, the effectiveness of our approach compared to multiple single- and multi-agent baselines. We further demonstrate the effectiveness of constraints with cost margins for both optimization and generalization, as shown by wider regions of high reconstruction performance and reduced predictive instabilities. Our results form the basis for further developments in RL-based tracking, offering both enhanced performance with constrained policies and greater flexibility in optimizing tracking algorithms through the option of individual and team rewards.
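The role of the safety layer can be illustrated with a toy example. The sketch below is not the authors' implementation: the function name `safe_joint_action`, the cost matrix, and its values are hypothetical, and the assignment is found by brute force over permutations rather than by a dedicated linear sum assignment solver (a practical version would likely use a Hungarian-type solver). It only shows how independently chosen actions can collide on the same hit, and how solving an assignment problem over the joint action restores uniqueness.

```python
from itertools import permutations


def safe_joint_action(costs):
    """Project per-agent hit costs onto a one-to-one assignment.

    costs[i][j] is a hypothetical cost for agent i choosing hit j
    (lower is better). Returns a tuple where entry i is the hit
    assigned to agent i. Brute force: only viable for tiny examples.
    """
    n_agents = len(costs)
    n_hits = len(costs[0])
    assert n_agents <= n_hits, "need at least one hit per agent"
    best, best_total = None, float("inf")
    # Each permutation uses every hit at most once, so uniqueness holds
    # by construction for every candidate joint action.
    for perm in permutations(range(n_hits), n_agents):
        total = sum(costs[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return best


# Hypothetical costs for 3 agents (tracks) and 3 candidate hits.
costs = [
    [0.1, 0.9, 1.0],
    [0.2, 0.3, 0.9],
    [0.8, 0.7, 0.4],
]

# Unconstrained: every agent independently picks its cheapest hit.
greedy = [min(range(len(row)), key=row.__getitem__) for row in costs]
# Constrained: the joint action is replaced by the cheapest unique assignment.
safe = safe_joint_action(costs)

print(greedy)  # [0, 0, 2] -> agents 0 and 1 collide on hit 0
print(safe)    # (0, 1, 2) -> every hit is used at most once
```

In this example agent 1 is moved from its preferred hit 0 to hit 1, which is the kind of correction the abstract attributes to the safety layer. The gradient estimation through the solver and the cost-margin term are not covered by this sketch.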

Paper Structure (35 sections, 19 equations, 9 figures, 3 tables)

Introduction
Theory and Background
Bergen pCT detector prototype
Particle interactions and tracking
Related Work
Particle tracking
Safe/Constrained Reinforcement Learning
Methodology
Problem Statement
Graph construction
Sampling of track candidates
Objective
Architecture and Implementation
Feature preparation
Local agent policies
...and 20 more sections

Figures (9)

Figure 1: General description of the charged particle tracking framework for single- or multi-agent reinforcement learning. Through iterated interaction with the environment, represented as a directed acyclic graph (left), the agent (right) learns reconstruction policies that maximize the obtained rewards. Agent components marked with dashed lines are optional and are only used for some agent configurations.
Figure 2: Interaction loop between the agent and the environment, which contains the particle readouts in the form of a directed acyclic graph, based on Kortus2023. The agent (network architecture on the right) observes a state describing the current particle trajectory and chooses the next particle hit in the subsequent layer. The reward is defined based on the physical likelihood of the chosen transition.
Figure 3: Average return obtained by the agents during training, plotted as a function of performed updates (for MAPPO, iterations over all epochs are counted as a single update) and sampled track transitions.
Figure 4: Particle tracks generated using MATD3+LSA (BB$^\leftrightarrow_{\nu=0.01}$) for simulated particle tracks with a 100 mm water phantom and 200 $p^+/F$.
Figure 5: Distributions of the uncertainties in local policy predictions, measured as the predictive entropy for various water phantoms and particle densities. Techniques with enforced cost margins demonstrate significantly reduced uncertainties.
...and 4 more figures
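
The caption of Figure 5 measures uncertainty as predictive entropy. As a minimal sketch, assuming the local policy outputs a categorical distribution over candidate hits and that the standard Shannon entropy is meant (the paper's exact definition is not given here), the quantity could be computed as follows. The function name and the example probabilities are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    # Zero-probability entries contribute nothing and are skipped
    # to avoid evaluating log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical policy outputs over four candidate hits.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(predictive_entropy(confident) < predictive_entropy(uncertain))  # True
```

A uniform distribution gives the maximum entropy, log(4) for four candidates, while a sharply peaked one gives a value near zero, which matches the reading of lower entropy as reduced uncertainty.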
