Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments

Olivia Jullian Parra; Julián García Pardiñas; Lorenzo Del Pianta Pérez; Maximilian Janisch; Suzanne Klaver; Thomas Lehéricy; Nicola Serra

TL;DR

This work introduces a human-in-the-loop reinforcement learning framework for Data Quality Monitoring in particle physics, addressing non-stationary detector conditions and noisy human labels. It combines a multi-agent PPO setup (predictor and checker) with RLHF to automate online and offline DQM decisions, using a simplified synthetic dataset for proof-of-concept validation. Key contributions include a dual-reward structure guiding both predictor and checker, a data-augmentation strategy to improve sample efficiency, and demonstrations that the approach adapts to changing conditions and reduces reliance on continuous human input. The results suggest practical potential for reducing human workload in detector control rooms while maintaining or exceeding baseline accuracy, with clear pathways toward real-world deployment and further study of human–machine collaboration dynamics.
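
The multi-agent setup above is built on PPO. In its standard clipped form, PPO maximizes $L^{\text{CLIP}}(\theta) = \hat{\mathbb E}_t\big[\min\big(r_t(\theta)\hat A_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat A_t\big)\big]$, where $r_t$ is the ratio of new to old action probabilities and $\hat A_t$ is an advantage estimate. The minimal sketch below computes the negated, batch-averaged version of that objective as a loss to minimize. It assumes each agent (predictor or checker) would apply such a loss to its own policy. The paper's full loss (for example any value or entropy terms) and its clipping parameter are not given here, so the name `ppo_clip_loss` and the default `epsilon=0.2` are assumptions.

```python
def ppo_clip_loss(ratios, advantages, epsilon=0.2):
    """Negated PPO clipped surrogate objective, averaged over a batch.

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled action.
    advantages: advantage estimate for each sampled action.
    epsilon: clipping range (assumed value, not taken from the paper).
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    terms = []
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio into [1 - epsilon, 1 + epsilon].
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        terms.append(min(r * a, clipped * a))
    # PPO maximizes the surrogate, so return its negative as a loss.
    return -sum(terms) / len(terms)


# A ratio of 1.5 with positive advantage is capped at 1.2;
# a ratio of 0.5 with negative advantage uses the clipped 0.8.
print(ppo_clip_loss([1.5, 0.5], [1.0, -1.0]))
```

The clipping removes any incentive to move the policy far from the one that collected the data in a single update, which is the property that makes PPO comparatively stable when labels and detector conditions shift over time.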

Abstract

Data Quality Monitoring (DQM) is a crucial task in large particle physics experiments, since detector malfunctions can compromise the data. DQM is currently performed by human shifters, which is costly and results in limited accuracy. In this work, we provide a proof-of-concept for applying human-in-the-loop Reinforcement Learning (RL) to automate the DQM process while adapting to operating conditions that change over time. We implement a prototype based on the Proximal Policy Optimization (PPO) algorithm and validate it on a simplified synthetic dataset. We demonstrate how a multi-agent system can be trained for continuous automated monitoring during data collection, with human intervention actively requested only when relevant. We show that random, unbiased noise in human classification can be reduced, leading to improved accuracy over the baseline. Additionally, we propose data augmentation techniques to deal with scarce data and to accelerate the learning process. Finally, we discuss the further steps needed to implement the approach in the real world, including protocols for periodic control of the algorithm's outputs.
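
Why random, unbiased label noise can be reduced is easiest to see in a toy setting that is deliberately much simpler than the paper's RL agents: a single "run quality" feature in [0, 1), a true good/bad boundary at 0.5, and a hypothetical shifter who flips each label independently with probability 0.3. A threshold fitted to the noisy labels still recovers the true boundary, because symmetric noise does not move the point where the majority label changes. Everything here (`simulate`, the feature, the flip rate) is illustrative, and the fitting step is a plain grid search over thresholds, not PPO.

```python
import random


def simulate(n=5000, flip_prob=0.3, seed=0):
    """Toy illustration: a learner trained on noisy labels vs. the labels themselves."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]          # hypothetical run-quality feature
    truth = [x > 0.5 for x in xs]                  # True = good run (assumed boundary)
    # Hypothetical shifter: each label flipped independently with prob. flip_prob.
    noisy = [t if rng.random() >= flip_prob else (not t) for t in truth]
    shifter_acc = sum(y == t for y, t in zip(noisy, truth)) / n

    # Fit a threshold classifier to the NOISY labels by grid search.
    best_thr, best_agree = 0.0, -1
    for i in range(101):
        thr = i / 100
        agree = sum((x > thr) == y for x, y in zip(xs, noisy))
        if agree > best_agree:
            best_thr, best_agree = thr, agree

    # Evaluate the fitted classifier against the TRUE labels.
    model_acc = sum((x > best_thr) == t for x, t in zip(xs, truth)) / n
    return shifter_acc, best_thr, model_acc


shifter_acc, thr, model_acc = simulate()
print(f"shifter accuracy {shifter_acc:.3f}, fitted threshold {thr:.2f}, model accuracy {model_acc:.3f}")
```

With these settings the simulated shifter agrees with the truth on roughly 70% of runs, while the fitted threshold lands near 0.5 and its accuracy against the true labels is far higher. The effect relies on the noise being random and unbiased; a systematic shifter bias would shift the fitted boundary as well.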

Paper Structure (34 sections, 1 theorem, 15 equations, 5 figures, 5 tables, 2 algorithms)

Introduction
Related work
Experimental setup
Offline regime
Online regime
Computing and software resources
Reinforcement Learning algorithm
Environment
Agents and actions
Training episodes
Rewards
Predictor
Checker
Losses
Network update
...and 19 more sections

Key Result

Lemma 1

Let $(p^\theta)_{\theta\in\Theta}$ be a family of probability densities with respect to a $\sigma$-finite reference measure $\mu$ on some measurable space $\Omega$, indexed by some open set $\Theta\subset\mathbb R^n$, $n\in\mathbb N$, and denote by $\mathbb P^\theta$ the corresponding probability measures. […], where $\mathbb E^{\mathbb P^\theta}$ denotes the expectation with respect to $\mathbb P^\theta$.
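
The log-derivative (score-function) trick is usually stated as $\nabla_\theta \mathbb E^{\mathbb P^\theta}[f] = \mathbb E^{\mathbb P^\theta}\big[f\,\nabla_\theta \log p^\theta\big]$, assuming enough regularity to differentiate under the integral; the exact hypotheses of Lemma 1 are not recoverable from this excerpt. The sketch below checks that standard identity numerically for an assumed Gaussian family $p^\theta = \mathcal N(\theta, 1)$ and test function $f(x) = x^2$, for which $\mathbb E[f] = \theta^2 + 1$ and the exact gradient is $2\theta$. The function name `score_function_gradient` is hypothetical.

```python
import random


def score_function_gradient(theta, f, n=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[f(X)] for X ~ N(theta, 1).

    Uses the log-derivative trick: E[f(X) * d/dtheta log p_theta(X)],
    where d/dtheta log N(x; theta, 1) = x - theta. Only samples and the
    score are needed, never the derivative of f itself.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += f(x) * (x - theta)
    return total / n


theta = 1.5
estimate = score_function_gradient(theta, lambda x: x * x)
print(f"score-function estimate {estimate:.3f}, exact gradient {2 * theta:.3f}")
```

For `theta = 1.5` the estimate should land close to the exact value 3.0, with Monte Carlo error shrinking like $1/\sqrt{n}$. The same identity underlies policy-gradient methods such as PPO, where $p^\theta$ is the policy and $f$ is the return.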

Figures (5)

Figure 1: Results of the experiment in which the capacity to adapt to changing conditions is tested.
Figure 2: Results of the experiments in the offline regime with noisy shifter labels.
Figure 3: Accuracy (computed on the non-augmented dataset) of the algorithm trained on the dataset augmented with our approach, compared with the algorithm trained on the non-augmented dataset.
Figure 4: Results of the experiment in the online regime.
Figure 5: Function used to model the shifter's probability to trust the algorithm.

Theorems & Definitions (5)

Remark 1: Formal Definition of the $A_t, R_t, S_{t+1}$
Definition 1: State value function
Remark 2: Summing over time
Lemma 1: Log-derivative trick
proof