Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments

Olivia Jullian Parra, Julián García Pardiñas, Lorenzo Del Pianta Pérez, Maximilian Janisch, Suzanne Klaver, Thomas Lehéricy, Nicola Serra

TL;DR

This work introduces a human-in-the-loop reinforcement learning framework for Data Quality Monitoring in particle physics, addressing non-stationary detector conditions and noisy human labels. It combines a multi-agent PPO setup (predictor and checker) with RLHF to automate online and offline DQM decisions, using a simplified synthetic dataset for proof-of-concept validation. Key contributions include a dual-reward structure guiding both predictor and checker, a data-augmentation strategy to improve sample efficiency, and demonstrations that the approach adapts to changing conditions and reduces reliance on continuous human input. The results suggest practical potential for reducing human workload in detector control rooms while maintaining or exceeding baseline accuracy, with clear pathways toward real-world deployment and further study of human–machine collaboration dynamics.

Abstract

Data Quality Monitoring (DQM) is a crucial task in large particle physics experiments, since detector malfunctioning can compromise the data. DQM is currently performed by human shifters, which is costly and results in limited accuracy. In this work, we provide a proof-of-concept for applying human-in-the-loop Reinforcement Learning (RL) to automate the DQM process while adapting to operating conditions that change over time. We implement a prototype based on the Proximal Policy Optimization (PPO) algorithm and validate it on a simplified synthetic dataset. We demonstrate how a multi-agent system can be trained for continuous automated monitoring during data collection, with human intervention actively requested only when relevant. We show that random, unbiased noise in human classification can be reduced, leading to an improved accuracy over the baseline. Additionally, we propose data augmentation techniques to deal with scarce data and to accelerate the learning process. Finally, we discuss further steps needed to implement the approach in the real world, including protocols for periodic control of the algorithm's outputs.

Paper Structure

This paper contains 34 sections, 1 theorem, 15 equations, 5 figures, 5 tables, and 2 algorithms.

Key Result

Lemma 1

Let $(p^\theta)_{\theta\in\Theta}$ be a family of probability densities with respect to a $\sigma$-finite reference measure $\mu$ on some measurable space $\Omega$, indexed by some open set $\Theta\subset\mathbb R^n$, $n\in\mathbb N$, and denote by $\mathbb P^\theta$ the corresponding probability measures. [...] where $\mathbb E^{\mathbb P^\theta}$ denotes the expectation with respect to $\mathbb P^\theta$.
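
The statement of the lemma is truncated here. For orientation only: in its standard textbook form, and under suitable regularity conditions, the log-derivative trick reads $\nabla_\theta\,\mathbb E^{\mathbb P^\theta}[f] = \mathbb E^{\mathbb P^\theta}[f\,\nabla_\theta \log p^\theta]$. This identity underlies policy-gradient methods such as PPO. The exact hypotheses and wording used in the paper may differ. The sketch below checks the standard identity numerically in one illustrative case. The Gaussian family $p^\theta = \mathcal N(\theta, 1)$, the test function $f(x) = x^2$, and the helper name `score_function_gradient` are assumptions chosen for this example and do not come from the paper.

```python
import numpy as np


def score_function_gradient(f, theta, n_samples=400_000, seed=0):
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[f(x)].

    Uses the log-derivative trick: the gradient of the expectation equals
    the expectation of f(x) times the score d/dtheta log p^theta(x).
    The unit-variance Gaussian family is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    score = x - theta  # d/dtheta log N(x; theta, 1)
    return float(np.mean(f(x) * score))


def f(x):
    # Illustrative test function: E[x^2] = theta^2 + 1 under N(theta, 1).
    return x ** 2


theta = 1.5
estimate = score_function_gradient(f, theta)
exact = 2.0 * theta  # derivative of theta^2 + 1 with respect to theta
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

The estimate should agree with the closed-form gradient up to Monte Carlo error, which shrinks as `n_samples` grows. Note that no derivative of `f` is needed, which is what makes the trick usable when `f` is a reward that cannot be differentiated.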

Figures (5)

  • Figure 1: Results of the experiment in which the capacity to adapt to changing conditions is tested.
  • Figure 2: Results of the experiments in the offline regime with noisy shifter labels.
  • Figure 3: Accuracy (computed on the non-augmented dataset) of the algorithm trained with and without our data augmentation approach.
  • Figure 4: Results of the experiment in the online regime.
  • Figure 5: Function used to model the shifter's probability to trust the algorithm.

Theorems & Definitions (5)

  • Remark 1: Formal Definition of the $A_t, R_t, S_{t+1}$
  • Definition 1: State value function
  • Remark 2: Summing over time
  • Lemma 1: Log-derivative trick
  • proof