Hydra: Computer Vision for Data Quality Monitoring
Thomas Britton, Torri Jeske, David Lawrence, Kishansingh Rajput
TL;DR
Hydra reduces the burden of manual data quality monitoring by providing near-real-time, computer vision–based monitoring across the experimental halls at Jefferson Lab. It combines a Python back-end (with TensorFlow models and a MySQL database) and a web front-end for managing, visualizing, and interacting with model inferences, labels, and historical data. Key contributions include a modular inference pipeline with ZeroMQ-based messaging, Grad-CAM explanations for Bad classifications, hall-agnostic deployment, and an integrated labeling and evaluation workflow. The system enables faster issue detection and decision-making. Ongoing development aims to expand detection capabilities, improve human control, and increase computational efficiency.
Abstract
Hydra is a system that uses computer vision to perform near-real-time data quality management. It was initially developed for Hall-D in 2019 and has since been deployed across all experimental halls at Jefferson Lab, with the CLAS12 collaboration in Hall-B being the first outside of GlueX to fully utilize it. The system comprises back-end processes that manage the models, their inferences, and the data flow. The front-end components, accessible via web pages, allow detector experts and shift crews to view and interact with the system. This talk will give an overview of the Hydra system and highlight significant developments in Hydra's feature set, the challenges of operating Hydra in all halls, and lessons learned along the way.
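The TL;DR mentions Grad-CAM explanations for Bad classifications. As a rough illustration of the arithmetic behind such a heatmap, the following is a minimal pure-Python sketch, not Hydra's implementation (which uses TensorFlow models). The function names `grad_cam` and `explain_if_bad`, the nested-list inputs, and the assumption that the feature maps and gradients have already been extracted from a model are all hypothetical choices made for this example.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap from K feature maps and their gradients.

    feature_maps, gradients: lists of K 2-D lists (H x W), all the same shape.
    gradients[k][i][j] is the derivative of the class score with respect to
    feature_maps[k][i][j]. Both inputs are assumed to be precomputed.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(feature_maps, gradients):
        # Channel weight: global average of the class-score gradient.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that push the score toward the class.
    return [[max(0.0, value) for value in row] for row in heatmap]


def explain_if_bad(label, feature_maps, gradients):
    """Return a heatmap only for 'Bad' classifications, otherwise None."""
    if label != "Bad":
        return None
    return grad_cam(feature_maps, gradients)
```

In this sketch, explanations are computed only when the label is Bad, which mirrors the behavior described in the TL;DR; any other label returns `None` and skips the extra computation.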
