Multivariate Time Series Clustering for Environmental State Characterization of Ground-Based Gravitational-Wave Detectors
Rutuja Gurav, Isaac Kelly, Pooyan Goodarzi, Anamaria Effler, Barry Barish, Evangelos Papalexakis, Jonathan Richardson
TL;DR
This work addresses the challenge of monitoring terrestrial environmental disturbances that affect ground-based gravitational-wave detectors by automatically characterizing the detector's environmental state. It introduces a lightweight, near real-time pipeline that extracts band-limited root-mean-square (BLRMS) features from multiple seismometer channels, clusters one-hour windows into $k$ states using $k$-means, and labels the clusters with interpretable Earthquake, Microseism, and Anthropogenic categories. The approach is evaluated by correlating the discovered states with documented earthquakes and with detector glitches or loss of lock; some states align with problematic periods, which offers actionable diagnostics for commissioning and operation. The contributions include an end-to-end pipeline, a public dataset, and demonstrated potential for deployment to improve uptime and data quality in gravitational-wave astronomy.
Abstract
Gravitational-wave observatories like LIGO are large-scale, terrestrial instruments housed in infrastructure that spans a multi-kilometer geographic area, and they must be actively controlled to maintain operational stability over long observation periods. Despite exquisite seismic isolation, they remain susceptible to seismic noise and other terrestrial disturbances that can couple undesirable vibrations into the instrumental infrastructure, potentially leading to control instabilities or noise artifacts in the detector output. It is therefore critical to characterize the seismic state of these observatories and to identify a set of temporal patterns that can inform detector operators in day-to-day monitoring and diagnostics. In daily operation, the operators monitor several seismically relevant data streams and diagnose operational instabilities and sources of noise using simple, empirically determined thresholds. Monitoring multiple data streams manually in this way can become untenable for a human operator, so a distillation of these data streams into a more human-friendly format is sought. In this paper, we present an end-to-end machine learning pipeline for feature-based multivariate time series clustering that achieves this goal and provides actionable insights to the detector operators by correlating the discovered clusters with events of interest in the detector.
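
The feature-based clustering idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic two-channel data, a hypothetical 8 Hz sample rate, illustrative band edges for the three frequency bands, an FFT mask in place of a proper bandpass filter, and a small NumPy version of Lloyd's $k$-means. All names (`blrms`, `window_features`, `kmeans`, `FS`, `BANDS`) are introduced here for illustration only.

```python
import numpy as np

def blrms(x, fs, band):
    """Band-limited RMS of a 1-D signal, using an FFT mask as a simple bandpass."""
    lo, hi = band
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec = np.fft.rfft(x - np.mean(x))
    spec[(freqs < lo) | (freqs >= hi)] = 0.0  # keep only lo <= f < hi
    filtered = np.fft.irfft(spec, n=len(x))
    return float(np.sqrt(np.mean(filtered ** 2)))

def window_features(data, fs, bands, window_s):
    """Split (n_channels, n_samples) data into windows; return log10 BLRMS features.

    Output shape is (n_windows, n_channels * n_bands).
    """
    step = int(window_s * fs)
    n_win = data.shape[1] // step
    feats = np.empty((n_win, data.shape[0] * len(bands)))
    for w in range(n_win):
        seg = data[:, w * step:(w + 1) * step]
        feats[w] = [blrms(ch, fs, b) for ch in seg for b in bands]
    return np.log10(feats + 1e-12)  # small offset avoids log10(0)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

# --- Synthetic demonstration (all values below are assumptions) ---
FS = 8.0                                       # hypothetical sample rate in Hz
WINDOW_S = 3600                                # one-hour windows, as in the text
BANDS = [(0.03, 0.1), (0.1, 0.3), (1.0, 3.0)]  # illustrative band edges in Hz
N_WINDOWS, N_CHANNELS = 12, 2

step = int(FS * WINDOW_S)
rng = np.random.default_rng(42)
data = rng.normal(size=(N_CHANNELS, N_WINDOWS * step))  # quiet background
t = np.arange(N_WINDOWS * step) / FS
# Inject a strong 0.2 Hz oscillation into the last six hours of both channels.
data[:, 6 * step:] += 5.0 * np.sin(2 * np.pi * 0.2 * t[6 * step:])

features = window_features(data, FS, BANDS, WINDOW_S)
labels, centers = kmeans(features, k=2)
print(labels)
```

In this toy setup the six quiet hours and the six disturbed hours should fall into two separate clusters, because the injected oscillation raises the BLRMS in only one band. Correlating the cluster label of each window with a list of event times (earthquakes, glitches, lock losses) would then be a matter of counting events per label.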
