Multivariate Time Series Clustering for Environmental State Characterization of Ground-Based Gravitational-Wave Detectors

Rutuja Gurav, Isaac Kelly, Pooyan Goodarzi, Anamaria Effler, Barry Barish, Evangelos Papalexakis, Jonathan Richardson

TL;DR

This work addresses the challenge of monitoring terrestrial environmental disturbances that affect ground-based gravitational-wave detectors by automatically characterizing the detector's environmental state. It introduces a lightweight, near real-time pipeline that extracts band-limited root-mean-square (BLRMS) features from multiple seismometer channels, clusters one-hour windows into $k$ states using $k$-means, and labels the clusters with interpretable Earthquake, Microseism, and Anthropogenic categories. The approach is evaluated by correlating the discovered states with documented earthquakes and with detector glitches or loss of lock. Some states align with problematic periods, which gives operators actionable diagnostics for commissioning and operation. The contributions include an end-to-end pipeline, a public dataset, and demonstrated potential for deployment to improve up-time and data quality in gravitational-wave astronomy.
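The feature-extraction and clustering steps summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the band edges are taken from the Figure 1 caption, while the function names (`blrms`, `feature_matrix`, `kmeans`), the FFT-based BLRMS estimate, the `log10` scaling, and the farthest-first initialization (a simple stand-in for k-means++) are hypothetical choices made here for illustration.

```python
import numpy as np

# Frequency bands in Hz, taken from the Figure 1 caption.
BANDS = {
    "earthquake": (0.03, 0.1),
    "microseism": (0.1, 0.3),
    "anthropogenic": (0.3, 1.0),
}


def blrms(window, fs, band):
    """Band-limited RMS of one window, estimated from its one-sided FFT."""
    n = len(window)
    spectrum = np.fft.rfft(window - np.mean(window))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    # Parseval: each positive-frequency bin contributes 2*|X_k|^2 / n^2 to the
    # mean square (valid when the band excludes the DC and Nyquist bins).
    return np.sqrt(2.0 * np.sum(np.abs(spectrum[mask]) ** 2)) / n


def feature_matrix(data, fs, window_seconds=3600):
    """One row per window; one column per (channel, band) pair.

    `data` has shape (n_channels, n_samples). The log10 scaling and the
    1e-12 floor are assumptions made for this sketch.
    """
    step = int(window_seconds * fs)
    n_windows = data.shape[1] // step
    rows = []
    for w in range(n_windows):
        segment = data[:, w * step:(w + 1) * step]
        rows.append([
            np.log10(blrms(channel, fs, band) + 1e-12)
            for channel in segment
            for band in BANDS.values()
        ])
    return np.array(rows)


def _assign(X, centers):
    """Index of the nearest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers
```

Calling `feature_matrix(data, fs)` on an array of seismometer channels and passing the result to `kmeans(X, k)` yields one state label per one-hour window. Mapping each cluster to an Earthquake, Microseism, or Anthropogenic label would then be a separate step, for example by inspecting which band dominates each cluster center.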

Abstract

Gravitational-wave observatories like LIGO are large-scale, terrestrial instruments housed in infrastructure that spans a multi-kilometer geographic area and that must be actively controlled to maintain operational stability over long observation periods. Despite exquisite seismic isolation, they remain susceptible to seismic noise and other terrestrial disturbances that can couple undesirable vibrations into the instrumental infrastructure, potentially leading to control instabilities or noise artifacts in the detector output. It is therefore critical to characterize the seismic state of these observatories and to identify a set of temporal patterns that can inform the detector operators in day-to-day monitoring and diagnostics. At present, the operators monitor several seismically relevant data streams to diagnose operational instabilities and sources of noise using simple, empirically determined thresholds. Monitoring multiple data streams manually in this way can be untenable for a human operator, so a distillation of these data streams into a more human-friendly format is sought. In this paper, we present an end-to-end machine learning pipeline for feature-based multivariate time series clustering that achieves this goal and provides actionable insights to the detector operators by correlating the discovered clusters with events of interest in the detector.

Paper Structure

This paper contains 14 sections, 6 figures.

Figures (6)

  • Figure 1: A one-week sample of band-limited root-mean-square (RMS) ground motion data recorded by seismometers at the LIGO Livingston site. Each frequency band is associated with a different set of physical causes. (a) The microseismic band (0.1-0.3 Hz) is mostly sensitive to ground motion caused by oceanic waves and has a characteristic time scale of multiple days [giaime2003feedforward]. (b) The low-frequency anthropogenic band (0.3-1 Hz) is correlated with various human-related activities and the tides. (c) The earthquake band (0.03-0.1 Hz) captures ground motion mostly due to earthquakes and wind. (d) The 10-30 Hz band is sensitive to ground motion due to mechanical vibrations of equipment at the LIGO sites, such as the HVAC system [Nguyen:2021].
  • Figure 2: Top: Example time series data from one of the numerous seismometers deployed across the LIGO sites. Bottom: The same signal bandpass-filtered to select three physically motivated frequency bands corresponding to known seismic phenomena.
  • Figure 3: Step-by-step workflow of our proposed pipeline. Each step is described in detail in the modeling subsection of the text.
  • Figure 4: Example of our end-to-end analysis. The top panel shows a set of sample channels on which the model is run. The middle panel shows the output of the clustering model: each line represents a different state, and the green flags mark the segments during which the detector is in that state. The bottom panel shows the states assigned by the field experts using a simple threshold. The model's output recovers the experts' expectations without supervision. For example, there are clusters that correspond to earthquakes, high microseism, and high anthropogenic noise.
  • Figure 5: Cluster validation indices [schubert2023stop] used to identify a short range of admissible cluster counts over which the operator can iterate. A grid search over 3 to 20 clusters was performed and three standard clustering validation scores were calculated. The final number of clusters was determined using the correlation with glitch rates as an external validation metric.
  • ...and 1 more figures
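The grid search over cluster counts described in the Figure 5 caption can be sketched as below. The caption does not name the three validation scores, so the silhouette, Calinski-Harabasz, and Davies-Bouldin indices from scikit-learn are used here as an assumption, and the helper name `scan_cluster_counts` is hypothetical. The sketch covers only the internal indices, not the final selection by correlation with glitch rates.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def scan_cluster_counts(X, k_values):
    """Fit k-means for each candidate k and collect internal validation scores.

    The choice of these three indices is an assumption; the source only says
    that three standard clustering validation scores were calculated.
    """
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = {
            "silhouette": silhouette_score(X, labels),  # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
        }
    return scores
```

For the range in the caption, the call would be `scan_cluster_counts(X, range(3, 21))`, where `X` is the window-by-feature matrix. The scores narrow the candidates to a short list of admissible values of k, which an operator can then compare against an external metric.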