Template-Free Gravitational Wave Detection with CWT-LSTM Autoencoders: A Case Study of Run-Dependent Calibration Effects in LIGO Data

Jericho Cain

TL;DR

This work presents a template-free approach to gravitational wave detection by combining Continuous Wavelet Transform (CWT) preprocessing with a Long Short-Term Memory (LSTM) autoencoder. The method learns detector-noise characteristics from unlabeled data and flags deviations as potential signals, achieving 97.0% precision and 96.1% recall on LIGO O4 data, with an AUC of 0.994. A key finding is the discovery and resolution of cross-run batch effects in GWOSC data, where multi-run training produced run-dependent reconstruction errors; restricting to a single, well-calibrated run (O4) eliminated these biases and yielded robust performance. The study demonstrates that template-free anomaly detection can rival supervised approaches while preserving discovery potential for signals with unexpected morphologies, and provides methodological guidance for handling multi-epoch astrophysical datasets in ML pipelines.
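
The CWT preprocessing step can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the Morlet wavelet, the `w0 = 6` parameter, the 4096 Hz sampling rate, the 64-point log-spaced frequency grid, and the helper name `morlet_cwt` are all assumptions chosen for illustration, and the input is a synthetic linear chirp rather than LIGO strain.

```python
import numpy as np


def morlet_cwt(x, fs, freqs, w0=6.0):
    """Magnitude of a Morlet CWT of x, one row per centre frequency in freqs (Hz)."""
    n = len(x)
    x_hat = np.fft.fft(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # rad/s for each FFT bin
    tfr = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        scale = w0 / (2.0 * np.pi * f)  # seconds; places the wavelet's peak at f
        # Frequency-domain Morlet, positive frequencies only. The factor 2 makes
        # a sinusoid of amplitude A produce a ridge of height close to A.
        psi_hat = 2.0 * np.exp(-0.5 * (scale * omega - w0) ** 2) * (omega > 0)
        tfr[i] = np.abs(np.fft.ifft(x_hat * psi_hat))
    return tfr


# Synthetic stand-in for a chirp: a linear sweep from 35 Hz to 250 Hz over 1 s.
fs = 4096.0                       # assumed sampling rate in Hz
t = np.arange(int(fs)) / fs       # exactly 1 s of samples
f_start, f_end = 35.0, 250.0
# Sweep rate is (f_end - f_start) / 1 s, so the phase is quadratic in t.
phase = 2.0 * np.pi * (f_start * t + 0.5 * (f_end - f_start) * t ** 2)
chirp = np.sin(phase)

freqs = np.geomspace(20.0, 512.0, 64)   # assumed log-spaced frequency grid
tfr = morlet_cwt(chirp, fs, freqs)      # shape (64, 4096)

# The ridge (frequency of maximum magnitude at each time) should rise with time,
# which is the diagonal band a chirp leaves in a time-frequency map.
ridge = freqs[np.argmax(tfr, axis=0)]
```

The FFT-based form is used here only because it keeps the example dependency-light; a wavelet library would serve equally well, and the number of scales is a free parameter of the sketch.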

Abstract

Gravitational wave detection requires sophisticated signal processing to identify weak astrophysical signals buried in instrumental noise. Traditional matched-filtering approaches face computational challenges with diverse signal morphologies and non-stationary noise. This work presents an unsupervised deep learning methodology that integrates Continuous Wavelet Transform (CWT) preprocessing with a Long Short-Term Memory (LSTM) autoencoder architecture for template-free gravitational wave detection. We train and evaluate our model on LIGO H1 data from Observing Run 4 (O4, 2023-2024), comprising 126 confirmed gravitational wave events from the GWTC-4.0 catalog and 1991 noise segments. During development, we discovered that reconstruction errors from multi-run training (O1-O4) clustered by observing run rather than by astrophysical parameters, revealing systematic batch effects from GWOSC's evolving calibration procedures. Following LIGO's established practice of per-run optimization, we adopted single-run (O4) training, which eliminated these batch effects and improved recall from 52% to 96% while maintaining 97% precision. The final model achieves 97.0% precision, 96.1% recall, an F1-score of 96.6%, and a ROC-AUC of 0.994 on 102 test signals and 399 noise segments. The reconstruction error distribution shows clean separation between noise (mean 0.48) and signals (mean 0.77). This unsupervised, template-free approach demonstrates that anomaly detection can achieve performance competitive with supervised methods while enabling discovery of signals with unexpected morphologies. Our identification and resolution of cross-run batch effects provide methodological guidance for future machine learning applications to multi-epoch gravitational wave datasets.
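
The idea of scoring segments by reconstruction error can be sketched in PyTorch as follows. This is a hedged illustration, not the model from the paper: the class name `LSTMAutoencoder`, the helpers `train_on_noise` and `reconstruction_error`, the layer sizes, the optimizer settings, and the `(batch, time, scales)` layout are assumptions, and the training data are random numbers standing in for noise-only CWT segments.

```python
import torch
from torch import nn


class LSTMAutoencoder(nn.Module):
    """Sequence autoencoder: LSTM encoder -> latent vector -> LSTM decoder."""

    def __init__(self, n_scales: int = 8, hidden: int = 32, latent: int = 16):
        super().__init__()
        self.encoder = nn.LSTM(n_scales, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        self.decoder = nn.LSTM(latent, hidden, batch_first=True)
        self.to_output = nn.Linear(hidden, n_scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, n_scales).
        _, (h_n, _) = self.encoder(x)                    # h_n: (1, batch, hidden)
        z = self.to_latent(h_n[-1])                      # (batch, latent)
        z_seq = z.unsqueeze(1).repeat(1, x.size(1), 1)   # repeat latent per time step
        out, _ = self.decoder(z_seq)                     # (batch, time, hidden)
        return self.to_output(out)                       # (batch, time, n_scales)


def train_on_noise(model, noise, epochs=5, lr=1e-2):
    """Fit the autoencoder to noise-only segments; returns the loss per epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(noise), noise)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


def reconstruction_error(model, x):
    """Mean squared reconstruction error per segment, used as the anomaly score."""
    model.eval()
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2))


torch.manual_seed(0)
noise = torch.randn(4, 128, 8)   # synthetic stand-in: 4 segments, 128 steps, 8 scales
model = LSTMAutoencoder(n_scales=8)
losses = train_on_noise(model, noise, epochs=3)
scores = reconstruction_error(model, noise)   # one score per segment
```

A segment would then be flagged as a candidate when its score exceeds a threshold chosen on held-out data. The demo uses short sequences so that it runs quickly on a CPU; the sequence length is a parameter of the input, not of the model.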

Paper Structure

This paper contains 15 sections, 14 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Time domain to CWT domain transformation of gravitational wave signal. Left panel shows the gravitational wave strain signal in the time domain, displaying the characteristic chirp pattern where amplitude and frequency increase as the binary black holes spiral inward. Right panel shows the same signal transformed using Continuous Wavelet Transform (CWT), revealing the time-frequency evolution of the gravitational wave. The bright diagonal band in the spectrogram clearly shows the frequency sweep from low ($\sim$100 Hz) to high ($\sim$800 Hz) frequencies over approximately 0.75--1.3 seconds, corresponding to the inspiral phase of the binary black hole merger.
  • Figure 2: Frequency band comparison between noise and gravitational wave signals in CWT space. Top row shows spectrograms of clean LIGO noise data in three frequency bands: 20--50 Hz, 50--100 Hz, and 100--200 Hz. Bottom row shows the same frequency bands for gravitational wave data containing the GW150914 signal. The gravitational wave plots clearly display the characteristic chirp pattern (bright diagonal bands) sweeping upward in frequency over time, which is absent in the corresponding noise plots. Notably, the higher frequency bands (50--100 Hz and 100--200 Hz) show a strong vertical feature around 2.3--2.4 seconds, corresponding to the merger and ringdown phases of the binary black hole coalescence. The background noise patterns (vertical stripes and horizontal bands) are visible in both noise and gravitational wave data, demonstrating that the signal is embedded within the ambient detector noise.
  • Figure 3: CWT spectrogram of GW150914, the first detected gravitational wave event. Top panel: Full 4-second window showing the complete signal evolution embedded in detector noise. Bottom panel: Focused $\pm$250 ms view centered on the merger time (GPS 1126259462.4), revealing fine temporal structure of the inspiral-merger-ringdown phases. The characteristic chirp sweeps from $\sim$35 Hz to $\sim$250 Hz over approximately 0.2 seconds, with frequency increasing rapidly as the binary black holes spiral inward. The bright vertical band at $\sim$15.2 seconds marks the merger event, followed by the ringdown phase visible as a brief high-frequency tail. This dual-scale visualization demonstrates the CWT's ability to preserve both coarse temporal context (top) and fine-grained merger dynamics (bottom) essential for anomaly detection. The 256-scale CWT decomposition captures the full frequency evolution while maintaining sufficient time resolution to identify the sub-second transient characteristic of binary coalescences.
  • Figure 4: Schematic of the CWT-LSTM autoencoder architecture. The input CWT representation (8 scales × 4096 time points) is processed through an LSTM encoder to produce a compressed latent representation, which is then reconstructed through an LSTM decoder. The reconstruction error serves as the anomaly score for gravitational wave detection.
  • Figure 5: Evaluation results for CWT-LSTM autoencoder on O4 LIGO data. (a) Precision-Recall curve showing AP=0.967. (b) ROC curve showing AUC=0.994. (c) Confusion matrix at optimal threshold (0.667) with 98 true positives, 4 false negatives, 3 false positives, and 396 true negatives, yielding 97.0% precision and 96.1% recall. (d) Reconstruction error distribution showing clean separation between noise (blue, mean 0.48) and gravitational wave signals (red, mean 0.77), with unimodal distributions within each class confirming elimination of cross-run batch effects. Test set: 102 O4 signals, 399 noise segments.
  • ...and 1 more figure
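
As a consistency check on the Figure 5 caption, the reported precision, recall, and F1-score can be recomputed from the confusion-matrix counts it lists (98 true positives, 4 false negatives, 3 false positives, 396 true negatives). The helper name `detection_metrics` is hypothetical, and the false-positive rate is an extra quantity derived here, not a figure quoted in the source.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_positive_rate = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# Counts taken from the Figure 5(c) caption.
metrics = detection_metrics(tp=98, fn=4, fp=3, tn=396)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# precision: 0.970
# recall: 0.961
# f1: 0.966
# false_positive_rate: 0.008
```

The counts also match the stated test-set sizes: 98 + 4 = 102 signals and 3 + 396 = 399 noise segments.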