
Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches

Yi Li, Yunan Wu, Aggelos K. Katsaggelos

TL;DR

CTSAE is an unsupervised four-branch CNN–ViT autoencoder that takes four spectrograms of each LIGO glitch, computed over different time windows, and uses a shared CLS token with a CLS Fusion Module to fuse information across branches, yielding a discriminative latent code $\hat{z}$ for clustering. Trained only with the sum of per-branch reconstruction losses, $L = \sum_{i=1}^{4} L_{mse}(I_i, \hat{I}_i)$, the model achieves better clustering performance (measured by NMI and ARI) on Gravity Spy O3 main-channel data than semi-supervised baselines, even though it uses no ground-truth labels during training. The study shows that the multi-branch architecture, CNN–ViT fusion, and CLS-based cross-branch communication are key to capturing both global and local glitch patterns across timescales, and that reconstruction quality indicates the glitch structure is faithfully preserved. This unsupervised approach offers a scalable pathway for glitch identification in the upcoming Gravity Spy 2.0 data across main and auxiliary channels, reducing dependence on manual labeling and supporting robust gravitational-wave detection pipelines.
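The training objective above sums one mean-squared-error term per time-window branch. The following is a minimal pure-Python sketch of that sum, assuming each spectrogram is a plain 2-D list of energy values; the names `mse` and `ctsae_loss` are hypothetical and are not taken from the CTSAE repository, which would operate on framework tensors instead.

```python
from typing import List, Sequence

# Hypothetical representation: one spectrogram as a 2-D grid [row][col] of energies.
Spectrogram = List[List[float]]


def mse(target: Spectrogram, recon: Spectrogram) -> float:
    """Mean squared error between two equally sized 2-D spectrograms."""
    if len(target) != len(recon) or any(len(a) != len(b) for a, b in zip(target, recon)):
        raise ValueError("spectrogram shapes must match")
    total, count = 0.0, 0
    for row_t, row_r in zip(target, recon):
        for t, r in zip(row_t, row_r):
            total += (t - r) ** 2
            count += 1
    return total / count


def ctsae_loss(inputs: Sequence[Spectrogram], recons: Sequence[Spectrogram]) -> float:
    """L = sum_i L_mse(I_i, hat I_i), one term per time-window branch (four in CTSAE)."""
    if len(inputs) != len(recons):
        raise ValueError("need exactly one reconstruction per input branch")
    return sum(mse(i, r) for i, r in zip(inputs, recons))
```

Because the branch losses are simply added, each time window contributes equally to the gradient; any per-branch weighting would be an extension not described in this summary.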

Abstract

The advancement of the Laser Interferometer Gravitational-Wave Observatory (LIGO) has significantly enhanced the feasibility and reliability of gravitational wave detection. However, LIGO's high sensitivity makes it susceptible to transient noise artifacts known as glitches, which must be reliably distinguished from real gravitational wave signals. Traditional approaches predominantly rely on fully supervised or semi-supervised algorithms for glitch classification and clustering. For the future task of identifying and classifying glitches across main and auxiliary channels, however, building a manually labeled ground-truth dataset is impractical. In addition, glitch patterns can change over time, producing new glitch types that have no manual labels. In response to these challenges, we introduce the Cross-Temporal Spectrogram Autoencoder (CTSAE), a pioneering unsupervised method for the dimensionality reduction and clustering of gravitational wave glitches. CTSAE integrates a novel four-branch autoencoder with a hybrid of Convolutional Neural Networks (CNN) and Vision Transformers (ViT). To further extract features across the branches, we introduce a novel multi-branch fusion method based on the CLS (class) token. Our model, trained and evaluated on main-channel data from the Gravity Spy O3 dataset, outperforms state-of-the-art semi-supervised learning methods on clustering tasks. To the best of our knowledge, CTSAE is the first unsupervised approach tailored specifically to clustering LIGO data, marking a significant step forward in gravitational wave research. The code for this paper is available at https://github.com/Zod-L/CTSAE
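Clustering quality is reported with NMI and ARI, the metrics named in the TL;DR. The sketch below computes both from two label lists in pure Python. It assumes the arithmetic-mean normalization for NMI, which is one common convention; the exact normalization used in the paper is not stated in this summary, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _pairs(n: int) -> int:
    """Number of unordered pairs, n choose 2."""
    return n * (n - 1) // 2


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Normalized mutual information with arithmetic-mean normalization (assumed)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t, count_p = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (nij / n) * math.log(n * nij / (count_t[t] * count_p[p]))
        for (t, p), nij in joint.items()
    )
    denom = (_entropy(labels_true) + _entropy(labels_pred)) / 2
    # If both labelings put everything in one cluster, treat them as identical.
    return 1.0 if denom == 0.0 else mi / denom


def ari(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index (pair-counting, adjusted for chance)."""
    n = len(labels_true)
    if n < 2:
        raise ValueError("ARI needs at least two samples")
    sum_ij = sum(_pairs(v) for v in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(_pairs(v) for v in Counter(labels_true).values())
    sum_b = sum(_pairs(v) for v in Counter(labels_pred).values())
    expected = sum_a * sum_b / _pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings trivial
    return (sum_ij - expected) / (max_index - expected)
```

Both scores ignore how cluster IDs are named: a predicted labeling [1, 1, 0, 0] against true classes [0, 0, 1, 1] scores 1.0 on each, which is why they suit unsupervised clustering, where cluster indices carry no meaning.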

Paper Structure

This paper contains 13 sections, 17 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: An example of a blip glitch with four spectrograms corresponding to time windows of 0.5 s, 1.0 s, 2.0 s, and 4.0 s. The horizontal axis, the vertical axis and the color intensity in each time-frequency bin represent time, frequency and the energy level, respectively.
  • Figure 2: The architecture of CTSAE. The input comprises a glitch with four spectrograms of different time-window durations (0.5 s, 1.0 s, 2.0 s and 4.0 s). Four CNN-ViT encoders, one per spectrogram, extract high-level features and are interconnected via a shared CLS token. These features, along with the shared CLS token, are fused by an MLP into a low-dimensional latent vector. This latent code is then shared among four decoders to generate spectrograms of different durations. Decoders communicate through a shared CLS token, similar to the encoder setup.
  • Figure 3: (a) The detailed architecture of each encoder/decoder branch. Information exchange between CNN layers and attention layers is achieved by downsampling and upsampling modules. (b) The architecture of the downsampling module. (c) The architecture of the upsampling module.
  • Figure 4: Our CLS fusion module. The shared CLS token queries all patch tokens to gather global information from all four branches. The resulting token is then concatenated with each branch's tokens, supplying every branch with abstract, cross-branch information.
  • Figure 5: Reconstruction results on test data. Each column represents the spectrograms of the same glitch across four time windows: 0.5 s, 1.0 s, 2.0 s, and 4.0 s, from top to bottom. From left to right, the columns represent input glitches and their corresponding reconstructed glitches. Four samples are selected from the classes Chirp, Extremely Loud, Wandering Line, and 1080Line, respectively.
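Figure 2 describes the four branch features and the shared CLS token being fused by an MLP into a low-dimensional latent vector. The following is a minimal sketch of that step, assuming each branch has already been reduced to a single feature vector and assuming a two-layer MLP with a ReLU hidden layer. The layer sizes, the uniform initialization, and all helper names are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weight [out][in], bias [out])


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map: weight @ x + bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical uniform init scaled by 1/sqrt(n_in); biases start at zero."""
    scale = 1.0 / n_in ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def fuse_latent(branch_feats: List[Vector], cls: Vector, layers: List[Layer]) -> Vector:
    """Concatenate all branch features with the shared CLS token, then apply the MLP."""
    x = [v for feat in branch_feats for v in feat] + cls
    for k, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if k < len(layers) - 1:
            x = relu(x)  # no activation on the final latent projection (assumed)
    return x


# Usage: four 8-d branch features + an 8-d CLS token -> a 4-d latent code.
rng = random.Random(0)
layers = [init_layer(40, 16, rng), init_layer(16, 4, rng)]
z_hat = fuse_latent([[1.0] * 8 for _ in range(4)], [0.5] * 8, layers)
```

Because the single latent vector is what all four decoders share, it is the natural input for a downstream clustering algorithm; which clustering algorithm the paper applies is not specified in this summary.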
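Figure 3 states that downsampling and upsampling modules pass information between the CNN layers and the attention layers, but this summary does not give their internals. The sketch below assumes one common design: average-pool the CNN feature map over p x p patches to form tokens, and map tokens back to a feature map with nearest-neighbour upsampling. The learned projections and normalization layers that such modules usually include are omitted, and all names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]
Tokens = List[List[float]]            # [token][channel], row-major patch order


def downsample_to_tokens(fmap: FeatureMap, p: int) -> Tokens:
    """CNN feature map -> patch tokens via p x p average pooling."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    if h % p or w % p:
        raise ValueError("feature map height and width must be divisible by p")
    tokens: Tokens = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            token = []
            for ch in range(c):
                block = [fmap[ch][r][s] for r in range(i, i + p) for s in range(j, j + p)]
                token.append(sum(block) / (p * p))
            tokens.append(token)
    return tokens


def upsample_to_map(tokens: Tokens, grid_h: int, grid_w: int, p: int) -> FeatureMap:
    """Patch tokens -> CNN feature map via nearest-neighbour upsampling by p."""
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    c = len(tokens[0])
    return [
        [[tokens[(r // p) * grid_w + (s // p)][ch] for s in range(grid_w * p)]
         for r in range(grid_h * p)]
        for ch in range(c)
    ]
```

A round trip reproduces a feature map that is constant within each patch, while finer detail inside a patch is averaged away; in this sketch, that local detail is what the CNN path would retain.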
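Figure 4's CLS fusion module lets the shared CLS token query all patch tokens from the four branches and then concatenates the result with each branch. The following is a minimal sketch, assuming single-head scaled dot-product attention with no learned query/key/value projections and no residual connection, and reading "concatenated" as prepending the fused CLS token to each branch's token sequence. These simplifications and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_cross_attention(cls: Vector, branches: List[List[Vector]]) -> Vector:
    """The shared CLS token (query) attends over the patch tokens of every branch."""
    tokens = [tok for branch in branches for tok in branch]  # pool all branches' tokens
    d = len(cls)
    scores = [sum(q * k for q, k in zip(cls, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of the tokens (keys double as values: no projections, by assumption).
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)]


def cls_fusion(cls: Vector, branches: List[List[Vector]]) -> List[List[Vector]]:
    """Prepend the fused CLS token to each branch's token sequence."""
    fused_cls = cls_cross_attention(cls, branches)
    return [[fused_cls] + branch for branch in branches]
```

Because every branch receives the same fused token, each branch's subsequent attention layers can draw on information gathered from all four time windows, which is the cross-branch communication the caption describes.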