CLAWDIA: A dictionary learning framework for gravitational-wave data analysis

Miquel Llorens-Monteagudo, Alejandro Torres-Forné, José A. Font

TL;DR

CLAWDIA is a modular, open-source framework based on sparse dictionary learning (SDL) for gravitational-wave data analysis that handles both denoising and classification under realistic detector noise. It combines sparse representation theory, in which a signal is approximated by a sparse combination of dictionary atoms (x ≈ Dα, with few non-zero entries in α), with Low-Rank Shared Dictionary Learning (LRSDL) classification, and is accompanied by a companion toolbox, GWADAMA, for dataset construction and conditioning. The paper demonstrates the approach by denoising the GW170817 signal and classifying LIGO glitches, highlighting robustness at low SNR and the interpretability of the learned morphologies. By prioritising data-scarce scenarios and interpretability, CLAWDIA offers a complementary path to deep learning within GW data analysis pipelines and outlines a roadmap for future extensions, including detection and parameter estimation.

Abstract

Deep-learning methods are becoming increasingly important in gravitational-wave data analysis, yet their performance often relies on large training datasets and models whose internal representations are difficult to interpret. Sparse dictionary learning (SDL) offers a complementary approach: it performs well in scarce-data regimes and yields physically interpretable representations of gravitational-wave morphology. Here we present CLAWDIA (Comprehensive Library for the Analysis of Waves via Dictionary-based Algorithms), an open-source Python framework that integrates SDL-based denoising and classification under realistic detector noise. We systematise previously isolated SDL workflows into a unified, modular environment with a consistent, user-friendly interface. The current release provides several time-domain denoising strategies based on LASSO-regularised sparse coding and a classifier based on Low-Rank Shared Dictionary Learning. A companion toolbox, GWADAMA, supports dataset construction and realistic conditioning of real and simulated interferometer data. We demonstrate CLAWDIA's performance by denoising the signal from binary neutron star event GW170817 and by classifying families of instrumental glitches from LIGO's third observing run, highlighting robustness at low signal-to-noise ratios. CLAWDIA is intended as a community-driven, interoperable library extensible to additional tasks, including detection and parameter estimation.
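The LASSO-regularised sparse coding mentioned above can be illustrated with a minimal sketch. Assuming a dictionary D whose columns are atoms with the same length as a signal patch, each patch x is coded by solving min_α ½‖x − Dα‖² + λ‖α‖₁, here with the iterative shrinkage-thresholding algorithm (ISTA), and the denoised patch is Dα. Overlapping patches are coded independently and averaged where they overlap, loosely following the patch-based workflow of Figure 2. The function names `ista_lasso` and `denoise_patches`, the choice of solver and the averaging scheme are illustrative assumptions, not CLAWDIA's API or the paper's exact implementation.

```python
import numpy as np


def ista_lasso(D, x, lam, n_iter=500):
    """Solve min_a 0.5*||x - D a||_2^2 + lam*||a||_1 with ISTA (illustrative)."""
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic data-fidelity term
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        # Soft-thresholding: the proximal operator of the l1 penalty
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a


def denoise_patches(D, signal, lam, patch_len, step):
    """Hypothetical helper: code overlapping patches and average the overlaps."""
    n = len(signal)
    out = np.zeros(n)
    weight = np.zeros(n)
    for s in range(0, n - patch_len + 1, step):
        patch = signal[s:s + patch_len]
        a = ista_lasso(D, patch, lam)
        # Accumulate the sparse reconstruction D @ a of this patch
        out[s:s + patch_len] += D @ a
        weight[s:s + patch_len] += 1.0
    # Average overlapping reconstructions; samples not covered stay at zero
    covered = weight > 0
    out[covered] /= weight[covered]
    return out
```

With `D = np.eye(patch_len)` and `lam=0.0` the sketch returns the input unchanged, which is a quick sanity check. In practice D would be learned from whitened training waveforms, and λ tuned to trade sparsity against reconstruction fidelity.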

Paper Structure

This paper contains 26 sections, 20 equations, 6 figures, 3 algorithms.

Figures (6)

  • Figure 1: Schematic representation of the internal structure of the LRSDL classification dictionary $\bar{\bm{D}}$ and coefficient matrix $\bar{\bm{A}}$. In this example, five samples with two features each are reconstructed using the LRSDL model. The class-specific dictionary $\bm{D}$ and coefficient matrix $\bm{A}$ are formed by stacking four class-wise submatrices, each containing three atoms. The shared dictionary $\bm{D}_0$ and coefficient matrix $\bm{A}^0$ also contain three atoms, although their size need not match that of the class-specific components $\bm{D}_c$.
  • Figure 2: Workflow of the classification pipeline implemented in clawdia. Input signals are segmented into overlapping patches and denoised using the reconstruction dictionary $\bm{D}_\text{den}$ before being reassembled. The denoised signals are then classified using the dictionary $\bm{D}_\text{clas}$ trained for label discrimination.
  • Figure 3: Workflow of the denoising of GW170817 with three different methods (green area). The blue area corresponds to the preprocessing steps applied to the input strain, including the estimation of the amplitude spectral density (ASD) from the detector's background noise, which is used to whiten the training signals with which the denoising dictionaries (pink areas) were trained.
  • Figure 4: Time-frequency representations of the Hanford input strain and, for each method, the reconstruction (left column) with its residual (right column). Residuals share the reference colour scale; reconstructions share a wider one for readability. A normalised time-series overlay (black) is shown above each panel.
  • Figure 5: Time-frequency representations of the four families of glitches chosen for the classification demonstration: Blip, Koi Fish, Tomte, and Whistle. A normalised time-series overlay (black) is shown above each panel.
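The dictionary structure of Figure 1 suggests a simple decision rule, sketched here under stated simplifications. The full dictionary D̄ = [D_1, …, D_C, D_0] stacks the class-specific blocks and the shared block; a test signal is coded over D̄, the shared contribution D_0 a_0 is subtracted (the shared atoms capture features common to all classes), and the signal is assigned to the class whose block leaves the smallest residual. For brevity, the sparse coding step is replaced by ridge-regularised least squares, and the full LRSDL classifier differs in its details; the function `classify_shared` is hypothetical and not part of CLAWDIA.

```python
import numpy as np


def classify_shared(y, class_dicts, D0, ridge=1e-3):
    """Hypothetical, simplified LRSDL-style rule: remove the shared component,
    then pick the class sub-dictionary with the smallest residual."""
    # Stack class-specific blocks and the shared block: D_bar = [D_1, ..., D_C, D_0]
    D_bar = np.hstack(list(class_dicts) + [D0])
    k = D_bar.shape[1]
    # Ridge-regularised least squares stands in for sparse coding here
    a = np.linalg.solve(D_bar.T @ D_bar + ridge * np.eye(k), D_bar.T @ y)
    # Column offsets of each class block inside D_bar
    bounds = np.cumsum([0] + [D.shape[1] for D in class_dicts])
    # Coefficients of the shared block come after all class blocks
    a0 = a[bounds[-1]:]
    y_specific = y - D0 @ a0
    residuals = []
    for c, D_c in enumerate(class_dicts):
        a_c = a[bounds[c]:bounds[c + 1]]
        residuals.append(np.linalg.norm(y_specific - D_c @ a_c))
    return int(np.argmin(residuals)), np.array(residuals)
```

For example, with orthonormal atoms, a signal built from class-1 atoms plus a shared atom is assigned label 1, because removing the shared contribution leaves only the class-specific part for the residual comparison.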