
Manifold Learning for Source Separation in Confusion-Limited Gravitational-Wave Data

Jericho Cain

TL;DR

The paper tackles source separation in LISA's confusion-limited gravitational-wave data by combining a CNN autoencoder trained on confusion background with a manifold-learning term that quantifies off-manifold deviations in latent space. This latent-space geometry is integrated into a joint anomaly score, improving detection of resolvable sources beyond reconstruction error alone. On synthetic LISA-like data, the optimal configuration achieves an AUC of $0.752$, with precision $0.81$ and recall $0.61$, a $35\%$ relative improvement in AUC over autoencoder-only detection. The work demonstrates that latent-space geometry captures discriminative information about the confusion background and suggests manifold-learning can meaningfully augment LISA data-analysis pipelines for confusion-limited source separation.

Abstract

The Laser Interferometer Space Antenna (LISA) will observe gravitational waves in a regime that differs sharply from what ground-based detectors such as LIGO handle. Instead of searching for rare signals buried in loud instrumental noise, LISA's main challenge is that its data stream contains millions of unresolved galactic binaries. These blend into a confusion background, and the task becomes identifying sources that stand out from that signal population. We explore whether manifold-learning tools can help with this separation problem. We built a CNN autoencoder trained on the confusion background and used its reconstruction error, while also taking advantage of geometric structure in the latent space by adding a manifold-based normalization term to the anomaly score. The model was trained on synthetic LISA data with instrumental noise and confusion background, and tested on datasets with injected resolvable sources such as massive black hole binaries, extreme mass ratio inspirals, and individual galactic binaries. A grid search over $\alpha$ and $\beta$ in the combined score $\alpha \cdot \mathrm{AE}_{\mathrm{error}} + \beta \cdot \mathrm{manifold}_{\mathrm{norm}}$ found optimal performance near $\alpha = 0.5$ and $\beta = 2.0$, indicating that latent-space geometry provides more discriminatory information than reconstruction error alone. With this combination, the method achieves an AUC of $0.752$, precision $0.81$, and recall $0.61$, a $35\%$ relative improvement in AUC over the autoencoder alone. These results suggest that manifold-learning techniques could complement LISA data-analysis pipelines in identifying resolvable sources within confusion-limited data.
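
As a rough illustration of the grid search described in the abstract, the Python sketch below scores each sample with $\alpha \cdot \mathrm{AE}_{\mathrm{error}} + \beta \cdot \mathrm{manifold}_{\mathrm{norm}}$ and keeps the pair with the highest AUC. It is not the paper's code: the function names `roc_auc` and `grid_search`, the grid values, and the synthetic Gaussian scores are assumptions made for this example, and the 200 background / 400 signal split only mirrors the test-set sizes quoted in the Figure 5 caption.

```python
import numpy as np


def roc_auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic); tied scores share an average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    # Replace the ranks of tied scores by their mean rank.
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def grid_search(ae_error, manifold_norm, labels, alphas, betas):
    """Return (best_auc, alpha, beta) for score = alpha*ae_error + beta*manifold_norm."""
    best = None
    for alpha in alphas:
        for beta in betas:
            if alpha == 0 and beta == 0:
                continue  # a constant score carries no ranking information
            score = alpha * ae_error + beta * manifold_norm
            auc = roc_auc(score, labels)
            if best is None or auc > best[0]:
                best = (auc, alpha, beta)
    return best


# Synthetic stand-ins for the two score components (assumed, not the paper's data).
rng = np.random.default_rng(0)
labels = np.r_[np.zeros(200, dtype=bool), np.ones(400, dtype=bool)]
ae_error = rng.normal(0.0, 1.0, labels.size) + 0.3 * labels       # weakly informative
manifold_norm = rng.normal(0.0, 1.0, labels.size) + 1.0 * labels  # more informative

alphas = [0.0, 0.5, 1.0, 2.0]
betas = [0.0, 0.5, 1.0, 2.0]
baseline_auc = roc_auc(ae_error, labels)
best_auc, best_alpha, best_beta = grid_search(ae_error, manifold_norm, labels, alphas, betas)
print(f"AE-only AUC: {baseline_auc:.3f}")
print(f"best AUC: {best_auc:.3f} at alpha={best_alpha}, beta={best_beta}")
```

Because AUC depends only on the ranking of the scores, multiplying $\alpha$ and $\beta$ by a common positive factor leaves it unchanged, so a search of this kind is effectively a search over their ratio.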

Paper Structure

This paper contains 10 sections, 12 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Architecture of the CNN-based autoencoder with manifold learning for LISA gravitational wave source separation. The encoder (top path) processes CWT scalograms (64×3600) through two convolutional layers (Conv2d) with adaptive pooling, followed by linear layers to produce a 32-dimensional latent representation $z$. The decoder (bottom path) reconstructs the input through linear and transposed convolutional layers. Reconstruction error $\epsilon(x)$ is computed from the difference between input and reconstructed scalograms. In parallel, the manifold learning branch (left side) operates on the latent space: k-NN search (k=32) identifies nearest neighbors from training data, local PCA estimates the tangent space, and the off-manifold distance $\delta_{\perp}(z)$ quantifies deviation from the learned manifold. The combined anomaly score $s(x) = \alpha \cdot \epsilon(x) + \beta \cdot \delta_{\perp}(z)$ with $\alpha = 0.5$ and $\beta = 2.0$ (optimal from grid search) leverages both reconstruction error and geometric structure to detect resolvable sources in confusion background. During training, the k-NN index is built on latents from confusion background data only.
  • Figure 2: An embedded manifold $\mathcal{M} \subset \mathcal{X}$ representing the structure of high-dimensional data, where $\mathcal{X} \subset \mathbb{R}^D$ is the data space. A chart $\phi|_{\mathcal{M}}: \mathcal{M} \to \mathcal{Z}$ maps a point $p \in \mathcal{M}$ to low-dimensional coordinates $(x^1,\dots,x^d)$ in the latent space $\mathcal{Z} \subset \mathbb{R}^d$, analogous to the encoder of an autoencoder. The decoder $\psi: \mathcal{Z} \to \mathcal{M}$ approximates the inverse chart $\phi^{-1}$, reconstructing points on $\mathcal{M}$ from their latent coordinates. The tangent space $T_p\mathcal{M}$ provides the best linear approximation to $\mathcal{M}$ near $p$, highlighting the connection between differential geometry and latent-variable models in machine learning.
  • Figure 3: Receiver Operating Characteristic (ROC) curves comparing autoencoder-only detection ($\alpha = 0.5$, $\beta = 0$, dashed blue line, AUC=0.559) with manifold-enhanced detection ($\alpha = 0.5$, $\beta = 2.0$, solid red line, AUC=0.752). The diagonal dashed line represents a random classifier (AUC=0.5). The manifold-enhanced approach achieves significantly higher true positive rates across all false positive rates, demonstrating the benefit of incorporating geometric information from the latent space manifold. The improvement in AUC from 0.559 to 0.752 is a relative gain of $(0.752 - 0.559)/0.559 \approx 0.35$, i.e. a 35% relative improvement over the autoencoder-only baseline.
  • Figure 4: Precision-Recall curves comparing autoencoder-only detection ($\alpha = 0.5$, $\beta = 0$, dashed blue line) with manifold-enhanced detection ($\alpha = 0.5$, $\beta = 2.0$, solid red line). The horizontal dashed line represents the baseline precision of a random classifier. The manifold-enhanced approach maintains higher precision across all recall values, indicating better discrimination of resolvable sources from confusion background. At the optimal operating point (median threshold), the manifold-enhanced method achieves precision=0.81 and recall=0.61, compared to precision=0.65 and recall=0.45 for the autoencoder-only baseline.
  • Figure 5: t-SNE projections of the 32-dimensional latent space learned by the autoencoder, showing the manifold structure of confusion background and resolvable sources. Training background samples (blue circles) form a dense, connected manifold representing typical LISA confusion noise. Test background samples (cyan squares) are embedded within this manifold, confirming that the model generalizes to unseen confusion background. Resolvable signals (red triangles) are distributed throughout the latent space, with many points lying off the learned manifold. (a) The 2D projection shows the overall structure but signals may appear embedded due to projection artifacts. (b) The 3D projection reveals that signals form a diffuse halo around the central background cluster, with the vast majority located at the surface or periphery rather than buried deep within it. This geometric structure, with signals at the manifold boundary rather than completely isolated, demonstrates that signals are off-manifold but remain in proximity to the background distribution, consistent with the source separation challenge in LISA's confusion-limited regime. The visualization is based on 5000 training samples and 600 test samples (200 background, 400 signals).
  • ...and 1 more figure
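
The manifold branch described in the Figure 1 caption (k-NN search, local PCA, off-manifold distance $\delta_{\perp}(z)$) might be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function name `off_manifold_distance`, the brute-force neighbor search, and the tangent dimension `d` are hypothetical choices, and only $k = 32$ is taken from the caption. The toy "latents" lie on a plane in 3D rather than in the 32-dimensional latent space, so the result can be checked by hand.

```python
import numpy as np


def off_manifold_distance(z, train_latents, k=32, d=2):
    """Distance from z to the tangent plane fitted to its k nearest training latents.

    k = 32 follows the Figure 1 caption; the tangent dimension d is an assumed
    hyperparameter chosen for this example.
    """
    train_latents = np.asarray(train_latents, dtype=float)
    z = np.asarray(z, dtype=float)

    # Brute-force k-NN search over the training latents (background only).
    dists = np.linalg.norm(train_latents - z, axis=1)
    nearest = np.argpartition(dists, k - 1)[:k]
    neighbors = train_latents[nearest]

    # Local PCA: the top-d right singular vectors of the centered
    # neighborhood give an orthonormal basis of the estimated tangent space.
    center = neighbors.mean(axis=0)
    _, _, vt = np.linalg.svd(neighbors - center, full_matrices=False)
    tangent = vt[:d]  # shape (d, D), orthonormal rows

    # Remove the tangential component; what remains is normal to the manifold.
    offset = z - center
    residual = offset - tangent.T @ (tangent @ offset)
    return float(np.linalg.norm(residual))


# Toy check: training points on the plane x3 = 0 in 3D (assumed data).
rng = np.random.default_rng(0)
plane = np.column_stack([rng.uniform(-1.0, 1.0, size=(500, 2)), np.zeros(500)])

on_manifold = off_manifold_distance([0.1, -0.2, 0.0], plane)
off_manifold = off_manifold_distance([0.1, -0.2, 0.5], plane)
print(f"on-manifold distance:  {on_manifold:.6f}")
print(f"off-manifold distance: {off_manifold:.6f}")
```

In this toy setting the first query sits on the plane and the second is displaced by 0.5 along the normal, so the two distances should come out near 0 and near 0.5. For a large training set, a tree-based or approximate neighbor index would be a natural replacement for the brute-force search used here.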