Manifold Learning for Source Separation in Confusion-Limited Gravitational-Wave Data
Jericho Cain
TL;DR
The paper tackles source separation in LISA's confusion-limited gravitational-wave data by combining a CNN autoencoder trained on confusion background with a manifold-learning term that quantifies off-manifold deviations in latent space. This latent-space geometry is integrated into a joint anomaly score, improving detection of resolvable sources beyond reconstruction error alone. On synthetic LISA-like data, the optimal configuration achieves an AUC of $0.752$, with precision $0.81$ and recall $0.61$, a $35\%$ improvement over autoencoder-only detection. The work demonstrates that latent-space geometry captures discriminative information about the confusion background and suggests that manifold learning can meaningfully augment LISA data-analysis pipelines for confusion-limited source separation.
Abstract
The Laser Interferometer Space Antenna (LISA) will observe gravitational waves in a regime that differs sharply from that of ground-based detectors such as LIGO. Instead of searching for rare signals buried in loud instrumental noise, LISA's main challenge is that its data stream contains millions of unresolved galactic binaries. These blend into a confusion background, and the task becomes identifying sources that stand out from that signal population. We explore whether manifold-learning tools can help with this separation problem. We built a CNN autoencoder trained on the confusion background and used its reconstruction error, while also taking advantage of geometric structure in the latent space by adding a manifold-based normalization term to the anomaly score. The model was trained on synthetic LISA data with instrumental noise and confusion background, and tested on datasets with injected resolvable sources such as massive black hole binaries, extreme mass ratio inspirals, and individual galactic binaries. A grid search over $\alpha$ and $\beta$ in the combined score $\alpha \cdot \mathrm{AE}_{\mathrm{error}} + \beta \cdot \mathrm{manifold}_{\mathrm{norm}}$ found optimal performance near $\alpha = 0.5$ and $\beta = 2.0$, indicating that latent-space geometry provides more discriminative information than reconstruction error alone. With this combination, the method achieves an AUC of $0.752$, precision $0.81$, and recall $0.61$, a $35\%$ improvement over the autoencoder alone. These results suggest that manifold-learning techniques could complement LISA data-analysis pipelines in identifying resolvable sources within confusion-limited data.
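The combined score and the grid search over its two weights can be illustrated with a minimal Python sketch. It assumes that per-segment reconstruction errors and manifold terms have already been computed, and that the grid is ranked by AUC; the selection criterion, the function names (`combined_score`, `roc_auc`, `grid_search`), and the toy arrays are all hypothetical and are not taken from the paper's code or data.

```python
import numpy as np


def combined_score(ae_error, manifold_norm, alpha, beta):
    """Joint anomaly score: alpha * AE_error + beta * manifold_norm."""
    ae_error = np.asarray(ae_error, dtype=float)
    manifold_norm = np.asarray(manifold_norm, dtype=float)
    return alpha * ae_error + beta * manifold_norm


def roc_auc(labels, scores):
    """AUC as the probability that a source segment outscores a background one.

    Ties count as one half. The pairwise comparison costs O(n_pos * n_neg),
    which is acceptable for a sketch but not for large test sets. Both
    classes must be present in `labels`.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def grid_search(ae_error, manifold_norm, labels, alphas, betas):
    """Return (alpha, beta, auc) for the first grid point with the highest AUC.

    Assumption: AUC is the selection criterion; the source does not state
    which metric its grid search optimised.
    """
    best = (None, None, -np.inf)
    for alpha in alphas:
        for beta in betas:
            scores = combined_score(ae_error, manifold_norm, alpha, beta)
            auc = roc_auc(labels, scores)
            if auc > best[2]:
                best = (alpha, beta, auc)
    return best


# Toy data (illustrative only): 1 marks a segment with an injected source.
labels = [0, 0, 1, 1]
ae_error = [1.0, 2.0, 1.0, 2.0]        # uninformative on its own here
manifold_norm = [0.0, 0.1, 1.0, 1.1]   # separates the two classes

best_alpha, best_beta, best_auc = grid_search(
    ae_error, manifold_norm, labels, alphas=[0.5, 1.0], betas=[0.0, 2.0]
)
print(best_alpha, best_beta, best_auc)
```

Because AUC depends only on the ranking of the scores, this sketch is sensitive only to the ratio $\beta/\alpha$ whenever $\alpha > 0$. Threshold-dependent metrics such as precision and recall additionally require choosing a cut on the score.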
