Enhancing gravitational-wave detection: a machine learning pipeline combination approach with robust uncertainty quantification

Gregory Ashton, Ann-Kristin Malz, Nicolo Colombo

TL;DR

The paper tackles the challenge of interpreting gravitational-wave detections reported by multiple search pipelines. It learns a data-driven combination of the pipeline outputs and augments it with robust uncertainty quantification. Two simple classifiers, logistic regression (LR) and a multilayer perceptron (MLP), are trained on per-pipeline candidate features to improve detection efficiency over the traditional approaches of taking the maximum $p_{\rm astro}$ or the maximum inverse false-alarm rate (IFAR) across pipelines; conformal prediction (CP) is then applied to yield calibrated, event-level confidence. The results show that ML-based fusion can increase the area under the ROC curve (AUC) relative to the maximum-IFAR baseline, and that CP provides a principled way to assign uncertainty to individual events, with notable implications for sub-threshold candidates such as GW200311_103121. The work highlights practical benefits for GW catalogs and low-latency alerts, while outlining realistic limitations and avenues for future work, such as more realistic training data, richer features, and multi-class extensions.
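The contrast between the maximum-IFAR baseline and a learned combination can be sketched in a few lines. What follows is a minimal standard-library sketch, not the authors' code: the feature vector (log-IFAR and SNR/10 per pipeline, with zeros for a pipeline that did not report the candidate), the helper names, and the plain gradient-descent training are all illustrative assumptions, and the paper's LR and MLP models and feature sets may differ. An MLP would replace the single linear score below with a small neural network.

```python
import math


def max_ifar_score(ifars):
    """Baseline: rank a candidate by its largest per-pipeline IFAR (in years).
    A pipeline that did not report the candidate contributes an IFAR of 0."""
    return max(ifars)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_features(ifars, snrs):
    """Hypothetical feature vector: bias, log10(1 + IFAR) and SNR/10 for each pipeline."""
    return [1.0] + [math.log10(1.0 + f) for f in ifars] + [s / 10.0 for s in snrs]


def train_logistic_regression(X, y, learning_rate=0.05, epochs=2000, l2=1e-3):
    """Fit weights by batch gradient descent on the mean log-loss plus an L2 penalty."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj / n
        w = [wj - learning_rate * (gj + l2 * wj) for wj, gj in zip(w, grad)]
    return w


def predict_signal_probability(w, x):
    """Model probability that the candidate belongs to the signal class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    # Toy candidates seen by two pipelines: ((IFARs in years, SNRs), label); 1 = signal.
    candidates = [
        (([300.0, 50.0], [12.0, 11.0]), 1),
        (([20.0, 0.0], [10.5, 0.0]), 1),
        (([5.0, 8.0], [9.5, 9.8]), 1),
        (([0.01, 0.0], [8.1, 0.0]), 0),
        (([0.0, 0.2], [0.0, 8.4]), 0),
        (([0.05, 0.03], [8.0, 8.2]), 0),
    ]
    X = [make_features(i, s) for (i, s), _ in candidates]
    y = [label for _, label in candidates]
    w = train_logistic_regression(X, y)
    for ((ifars, _snrs), label), x in zip(candidates, X):
        print(label, max_ifar_score(ifars), round(predict_signal_probability(w, x), 3))
```

The baseline uses one number per candidate, whereas the learned model can weigh agreement between pipelines and the other features jointly; that extra flexibility is what the ML-driven combination is meant to exploit.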

Abstract

Gravitational-wave data from advanced-era interferometric detectors consists of background Gaussian noise, frequent transient artefacts, and rare astrophysical signals. Multiple search algorithms exist to detect the signals from compact binary coalescences, but their varying performance complicates interpretation. We present a machine learning-driven approach that combines results from individual pipelines and utilises conformal prediction to provide robust, calibrated uncertainty quantification. Using simulations, we demonstrate improved detection efficiency and apply our model to GWTC-3, enhancing confidence in multi-pipeline detections, such as the sub-threshold binary neutron star candidate GW200311_103121.
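Conformal prediction turns any classifier output into p-values that are valid in finite samples, provided calibration and test candidates are exchangeable. A minimal sketch of label-conditional (Mondrian) inductive conformal prediction follows. The nonconformity score (one minus the classifier probability of the hypothesised label) and the definition of confidence in the signal label as one minus the noise-label p-value are common choices assumed here for illustration; they are not necessarily the definitions used in the paper, and the function names are hypothetical.

```python
from bisect import bisect_left


def nonconformity(prob_signal, label):
    """Assumed score: 1 - classifier probability assigned to the hypothesised label.
    Larger values mean the (candidate, label) pair looks more unusual."""
    prob_label = prob_signal if label == 1 else 1.0 - prob_signal
    return 1.0 - prob_label


def calibrate(cal_probs, cal_labels):
    """Sorted nonconformity scores of a held-out calibration set, kept separately
    for each true label (label-conditional, or Mondrian, conformal prediction)."""
    scores = {0: [], 1: []}
    for p, y in zip(cal_probs, cal_labels):
        scores[y].append(nonconformity(p, y))
    return {y: sorted(s) for y, s in scores.items()}


def conformal_p_value(cal_scores, prob_signal, label):
    """Fraction of same-label calibration scores at least as large as the test score,
    with the +1 correction that counts the test point itself."""
    s = cal_scores[label]
    a = nonconformity(prob_signal, label)
    n_at_least = len(s) - bisect_left(s, a)
    return (n_at_least + 1) / (len(s) + 1)


def signal_confidence(cal_scores, prob_signal):
    """Assumed definition: confidence in the signal label = 1 - p-value of the noise label."""
    return 1.0 - conformal_p_value(cal_scores, prob_signal, 0)


if __name__ == "__main__":
    # Toy calibration set: classifier signal probabilities and true labels (1 = signal).
    cal = calibrate([0.1, 0.2, 0.05, 0.3, 0.9, 0.8, 0.95, 0.7], [0, 0, 0, 0, 1, 1, 1, 1])
    for p in (0.01, 0.25, 0.99):
        print(p, round(signal_confidence(cal, p), 3))
```

Because the smallest attainable p-value is 1/(n+1) for n same-label calibration points, the confidence is capped at n/(n+1); with the four noise points above, no candidate can exceed a confidence of 0.8, so larger calibration sets are needed to express high confidence.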

Paper Structure

This paper contains 5 sections, 3 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The receiver operating characteristic (ROC) curves for the LR and MLP ML-driven pipeline combination approaches applied to the test data, together with the standard maximum-IFAR pipeline combination approach for comparison. For this test, we use all four pipelines contributing to the mock data challenge (MDC) and all features in our test data (see the Appendix for details). To investigate the uncertainty inherent in the ROC curve, we repeat the study over different permutations of the training and test data. The solid lines show the ROC calculated for a single permutation of the test data, while the shaded bands mark the 90% interval over the permutations.
  • Figure 2: The conditional confidence in the signal label, as measured by the LR model on the test data, plotted against the maximum IFAR. Colour indicates the true label ($\hat{y}$), marker size the number of contributing pipelines ($N_p$), and marker symbol the chirp-mass ($\mathcal{M}$) range inferred by the highest-SNR pipeline. A vertical dashed line marks a FAR threshold of 1 per year.
  • Figure 3: A comparison of $p_{\rm astro}$ and the conditional confidence from the LR model trained on a subset of the MDC data.
  • Figure 4: The 90% upper limit on the measured AUC for the LR and MLP models, calculated over different permutations of the data split, the pipelines included in the feature set, and the per-pipeline feature set. We show curves averaged over the number of pipelines, as a function of the number of parameters, ordered by their presumed importance (starting with the IFAR, the mass of the binary, the SNR, etc.). Note, however, that CWB does not produce estimates of all features; in this case, empty rows are provided, which add no information.
  • Figure 5: The cumulative histogram of the true positive rate against the preferred-pipeline SNR, using either the LR confidence (green solid curve) or the maximum IFAR (orange dashed curve) as the detection statistic. For the LR pipeline combination approach, a candidate counts as detected if its conditional confidence in the signal label is greater than 0.5; for the maximum-IFAR approach, if its IFAR exceeds 1 year. These thresholds are chosen arbitrarily and happen to approximately match at the maximum SNR, which helps show where the two approaches differ at lower SNR (inset axis).
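The captions above rely on a few standard quantities: ROC curves, their AUC, an interval over permutations of the data split, and a true positive rate at a fixed threshold. A minimal standard-library sketch of these follows. The function names are hypothetical, the 5th to 95th percentile range is one common way to summarise a 90% spread over permutations (as in the shaded bands of Figure 1), and none of this reproduces the paper's actual analysis code.

```python
import statistics


def roc_curve(scores, labels):
    """(FPR, TPR) points obtained by sweeping a detection threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the probability that a
    random signal scores above a random noise candidate (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def interval_90(values):
    """5th and 95th percentiles of a statistic (e.g. AUC) across train/test permutations."""
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def true_positive_rate(scores, labels, threshold):
    """Fraction of true signals whose statistic exceeds the threshold, e.g. confidence > 0.5
    or IFAR > 1 year as in the Figure 5 comparison."""
    signal_scores = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s > threshold for s in signal_scores) / len(signal_scores)
```

The same `roc_auc` call can be applied to the LR confidence, the MLP output, or the maximum IFAR, since AUC depends only on how each statistic ranks candidates, not on its scale.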