PETRA: From the LISA global fit to a catalog of Galactic binaries

Aaron D. Johnson, Javier Roulet, Katerina Chatziioannou, Michele Vallisneri, Chris G. Trejo, Kyle A. Gersbach

TL;DR

Petra is a principled postprocessing method that turns a trans-dimensional global fit affected by label switching into a catalog of Galactic binaries. It relabels posterior samples to maximize a product-of-marginals representation $p'_{cat}$ while tracking each source's probability of astrophysical origin $P^*$. The problem is formalized with an invertible labeling $\ell$; the relabeling and the auxiliary Gaussian distributions $q(\theta_{\alpha}|\phi_{\alpha})=\mathcal{N}(\theta_{\alpha}|\mu_{\alpha},\Sigma_{\alpha})$ are optimized by minimizing a KL divergence, which connects to the information loss $I_{loss}=-H(p'_{rel})+\sum_\alpha H(p'_{\alpha})$. Demonstrations on toy models and a mock LISA dataset show that Petra can resolve overlapping and multimodal sources, producing catalog posteriors $p_{\alpha}(\theta)$ with astrophysical-origin probabilities $P^*_{\alpha}$ that separate real signals from noise or confusion. Petra is implemented in the open-source package petra_catalogs, operates in postprocessing, and is applicable to the output of any global-fit sampler, offering a practical path to interpretable catalogs from complex gravitational-wave data analyses.
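
The information loss quoted above can be read as a KL divergence. As a short worked step, assuming $p'_{\alpha}$ denotes the marginal of the relabeled posterior $p'_{rel}$ over the parameters $\theta_\alpha$ of source $\alpha$ (the summary does not spell this out), the standard identity is:

$$
\begin{aligned}
D_{\mathrm{KL}}\Big(p'_{rel}\,\Big\|\,\prod_\alpha p'_\alpha\Big) &= \int p'_{rel}(\theta)\,\log p'_{rel}(\theta)\,d\theta - \sum_\alpha \int p'_{rel}(\theta)\,\log p'_\alpha(\theta_\alpha)\,d\theta \\
&= -H(p'_{rel}) + \sum_\alpha H(p'_\alpha) = I_{loss}.
\end{aligned}
$$

The second line uses the fact that integrating $p'_{rel}$ over all parameters except $\theta_\alpha$ gives $p'_\alpha$. Under this reading $I_{loss}\ge 0$, with equality only when the relabeled posterior factorizes exactly across sources.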

Abstract

The Laser Interferometer Space Antenna (LISA) will detect mHz gravitational waves from many astrophysical sources, including millions of compact binaries in the Galaxy, thousands of which may be individually resolvable. The large number of signals overlapping in the LISA dataset requires a global fit in which an unknown number of sources are modeled simultaneously. This introduces a label-switching ambiguity for sources in the same class, making it challenging to distill a traditional astronomical catalog from global-fit posteriors. We present a method to construct a catalog by postprocessing the global-fit posterior, relabeling samples to minimize the statistical divergence between the global fit and a factorized catalog representation. The resulting catalog consists of the source posterior distributions and their probabilities of having an astrophysical origin. We demonstrate our algorithm on two toy models and on a small simulated LISA dataset of Galactic binaries. Our method is implemented in the open-source Python package petra_catalogs, and it can be applied in postprocessing to the output of any global-fit sampler.
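
The relabeling idea can be illustrated with a minimal sketch. It is not the petra_catalogs implementation or its API: the function name `relabel`, the synthetic two-source chain, and the use of SciPy's `linear_sum_assignment` for the per-sample maximization are all assumptions made for illustration. It also assumes a fixed number of sources, whereas the method described here handles trans-dimensional fits. The loop alternates between fitting one multivariate normal per catalog slot and reassigning the entries of each sample to the slots that maximize the summed log-density.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import multivariate_normal


def relabel(samples, n_iter=50):
    """Illustrative relabeling loop (hypothetical helper, fixed source count).

    samples: array of shape (n_samples, n_sources, n_params).
    Returns an array of the same shape with entries permuted within each sample.
    """
    n_samples, n_sources, n_params = samples.shape
    flat = samples.reshape(-1, n_params)
    relabeled = samples.copy()  # start from the global-fit labeling
    for _ in range(n_iter):
        # Fit a multivariate normal to the current marginal of each slot.
        means = relabeled.mean(axis=0)
        covs = [np.cov(relabeled[:, a], rowvar=False) + 1e-9 * np.eye(n_params)
                for a in range(n_sources)]
        # logq[i, j, a]: log-density of entry j of sample i under slot a.
        logq = np.stack(
            [multivariate_normal.logpdf(flat, mean=means[a], cov=covs[a])
             .reshape(n_samples, n_sources) for a in range(n_sources)],
            axis=-1,
        )
        # Per sample, choose the permutation maximizing the summed log-density.
        new = np.empty_like(samples)
        for i in range(n_samples):
            entries, slots = linear_sum_assignment(-logq[i])
            new[i, slots] = samples[i, entries]
        if np.array_equal(new, relabeled):
            break  # normals and labels have stopped changing
        relabeled = new
    return relabeled


# Synthetic chain (assumption): two well-separated 2D sources whose labels
# are swapped in a random 30% of the samples.
rng = np.random.default_rng(0)
n = 500
truth = np.stack([rng.normal([0.0, 0.0], 1.0, (n, 2)),
                  rng.normal([5.0, 5.0], 1.0, (n, 2))], axis=1)
swap = rng.random(n) < 0.3
chain = truth.copy()
chain[swap] = chain[swap][:, ::-1]

catalog = relabel(chain)
print(catalog[:, 0].mean(axis=0), catalog[:, 1].mean(axis=0))
```

In this toy setup the per-slot means of `catalog` should sit near the two simulated source locations, while the same means computed on `chain` are pulled toward each other by the swapped labels.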

Paper Structure

This paper contains 10 sections, 20 equations, 11 figures.

Figures (11)

  • Figure 1: Illustration of the global-fit-to-catalog algorithm, for a toy model of two sinusoidal signals parameterized by amplitude, frequency, and phase (see Sec. \ref{sec:toy-sinusoid}). For clarity we show only marginal posteriors of frequency, although the algorithm makes use of all parameters. Panel 1: We begin by identifying the relabeled posterior with the global fit. For fully converged chains all the marginal posteriors are statistically identical. Panel 2: We fit the marginals of the relabeled posterior with a multivariate normal over all source parameters. Panel 3: For every sample, we reassign each entry to either source (i.e., either normal) by maximizing the assignment probability, Eq. \ref{eq:p_catalog_approx}. Panel 4: We repeat steps 2 and 3 until the normals and relabelings stop changing. The marginals of the final relabeled samples define the two catalog posteriors.
  • Figure 2: Petra relabeling of two overlapping two-dimensional Gaussians (first test of Sec. \ref{sec:toy-gaussian}). Top left: KL divergence ($\max_A D_\mathrm{KL}(p_{\mathrm{sim},A}||p_{\mathrm{cat},A})$, color scale) between simulated and reconstructed distributions, plotted against 1D EMDs of the variable pairs $x_{1,2}$ and $y_{1,2}$. Smaller EMDs correspond to greater overlaps: distributions begin to overlap at EMD $\sim 5$, and at EMD $\sim 2.5$ the mean of one falls within the support of the other. Bottom left: KL divergence versus fraction of mislabeled samples. A large fraction of mislabelings in a region of strong overlap will not affect the divergence significantly, since the samples are effectively indistinguishable. Top, middle, and bottom right: Examples of simulated and reconstructed distributions corresponding to the star, square, and triangle in the top left plot. Reconstructed and injected distributions appear very close even for low EMDs.
  • Figure 3: Petra relabeling of two superimposed sinusoids (second test of Sec. \ref{sec:toy-gaussian}). Each row corresponds to a different $\delta f$ between the sinusoids, and shows distributions of frequency (left), amplitude (middle), and phase (right). The gray shaded region covers a frequency bin. The filled magenta histogram displays all global-fit entries together, approximating the identical marginal distributions of the global fit. The magenta and green contours display the two source catalog posteriors built by Petra. Dashed vertical lines indicate the true (simulated) values of each parameter. The two sources begin to be resolved at a frequency separation of 0.2 frequency bins.
  • Figure 4: Same as Fig. \ref{fig:frequency_sequence}, except that source 2 has a bimodal frequency posterior, emulated by simulating three signals (two with identical amplitude and phase but different frequencies) and recovering two. Petra identifies each source correctly and recovers parameters consistent with their true values.
  • Figure 5: Probability of astrophysical origin for each of the 10 catalog sources, ordered by decreasing probability. The simulated data contain 10 sources, 5 of which are detectable, while the transdimensional global fit contains posterior samples with as few as 5 and as many as 9 sources. The Petra catalog reports $P^*_\alpha = 1$ for the 5 indisputable sources, lower values for the "rogue" global-fit entries, and $P^*_\alpha = 0$ for a putative 10th source.
  • ...and 6 more figures
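
The ordering described in the Figure 5 caption can be mimicked with a short sketch. It assumes, and this summary does not confirm, that $P^*_\alpha$ is estimated as the fraction of posterior samples in which catalog slot $\alpha$ is occupied. The array layout (NaN marking an empty slot), the variable names, and the synthetic occupancy pattern are hypothetical.

```python
import numpy as np

# Hypothetical relabeled chain: rows are posterior samples, columns are
# catalog slots, and NaN marks a slot that is empty in that sample.
rng = np.random.default_rng(1)
n_samples, n_slots = 1000, 4
freq = rng.normal(loc=[1.0, 2.0, 3.0, 4.0], scale=0.01, size=(n_samples, n_slots))

occupied = np.ones((n_samples, n_slots), dtype=bool)
occupied[:, 2] = rng.random(n_samples) < 0.4  # a marginal, sometimes-present entry
occupied[:, 3] = False                        # a slot that is never populated
freq[~occupied] = np.nan

# Assumed estimator: occupancy fraction of each slot across samples.
p_astro = np.mean(~np.isnan(freq), axis=0)

# Order slots by decreasing probability, as in the figure.
order = np.argsort(-p_astro, kind="stable")
for alpha in order:
    print(f"slot {alpha}: P* = {p_astro[alpha]:.2f}")
```

Under this assumption, slots present in every sample get a value of 1, the intermittently present slot gets an intermediate value, and the never-populated slot gets 0.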