Table of Contents
Fetching ...

Modeling Non-Gaussianities in Pulsar Timing Array data analysis using Gaussian Mixture Models

Mikel Falxa, Alberto Sesana

TL;DR

This paper tackles the problem that PTA noise and signals may deviate from Gaussianity, particularly for a SMBHB-induced GWB. It introduces Gaussian Mixture Models on Fourier coefficients to capture non-Gaussian statistics while preserving analytical marginalization, enabling posterior inference of higher-order moments and integration into spectral refitting. Through simulations of single-pulsar noise, common red noise, and a Gaussian background plus loud CGWs, the approach demonstrates Bayes-factor–driven detection of non-Gaussian features and reveals how deterministic sources can suppress apparent non-Gaussianity if properly modeled. In more realistic SMBHB population scenarios, non-Gaussianities become subtler and may require larger PTAs to detect, but the framework provides a principled path to exploiting higher-order statistics to characterize GW source populations and outliers within a stochastic background.

Abstract

In Pulsar Timing Array (PTA) data analysis, noise is typically assumed to be Gaussian, and the marginalized likelihood has a well-established analytical form derived within the framework of Gaussian processes. However, this Gaussianity assumption may break down for certain classes of astrophysical and cosmological signals, particularly for a gravitational wave background (GWB) generated by a population of supermassive black hole binaries (SMBHBs). In this work, we present a new method for testing the presence of non-Gaussian features in PTA data. We go beyond the Gaussian assumption by modeling the noise or signal statistics using a Gaussian mixture model (GMM). An advantage of this approach is that the marginalization of the likelihood remains fully analytical, expressed as a linear combination of Gaussian PTA likelihoods. This makes the method straightforward to implement within existing data analysis tools. Moreover, this method extends beyond the free spectrum analysis by producing posterior probability distributions of higher-order moments inferred from the data, which can be incorporated into spectral refitting techniques. We validate the model using simulations and demonstrate the sensitivity of PTAs to non-Gaussianity by computing the Bayes factor in favor of the GMM as a function of the injected excess moments. We apply the method to a more astrophysically motivated scenario where a single SMBHB is resolved on top of a Gaussian GWB and show that significant non-Gaussianities are introduced by the individual source. Finally, we test our model on a realistic GWB generated from a simulated population of SMBHBs.

Modeling Non-Gaussianities in Pulsar Timing Array data analysis using Gaussian Mixture Models

TL;DR

This paper tackles the problem that PTA noise and signals may deviate from Gaussianity, particularly for a SMBHB-induced GWB. It introduces Gaussian Mixture Models on Fourier coefficients to capture non-Gaussian statistics while preserving analytical marginalization, enabling posterior inference of higher-order moments and integration into spectral refitting. Through simulations of single-pulsar noise, common red noise, and a Gaussian background plus loud CGWs, the approach demonstrates Bayes-factor–driven detection of non-Gaussian features and reveals how deterministic sources can suppress apparent non-Gaussianity if properly modeled. In more realistic SMBHB population scenarios, non-Gaussianities become subtler and may require larger PTAs to detect, but the framework provides a principled path to exploiting higher-order statistics to characterize GW source populations and outliers within a stochastic background.

Abstract

In Pulsar Timing Array (PTA) data analysis, noise is typically assumed to be Gaussian, and the marginalized likelihood has a well-established analytical form derived within the framework of Gaussian processes. However, this Gaussianity assumption may break down for certain classes of astrophysical and cosmological signals, particularly for a gravitational wave background (GWB) generated by a population of supermassive black hole binaries (SMBHBs). In this work, we present a new method for testing the presence of non-Gaussian features in PTA data. We go beyond the Gaussian assumption by modeling the noise or signal statistics using a Gaussian mixture model (GMM). An advantage of this approach is that the marginalization of the likelihood remains fully analytical, expressed as a linear combination of Gaussian PTA likelihoods. This makes the method straightforward to implement within existing data analysis tools. Moreover, this method extends beyond the free spectrum analysis by producing posterior probability distributions of higher-order moments inferred from the data, which can be incorporated into spectral refitting techniques. We validate the model using simulations and demonstrate the sensitivity of PTAs to non-Gaussianity by computing the Bayes factor in favor of the GMM as a function of the injected excess moments. We apply the method to a more astrophysically motivated scenario where a single SMBHB is resolved on top of a Gaussian GWB and show that significant non-Gaussianities are introduced by the individual source. Finally, we test our model on a realistic GWB generated from a simulated population of SMBHBs.

Paper Structure

This paper contains 18 sections, 40 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Example of distributions obtained with a GMM of 2 Gaussians with $\Phi=1$ (see \ref{['eq:gaussian_mixture']}) and different sets of parameters $\alpha$, $\mu$ and $c$ introducing heavier tails or asymmetries (i.e. kurtosis and skewness).
  • Figure 2: Corner plot of the recovered parameters for the non-Gaussian model in single pulsar noise analysis. The Fourier coefficients are also sampled but not shown here for simplicity. The dashed line show the injected parameter values. The shaded areas show the 1 and 2 $\sigma$ credible regions.
  • Figure 3: Bayes factor $\mathcal{B}^{NG}_G$ between the non-Gaussian and Gaussian models as a function of the injected relative excess kurtosis $\Delta \bar{M}_4$ calculated using \ref{['eq:excess_moments']} for many simulations. The solid line is the median and the shaded region is bounded by the 16th and 84th percentiles. We show the two cases : (i) SPA (Single Pulsar Analysis) with 100 frequencies powerlaw red noise, (ii) CRN (Common red noise) with 100 pulsars.
  • Figure 4: Recovered free spectrum and excess kurtosis for the Gaussian model and non-Gaussian model at the 2, 3, 4, 5 harmonics of 10yrs for the simulated data containing a non-Gaussian common red noise following a powerlaw spectrum with $\log_{10} A=-15$, $\gamma=13/3$, $\alpha=0.5$ and $c=10$. (Top panel) Violin plot of the estimated free spectrum, the two models are very consistent. (Bottom panel) Violin plot of the excess kurtosis posterior distribution giving zero for the Gaussian model, the dashed line shows the injected excess kurtosis obtained with \ref{['eq:excess_moments']}, that is the same for all frequencies.
  • Figure 5: Comparison between the injected and recovered probability distribution of the Fourier coefficients $a$, divided by the standard deviation $\sigma_a$, showing the deviation from Gaussianity. The blue shaded region corresponding to the binned 1-$\sigma$ error obtained with 1000 reconstructions of $p(a_n, \Phi_n, \alpha,c|\delta t)$ using samples from $p(\Phi_n, \alpha,c|\delta t)$ according to \ref{['eq:posterior_ng']}. The dotted line shows a standard normal distribution for comparison.
  • ...and 7 more figures