Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing
Andreas Filipp, Yashar Hezaveh, Laurence Perreault-Levasseur
TL;DR
This work investigates how distributional shifts between training simulations and real data affect neural ratio estimators (NREs) and sequential neural posterior estimators (SNPEs) when inferring the population-level dark matter subhalo mass function from strong gravitational lensing. The study builds one likelihood-free pipeline for each method and subjects both to controlled shifts in the nuisance parameters. Both methods lose reliability under even modest out-of-distribution conditions, and the biases become more severe as more lenses are combined. The SNPE approach, equipped with hierarchical inference, can mitigate some misspecifications that stay within the bounds of the training distribution, but substantial biases remain for larger shifts, particularly in background source morphologies. The results emphasize the need for careful validation, domain adaptation, and robust calibration when applying these methods to real astrophysical data, where the true distributions are never perfectly known.
Abstract
We investigate the robustness of Neural Ratio Estimators (NREs) and Neural Posterior Estimators (NPEs) to distributional shifts in the context of measuring the abundance of dark matter subhalos using strong gravitational lensing data. While these data-driven inference frameworks can be accurate on test data drawn from the same distribution as the training set, in real applications the simulated training data and the true observational data are expected to differ in their distributions. We examine how a trained NRE and trained sequential NPEs estimate the population-level parameters of dark matter subhalos from a large sample of images of strongly lensed galaxies when the test data present distributional shifts in the nuisance parameters (e.g., the background source morphology), both within and beyond the bounds of the training distribution. Our results show that NREs and NPEs perform well when the test data follow the training distribution exactly, but exhibit significant biases when confronted with even slight deviations from the examples seen during training. This indicates the necessity for caution when applying NREs and NPEs to real astrophysical data, where the high-dimensional underlying distributions are not perfectly known.
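To illustrate why a small per-lens bias matters more at the population level, the following is a minimal toy sketch and not the pipeline used in this work. It assumes each lens contributes an independent Gaussian log-likelihood ratio in a single population parameter `theta`, with a hypothetical constant offset `shift_bias` standing in for the effect of a distributional shift. The function name, the Gaussian form, the flat prior, and all numerical values are illustrative assumptions.

```python
import numpy as np

def combined_posterior(n_lenses, shift_bias, sigma=1.0, theta_true=0.0, seed=0):
    """Toy population-level inference (illustrative assumption, not the paper's model).

    Sums per-lens Gaussian log-likelihood ratios on a grid and returns the
    mean and standard deviation of the combined posterior under a flat prior.
    """
    rng = np.random.default_rng(seed)
    theta = np.linspace(-5.0, 5.0, 4001)
    # Each lens's estimate peaks at a noisy location, offset by shift_bias.
    peaks = theta_true + shift_bias + sigma * rng.standard_normal(n_lenses)
    # Per-lens log ratio up to a constant; summing combines independent lenses.
    log_r = -0.5 * ((theta[None, :] - peaks[:, None]) / sigma) ** 2
    log_post = log_r.sum(axis=0)
    log_post -= log_post.max()  # stabilise before exponentiating
    weights = np.exp(log_post)
    weights /= weights.sum()
    mean = (weights * theta).sum()
    std = np.sqrt((weights * (theta - mean) ** 2).sum())
    return mean, std

if __name__ == "__main__":
    for n in (10, 100, 1000):
        mean_shifted, std = combined_posterior(n, shift_bias=0.2)
        mean_clean, _ = combined_posterior(n, shift_bias=0.0)
        offset = mean_shifted - mean_clean
        print(f"N={n}: offset={offset:.3f}, width={std:.3f}, offset/width={offset / std:.2f}")
```

In this toy setting the offset induced by the shift stays fixed while the posterior width shrinks roughly as `sigma / sqrt(N)`, so the offset measured in units of the posterior width grows as more lenses are combined. Real estimators need not behave this simply; the sketch only shows the scaling argument.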
