Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects

Natalí S. M. de Santi; Francisco Villaescusa-Navarro; L. Raul Abramo; Helen Shao; Lucia A. Perez; Tiago Castro; Yueying Ni; Christopher C. Lovell; Elena Hernandez-Martinez; Federico Marinacci; David N. Spergel; Klaus Dolag; Lars Hernquist; Mark Vogelsberger

TL;DR

This work extends field-level likelihood-free inference for cosmology from galaxy catalogs by incorporating realistic observational systematics into a graph-neural-network framework. Using thousands of CAMELS hydrodynamic simulations, it builds galaxy graphs and predicts the posterior mean and uncertainty of $\Omega_{\rm m}$ via moment neural networks, testing robustness to masking, velocity and distance errors, and galaxy selection criteria. The results show that the approach remains robust across most systematics, with over 90% of catalogs maintaining high performance after outlier removal, though certain effects (notably large velocity perturbations and some selections) degrade accuracy in some simulations such as Magneticum. This demonstrates the potential of applying field-level GNN inference to real galaxy data, while highlighting the need for larger-volume simulations and broader parameter coverage to fully realize its cosmological constraining power.

Abstract

It has recently been shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models, robust to uncertainties in astrophysics and subgrid models, that could accurately infer the value of $\Omega_{\rm m}$ from catalogs that only contain the positions and radial velocities of galaxies. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs that incorporate these observational effects, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project. We find that, although these effects degrade the precision and accuracy of the models and increase the fraction of catalogs where the model breaks down, the fraction of galaxy catalogs where the model performs well is over 90%, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.
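As a rough illustration of the first two observational effects listed above, the following sketch applies random masking and velocity perturbations to a toy catalog. It is not the authors' pipeline: the function names, the uniform random masking, and the Gaussian noise models (absolute scatter `V` in km/s, relative scatter `P` as a fraction) are all assumptions made for illustration, and the toy catalog values are arbitrary.

```python
import numpy as np


def mask_galaxies(pos, vel, fraction, rng):
    """Randomly drop `fraction` of the galaxies (assumed uniform random masking)."""
    n = len(pos)
    n_keep = n - int(round(fraction * n))
    keep = rng.choice(n, size=n_keep, replace=False)
    return pos[keep], vel[keep]


def perturb_absolute(vel, V, rng):
    """Add zero-mean Gaussian noise with standard deviation V (km/s); assumed noise model."""
    return vel + rng.normal(0.0, V, size=vel.shape)


def perturb_relative(vel, P, rng):
    """Scale each velocity by (1 + eps), eps ~ N(0, P); assumed noise model."""
    return vel * (1.0 + rng.normal(0.0, P, size=vel.shape))


# Toy catalog: 1000 galaxies with 3D positions and one radial velocity each.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 25.0, size=(1000, 3))
vel = rng.normal(0.0, 300.0, size=1000)

pos_m, vel_m = mask_galaxies(pos, vel, fraction=0.05, rng=rng)
vel_abs = perturb_absolute(vel_m, V=150.0, rng=rng)
vel_rel = perturb_relative(vel_m, P=0.25, rng=rng)
```

The masking fraction and the values of `V` and `P` mirror those quoted in the figure captions; how the paper actually draws the perturbations is not stated in this excerpt.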

Paper Structure (22 sections, 8 equations, 12 figures, 5 tables)

Introduction
Data
Simulations
Galaxy catalogs
Observational effects
Methodology
Galaxy graphs
GNNs architecture
Likelihood-free inference and the loss function
Training procedure and optimization
Performance Metrics
Results
Masking
Peculiar velocity uncertainties
Masking and peculiar velocity uncertainties
...and 7 more sections

Figures (12)

Figure 1: Truth - Inference of $\Omega_{\rm m}$ -- Masking: removing $10 \%$ and $5 \%$ of the galaxies, respectively on the left and on the right panels. We present the predictions for galaxy catalogs from Astrid, SIMBA, IllustrisTNG, SB28, Magneticum, and SWIFT-EAGLE. For each simulation suite, we indicate the average $\chi^2$ value across all galaxy catalogs in the test set. We also list the $\chi^2$ values after removing outliers, which are selected as catalogs whose predictions exhibit $\chi^2 > 10$ and present the percentage of outliers (percentage of catalogs removed after this selection).
Figure 2: Truth - Inference of $\Omega_{\rm m}$ -- Peculiar velocity uncertainties: absolute error. For $V = 150$ km/s and $100$ km/s for each galaxy velocity, respectively on the left and the right panels. We present the predictions for galaxy catalogs from Astrid, SIMBA, IllustrisTNG, SB28, Magneticum, and SWIFT-EAGLE. For each simulation suite, we indicate the average $\chi^2$ value across all galaxy catalogs in the test set. We also list the $\chi^2$ values after removing outliers, which are selected as catalogs whose predictions exhibit $\chi^2 > 10$ and present the percentage of outliers (percentage of catalogs removed after this selection).
Figure 3: Truth - Inference of $\Omega_{\rm m}$ -- Peculiar velocity uncertainties: relative error. For $P = 25 \%$ and $15 \%$ of the galaxy velocities, respectively on the left and on the right panels. We present the predictions for galaxy catalogs from Astrid, SIMBA, IllustrisTNG, SB28, Magneticum, and SWIFT-EAGLE. For each simulation suite, we indicate the average $\chi^2$ value across all galaxy catalogs in the test set. We also list the $\chi^2$ values after removing outliers, which are selected as catalogs whose predictions exhibit $\chi^2 > 10$ and present the percentage of outliers (percentage of catalogs removed after this selection).
Figure 4: Truth - Inference of $\Omega_{\rm m}$ -- Masking and perturbing the galaxy velocities. In this figure we mask $5 \%$ of the galaxies and apply absolute and relative galaxy velocity uncertainties, respectively on the left and on the right panels. We present the predictions for galaxy catalogs from Astrid, SIMBA, IllustrisTNG, SB28, Magneticum, and SWIFT-EAGLE. For each simulation suite, we indicate the average $\chi^2$ value across all galaxy catalogs in the test set. We also list the $\chi^2$ values after removing outliers, which are selected as catalogs whose predictions exhibit $\chi^2 > 10$ and present the percentage of outliers (percentage of catalogs removed after this selection).
Figure 5: Truth - Inference of $\Omega_{\rm m}$ -- Line-of-sight distance uncertainties. Considering only the $x$ and $y$ positions and $v_z$ velocity of the galaxies. We present the predictions for galaxy catalogs from Astrid, SIMBA, IllustrisTNG, SB28, Magneticum, and SWIFT-EAGLE. For each simulation suite, we indicate the average $\chi^2$ value across all galaxy catalogs in the test set. We also list the $\chi^2$ values after removing outliers, which are selected as catalogs whose predictions exhibit $\chi^2 > 10$ and present the percentage of outliers (percentage of catalogs removed after this selection).
...and 7 more figures
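The outlier selection described in the captions can be sketched as follows. This is a minimal example that assumes the per-catalog statistic is $\chi^2 = (\text{truth} - \mu)^2 / \sigma^2$, with $\mu$ and $\sigma$ the predicted posterior mean and standard deviation; the captions do not spell out the definition, so treat that form, the function name `chi2_summary`, and the toy numbers as hypothetical.

```python
import numpy as np


def chi2_summary(truth, mean, sigma, threshold=10.0):
    """Average chi^2 with and without outliers, plus the outlier percentage.

    Assumes a per-catalog chi^2 of ((truth - mean) / sigma)^2; catalogs with
    chi^2 above `threshold` are treated as outliers.
    """
    truth, mean, sigma = (np.asarray(a, dtype=float) for a in (truth, mean, sigma))
    chi2 = ((truth - mean) / sigma) ** 2
    keep = chi2 <= threshold
    return {
        "chi2_all": float(chi2.mean()),
        "chi2_clean": float(chi2[keep].mean()) if keep.any() else float("nan"),
        "outlier_pct": 100.0 * float((~keep).mean()),
    }


# Toy predictions for four catalogs; the third one is a clear outlier.
truth = np.array([0.30, 0.25, 0.40, 0.35])
mean = np.array([0.31, 0.24, 0.20, 0.35])
sigma = np.array([0.02, 0.02, 0.05, 0.03])
summary = chi2_summary(truth, mean, sigma)
```

In this toy case one of the four catalogs exceeds the threshold, so the average computed after removing it is much smaller than the average over all catalogs, which is the kind of contrast the captions report per simulation suite.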
