One latent to fit them all: a unified representation of baryonic feedback on matter distribution

Shurui Lin, Yin Li, Shy Genel, Francisco Villaescusa-Navarro, Biwei Dai, Wentao Luo, Yang Wang

TL;DR

The paper addresses the challenge of modeling baryonic feedback on the matter distribution for cosmology by learning a unified, low-dimensional latent representation from multiple CAMELS hydrodynamic simulations. It introduces a conditional beta-TCVAE that encodes baryonic physics into a 2D latent space while making it largely independent of redshift and cosmology, using the transfer function $T^2(k,a) = P_{hyd}(k,a)/P_{dmo}(k,a)$ as the target. The two latent dimensions correlate with distinct feedback processes (BH growth and SN feedback), providing interpretable effects on the matter power spectrum and enabling a fast emulator for baryonic corrections across simulation suites. This framework supports cross-suite comparisons and potentially enables robust weak-lensing analyses for Stage-4 surveys, though it currently faces SIMBA-specific biases and would benefit from richer training data in future CAMELS releases.
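For reference, the standard β-TCVAE objective can be written with the encoder and decoder conditioned on cosmology, following the architecture in Figure 1. This is a sketch under stated assumptions, not necessarily the authors' exact loss: $\theta$ denotes the five cosmological parameters, the prior $p(\mathbf{z}) = \mathcal{N}(0, I)$ is assumed to factorize over latent dimensions, and the weights $\alpha$, $\beta$, $\gamma$ are hypothetical hyperparameters.

$$ x \equiv \ln T^2(k,a) = \ln P_{hyd}(k,a) - \ln P_{dmo}(k,a), \qquad n \equiv (x, \theta) $$

$$ \mathcal{L} = \mathbb{E}_{p(n)}\,\mathbb{E}_{q(\mathbf{z}\mid n)}\big[\ln p(x \mid \mathbf{z}, \theta)\big] - \alpha\, I_q(n;\mathbf{z}) - \beta\, \mathrm{KL}\Big(q(\mathbf{z}) \,\Big\|\, \textstyle\prod_j q(z_j)\Big) - \gamma \sum_j \mathrm{KL}\big(q(z_j) \,\|\, p(z_j)\big) $$

Here $q(\mathbf{z}) = \mathbb{E}_{p(n)}\, q(\mathbf{z}\mid n)$ is the aggregate posterior, and $\mathcal{L}$ is maximized. With $\alpha=\beta=\gamma=1$ the last three terms sum to $\mathbb{E}_{p(n)}\,\mathrm{KL}\big(q(\mathbf{z}\mid n)\,\|\,p(\mathbf{z})\big)$, recovering the usual conditional ELBO; setting $\beta>1$ penalizes the total correlation, pushing the latents toward a factorized (disentangled) distribution like the one in Figure 1's mid-right panel.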

Abstract

Accurate and parsimonious quantification of baryonic feedback on matter distribution is of crucial importance for understanding both cosmology and galaxy formation from observational data. This is, however, challenging given the large discrepancy among different models of galaxy formation simulations, and their distinct subgrid physics parameterizations. Using 5,072 simulations from 4 different models covering broad ranges in their parameter spaces, we find a unified 2D latent representation. Compared to the simulations and other phenomenological models, our representation is independent of both time and cosmology, much lower-dimensional, and disentangled in its impacts on the matter power spectra. The common latent space facilitates the comparison of parameter spaces of different models and is readily interpretable through its correlations with each model's parameters. The two latent dimensions provide a complementary representation of baryonic effects, linking black hole and supernova feedback to distinct and interpretable impacts at both the matter power spectrum and field levels. Our approach enables developing robust and economical analytic models for optimal gain of physical information from data, and is generalizable to other fields with significant modeling uncertainty.

Paper Structure

This paper contains 26 sections, 19 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Top: Architecture of the conditional $\beta$-TCVAE. The encoder takes the baryonic feedback transfer function $\ln T^2$ and five cosmological parameters to infer the latent distribution and latent sample $\mathbf{z}$. Latents and cosmology are passed to the decoder to reconstruct $\ln {T'}^2$. The architecture of the encoder and decoder is described in the encoder/decoder section. Mid left: Effect of latent dimensions on $T^2$ reconstruction at $z=2$, $1$, and $0$ in IllustrisTNG. Black points and error bars show CV-set means and variances. Solid lines show reconstruction at the latent mean, with shading for latent deviation. Mid right: Latent PDF contours for the TEA model on IllustrisTNG, indicating disentanglement with a smooth distribution nearly factorizable along the principal axes. Bottom: Pearson cross-correlation between latents and parameters of the IllustrisTNG SB28 simulations, computed from mean latents and parameters in the validation set. Colors indicate correlation strength. Parameters are listed in the IllustrisTNG parameter table.
  • Figure 2: Reconstruction test showing $\frac{\mathrm{Reconstruction\ MSE}}{\mathrm{Cosmic\ Variance}}$ of the transfer functions $T^2$ as a function of wavenumber. For each simulation in the CV set, 100 distinct reconstructions are created, and the mean square error is averaged over all $27 \times 100 = 2700$ reconstructions. Each curve shows this ratio averaged over redshift. Values near 1 indicate that reconstruction scatter is dominated by cosmic variance, with minimal additional model uncertainty. Subplots correspond to models trained on different datasets (see the training section). Colored lines represent test sets: IllustrisTNG (blue), Astrid (orange), SIMBA (green), and EAGLE (red). Each model achieves high reconstruction accuracy on its training suite.
  • Figure 3: Heatmaps of simulation parameter distributions projected onto the latent space. For each simulation in the IllustrisTNG suite, we generate 20 latent samples and compute the mean value of each parameter within 2D hexagonal bins. Each panel shows the average value of a specific parameter across the latent space. From left to right, the first six panels display parameters with the strongest cross-correlations with the latent dimensions, all showing structured and consistent patterns, including detailed nonlinear features. In the right column, we present two representative parameters (a baryonic one and $\Omega_\mathrm{m}$) that exhibit negligible correlation with the latent space. Their heatmaps appear nearly uniform, consistent with their low cross-correlation values.
  • Figure 4: Upper: Heatmaps of the massive black hole fraction, $\log_{10}(f_\mathrm{massive\ BH})$ with $M_\mathrm{BH} \geq 10^8\,M_\odot$, in the latent space across $z=2$, $1$, and $0$ (right to left) for IllustrisTNG, similar to Figure 3. At all redshifts, "Latent 0" correlates positively with BH mass, while "Latent 1" correlates negatively. Lower: Cross-correlation between latent variables and the massive black hole fraction across redshift for all four simulation suites. "Latent 0" shows positive correlations for IllustrisTNG and Astrid, increasing with redshift, but a negative trend for SIMBA. "Latent 1" shows negative correlations for all suites.
  • Figure 5: Examples of three types of failure behavior observed in 3D latent space models. Top left: Incomplete reconstruction from a 3D model. The red line shows the mean power spectrum ratio from the CV set, while the blue dots indicate the corresponding reconstructed power spectrum ratios. The reconstruction deviates significantly from the truth for $k > 1\, h\,\mathrm{Mpc}^{-1}$. Top right: An example of an overly concentrated latent distribution. The sharply striped contour indicates that at least one of the latents has collapsed into a nearly delta-function-like distribution. Bottom: Cross-correlation between the three latent dimensions and simulation parameters, similar to the bottom panel of Figure 1. For visual clarity, the sign of "Latent 1" has been flipped in this specific example to highlight its similarity to "Latent 0".
  • ...and 7 more figures
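Figure 1 (bottom) and Figure 4 (lower) report Pearson cross-correlations between latents and simulation quantities. A minimal NumPy sketch of that computation, assuming the per-simulation latent means and parameters are already collected into arrays (the function name `latent_param_correlation` is hypothetical):

```python
import numpy as np


def latent_param_correlation(latent_means, params):
    """Pearson correlation between each latent dimension and each parameter.

    latent_means: (n_sims, n_latent) posterior means, one row per simulation.
    params:       (n_sims, n_params) simulation parameters (hypothetical layout).
    Returns an (n_latent, n_params) matrix of Pearson r values.
    """
    z = np.asarray(latent_means, dtype=float)
    p = np.asarray(params, dtype=float)
    # Standardize each column to zero mean and unit (population) variance.
    z_std = (z - z.mean(axis=0)) / z.std(axis=0)
    p_std = (p - p.mean(axis=0)) / p.std(axis=0)
    # The mean product of standardized variables is exactly Pearson's r.
    return z_std.T @ p_std / z.shape[0]
```

A parameter held fixed across the set has zero variance and would yield NaN entries, so such columns should be dropped before calling it.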
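The ratio plotted in Figure 2 can be sketched as follows. This assumes, as one reading of the caption, that errors are measured about the CV-set mean of the input $\ln T^2$, so a model that adds no scatter beyond cosmic variance gives a ratio near 1; the array layout and the name `mse_over_cosmic_variance` are hypothetical.

```python
import numpy as np


def mse_over_cosmic_variance(truth, recon):
    """Redshift-averaged ratio of reconstruction MSE to cosmic variance.

    truth: (n_sims, n_z, n_k) input ln T^2 for the CV simulations.
    recon: (n_sims, n_recon, n_z, n_k) reconstructions, n_recon per simulation.
    Assumption: errors are taken about the CV-set mean of the inputs.
    Returns an (n_k,) array, one ratio per wavenumber bin.
    """
    truth = np.asarray(truth, dtype=float)
    recon = np.asarray(recon, dtype=float)
    cv_mean = truth.mean(axis=0)      # (n_z, n_k) mean over CV realizations
    cosmic_var = truth.var(axis=0)    # (n_z, n_k) scatter between realizations
    # Average the squared error over all n_sims * n_recon reconstructions.
    mse = ((recon - cv_mean) ** 2).mean(axis=(0, 1))  # (n_z, n_k)
    # Take the ratio per redshift, then average over redshift.
    return (mse / cosmic_var).mean(axis=0)
```

With 27 CV simulations and 100 reconstructions each, `recon` would have shape `(27, 100, n_z, n_k)`; a model that reproduces every input exactly returns 1 in every bin, and any extra model scatter pushes the ratio above 1.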
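The Figure 3 heatmaps average a parameter over the latent samples falling in each bin. The sketch below assumes a diagonal Gaussian posterior per simulation (mean and standard deviation from the encoder) and, to stay dependency-light, swaps the hexagonal bins for square bins from `numpy.histogram2d`; the function name and bin count are hypothetical. With matplotlib, `hexbin(x, y, C=w, reduce_C_function=np.mean)` would give the hexagonal version.

```python
import numpy as np


def binned_param_mean(latent_mu, latent_sigma, param, n_samples=20, bins=30, seed=0):
    """Mean parameter value in 2D square bins of latent space.

    latent_mu, latent_sigma: (n_sims, 2) assumed Gaussian posterior mean and std.
    param: (n_sims,) one simulation parameter.
    Returns (means, xedges, yedges); empty bins are NaN.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(latent_mu, dtype=float)
    sigma = np.asarray(latent_sigma, dtype=float)
    # Draw n_samples latents per simulation: z = mu + sigma * eps.
    eps = rng.standard_normal((mu.shape[0], n_samples, 2))
    z = (mu[:, None, :] + sigma[:, None, :] * eps).reshape(-1, 2)
    # Each latent sample carries the parameter value of its simulation.
    w = np.repeat(np.asarray(param, dtype=float), n_samples)
    counts, xe, ye = np.histogram2d(z[:, 0], z[:, 1], bins=bins)
    sums, _, _ = np.histogram2d(z[:, 0], z[:, 1], bins=[xe, ye], weights=w)
    # Per-bin mean = weighted sum / count; 0/0 in empty bins becomes NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means, xe, ye
```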
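One crude way to flag the delta-like collapse shown in Figure 5 (top right) is to compare the spread of pooled latent samples across dimensions. The threshold `rel_tol` and the function name are hypothetical choices for illustration, not a criterion taken from the paper.

```python
import numpy as np


def collapsed_latent_dims(latent_samples, rel_tol=0.05):
    """Flag latent dimensions whose pooled spread is tiny relative to the widest.

    latent_samples: (n_samples, n_latent) draws pooled over the whole dataset.
    rel_tol: hypothetical threshold on std / max(std).
    Returns a boolean (n_latent,) array, True for near-delta dimensions.
    """
    z = np.asarray(latent_samples, dtype=float)
    spread = z.std(axis=0)
    return spread < rel_tol * spread.max()
```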