pop-cosmos: Insights from generative modeling of a deep, infrared-selected galaxy population

Stephen Thorp, Hiranya V. Peiris, Gurjeet Jagwani, Sinan Deger, Justin Alsing, Boris Leistedt, Daniel J. Mortlock, Anik Halder, Joel Leja

TL;DR

The paper extends the pop-cosmos forward-modeling framework to infrared-selected galaxies up to $z\sim6$ by training a diffusion-based population prior over 16 stellar population synthesis (SPS) parameters on 26-band COSMOS2020 photometry. It couples a fast SPS emulator with a diffusion-based photometric uncertainty model and an inference pipeline to produce realistic mock catalogs and object-level posteriors, including redshifts and stellar masses, for hundreds of thousands of galaxies. The authors validate the model against COSMOS photometry and spectroscopy, demonstrating accurate redshift recovery ($\text{median}[\Delta_z]=-8\times10^{-4}$, $\sigma_\text{MAD}=0.0132$, outlier fraction $6.19\%$) and sensible scaling relations (stellar mass function, star-forming main sequence, mass–metallicity, dust, and AGN trends). They release software, two million-galaxy mock catalogs, and COSMOS2020 redshift and SPS parameter posteriors to support forward-modeling for Stage IV surveys, enabling realistic, large-scale cosmological analyses with IR-selected populations.

Abstract

We present an extension of the pop-cosmos model for the evolving galaxy population up to redshift $z\sim6$. The model is trained on distributions of observed colors and magnitudes, from 26-band photometry of $\sim420{,}000$ galaxies in the COSMOS2020 catalog with Spitzer IRAC $\textit{Ch.\,1}<26$. The generative model includes a flexible distribution over 16 stellar population synthesis (SPS) parameters, and a depth-dependent photometric uncertainty model, both represented using score-based diffusion models. We use the trained model to predict scaling relationships for the galaxy population, such as the stellar mass function, star-forming main sequence, and gas-phase and stellar metallicity vs. mass relations, demonstrating reasonable-to-excellent agreement with previously published results. We explore the connection between mid-infrared emission from active galactic nuclei (AGN) and star-formation rate, finding high AGN activity for galaxies above the star-forming main sequence at $1\lesssim z\lesssim 2$. Using the trained population model as a prior distribution, we perform inference of the redshifts and SPS parameters for 429,669 COSMOS2020 galaxies, including 39,588 with publicly available spectroscopic redshifts. The resulting redshift estimates exhibit minimal bias ($\text{median}[\Delta_z]=-8\times10^{-4}$), scatter ($\sigma_\text{MAD}=0.0132$), and outlier fraction ($6.19\%$) for the full $0<z<6$ spectroscopic compilation. These results establish that pop-cosmos can achieve the accuracy and realism needed to forward-model modern wide–deep surveys for Stage IV cosmology. We publicly release pop-cosmos software, mock galaxy catalogs, and COSMOS2020 redshift and SPS parameter posteriors.

Paper Structure

This paper contains 45 sections, 10 equations, 31 figures, 6 tables.

Figures (31)

  • Figure 1: Broadband magnitudes (logarithmic) predicted by the new pop-cosmos generative model, compared to COSMOS2020 (Weaver et al. 2022) for $\textit{Ch.\,1}<26$.
  • Figure 3: Broadband colors (logarithmic; band combinations from Alsing et al. 2024) predicted by the new pop-cosmos generative model, compared to COSMOS2020 (Weaver et al. 2022) for $\textit{Ch.\,1}<26$. Contours enclose the 68% and 95% highest density regions.
  • Figure 4: Flux uncertainty vs. (logarithmic) magnitude for the COSMOS broad bands, compared to the flux errors reported in the COSMOS2020 catalog (Weaver et al. 2022) for $\textit{Ch.\,1}<26$. Note that the pop-cosmos uncertainty predictions are made conditional on the pop-cosmos model magnitudes. Contours enclose the 68% and 95% highest probability density regions.
  • Figure 5: Comparison of $z^\text{phot}$ vs. $z^\text{spec}$, where $z^\text{phot}$ is the posterior median $z$ inferred under the new pop-cosmos prior. Each galaxy is weighted by its spectroscopic confidence level from Khostovan et al. (2025). Solid and dashed lines show $z^\text{phot}=z^\text{spec}$ and $|\Delta_z|=0.15$, respectively.
  • Figure 6: Comparison of stellar mass estimates for the 423,262 COSMOS2020 galaxies with $\textit{Ch.\,1}<26$. The vertical axis shows the posterior median $\log_{10}(M/M_\odot)$ under the new pop-cosmos prior. The horizontal axis shows (left) the LePhare stellar mass estimates from Weaver et al. (2022), and (right) the posterior median under the Prospector prior.
  • ...and 26 more figures
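
The redshift statistics quoted in the abstract (median $\Delta_z$, $\sigma_\text{MAD}$, outlier fraction) and the $|\Delta_z|=0.15$ lines in Figure 5 can be illustrated with a minimal sketch. The definitions used here are assumptions based on common photometric-redshift conventions, not taken from the text above: $\Delta_z=(z^\text{phot}-z^\text{spec})/(1+z^\text{spec})$, $\sigma_\text{MAD}=1.4826\times\text{median}|\Delta_z-\text{median}(\Delta_z)|$, and outliers defined by $|\Delta_z|>0.15$. The function name `redshift_metrics` is hypothetical, and the sketch ignores the spectroscopic confidence weighting mentioned in the Figure 5 caption.

```python
import numpy as np

def redshift_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (bias, sigma_mad, outlier_fraction) for photo-z estimates.

    Assumed conventions (not confirmed by the source text):
      delta_z   = (z_phot - z_spec) / (1 + z_spec)
      bias      = median(delta_z)
      sigma_mad = 1.4826 * median(|delta_z - median(delta_z)|)
      outlier   = |delta_z| > outlier_cut
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)

    # Normalized redshift residual per galaxy.
    delta_z = (z_phot - z_spec) / (1.0 + z_spec)

    # Median residual, used as the bias estimate.
    bias = np.median(delta_z)

    # Median absolute deviation; the factor 1.4826 rescales it so that
    # it matches the standard deviation for Gaussian residuals.
    sigma_mad = 1.4826 * np.median(np.abs(delta_z - bias))

    # Fraction of galaxies beyond the outlier threshold.
    outlier_fraction = np.mean(np.abs(delta_z) > outlier_cut)

    return bias, sigma_mad, outlier_fraction

# Toy inputs (made up for illustration, not COSMOS data).
z_spec = np.array([0.5, 1.0, 2.0, 3.0])
z_phot = np.array([0.5, 1.02, 2.0, 5.0])
bias, sigma_mad, f_out = redshift_metrics(z_phot, z_spec)
print(bias, sigma_mad, f_out)
```

In this toy example only the last galaxy exceeds the $|\Delta_z|=0.15$ threshold, so one of four objects counts as an outlier. Reproducing a weighted comparison like Figure 5 would require weighted medians, which this sketch does not implement.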