pop-cosmos: Redshifts and physical properties of KiDS-1000 galaxies

Anik Halder, Hiranya V. Peiris, Stephen Thorp, Boris Leistedt, Daniel J. Mortlock, Gurjeet Jagwani, Madalina N. Tudorache, Sinan Deger, Benedict Van den Bussche, Joel Leja, Angus H. Wright

TL;DR

This work tackles scalable, principled Bayesian inference of galaxy redshifts and physical properties for wide-area photometric surveys by combining the pop-cosmos prior, a diffusion-based model trained on COSMOS2020, with the Speculator SPS emulator. It performs full posterior inference for $\sim$4 million KiDS-1000 galaxies and validates the resulting photometric redshifts against DESI DR1 spectroscopic redshifts, finding low bias ($\sim 3\times10^{-3}$ to $3\times10^{-2}$), small scatter ($\sigma_{\mathrm{MAD}}$ of 0.04–0.05), and outlier fractions of a few percent. The analysis recovers established scaling relations (star-forming main sequence, mass–metallicity relation) and reveals dusty star-forming contaminants in LRG selections, illustrating both the power and limitations of colour-based classifications. The study demonstrates a scalable framework that generalizes out of sample, with a prior extending to $z \simeq 6$, and that enables physically defined weak-lensing samples and principled galaxy-evolution analyses for current and upcoming surveys. The cost is about 6.5 GPU-seconds per galaxy, or $\sim$7,500 GPU-hours for the KiDS-1000-scale dataset.
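
The quoted compute budget can be cross-checked with simple arithmetic. The sketch below multiplies the per-galaxy cost by the rounded galaxy count from the summary above; the constant names and the GPU count used for the wall-clock estimate are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope check of the compute budget quoted in the summary.
# The two input numbers come from the text; N_GPUS is a hypothetical value.
GPU_SECONDS_PER_GALAXY = 6.5   # quoted cost of full posterior inference
N_GALAXIES = 4_000_000         # rounded KiDS-1000 sample size ("~4 million")
N_GPUS = 64                    # hypothetical cluster size, not from the paper

total_gpu_seconds = GPU_SECONDS_PER_GALAXY * N_GALAXIES
total_gpu_hours = total_gpu_seconds / 3600.0

# Idealised wall-clock time, assuming perfect parallel scaling across GPUs.
wall_clock_hours = total_gpu_hours / N_GPUS

print(f"{total_gpu_hours:,.0f} GPU-hours")  # prints "7,222 GPU-hours"
print(f"{wall_clock_hours:.1f} hours on {N_GPUS} GPUs")
```

With the rounded inputs this gives roughly 7,200 GPU-hours, the same order as the $\sim$7,500 GPU-hours quoted; the small difference is consistent with the galaxy count being rounded to 4 million.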

Abstract

Principled Bayesian inference of galaxy properties has not previously been performed for wide-area weak lensing surveys with millions of sources. We address this gap by applying the pop-cosmos generative model to perform spectral energy distribution (SED) fitting for 4 million KiDS-1000 galaxies. Calibrated on deep COSMOS2020 photometric data, pop-cosmos specifies a physically motivated prior over the galaxy population up to $z \simeq 6$ in stellar population synthesis (SPS) parameter space. Using the Speculator SPS emulator with GPU-accelerated MCMC sampling, we perform full posterior inference at 6.5 GPU seconds per galaxy, obtaining joint constraints on galaxy redshifts and physical properties. We validate photometric redshifts against $\sim\!185,\!000$ KiDS galaxies cross-matched to DESI DR1 spectroscopic samples, achieving low bias ($3\times10^{-3}$), scatter ($\sigma_{\mathrm{MAD}}=0.04$), and outlier fraction (3.7%) for the Bright Galaxy Survey, with comparable performance (bias $3\times10^{-2}$, $\sigma_{\mathrm{MAD}}=0.05$, 1.3% outliers) for luminous red galaxies (LRGs). Within the LRG sample, we identify massive, dusty, star-forming contaminants at $z \simeq 0.4$ satisfying standard colour selections for quenched populations. We infer trends in stellar mass, star formation, metallicity, and dust across five tomographic redshift bins consistent with established scaling relations. Using specific star formation rate constraints, we identify $\sim$10% of KiDS-1000 galaxies as quenched, versus 37% implied by conservative colour cuts. This enables the construction of weak lensing samples defined by physical properties while mitigating intrinsic alignment systematics and preserving statistical power. Our analysis validates pop-cosmos out-of-sample, establishing it as a scalable approach for galaxy evolution and cosmological analyses in photometric surveys.
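
The three validation metrics quoted in the abstract (bias, $\sigma_{\mathrm{MAD}}$, outlier fraction) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the normalised residual $\Delta_z=(z_\mathrm{phot}-z_\mathrm{spec})/(1+z_\mathrm{spec})$ and the $|\Delta_z|>0.15$ outlier threshold given in the Figure 3 caption, and it assumes the conventional definitions of bias as the median of $\Delta_z$ and $\sigma_{\mathrm{MAD}}$ as 1.4826 times the median absolute deviation. The function name `photoz_metrics` and the synthetic catalogue are hypothetical.

```python
import numpy as np

def photoz_metrics(z_phot, z_spec, outlier_threshold=0.15):
    """Summary photo-z metrics under assumed conventional definitions."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    # Normalised redshift residual, as defined in the Figure 3 caption.
    delta_z = (z_phot - z_spec) / (1.0 + z_spec)
    # Assumption: bias is the median residual.
    bias = np.median(delta_z)
    # Assumption: sigma_MAD is the scaled median absolute deviation,
    # where 1.4826 makes it match the standard deviation of a Gaussian.
    sigma_mad = 1.4826 * np.median(np.abs(delta_z - bias))
    # Fraction of galaxies beyond the catastrophic-outlier threshold.
    outlier_fraction = np.mean(np.abs(delta_z) > outlier_threshold)
    return {
        "bias": float(bias),
        "sigma_mad": float(sigma_mad),
        "outlier_fraction": float(outlier_fraction),
    }

# Synthetic catalogue for illustration only (not KiDS or DESI data):
# Gaussian scatter of 0.04 * (1 + z_spec) around the true redshift.
rng = np.random.default_rng(0)
z_spec = rng.uniform(0.1, 1.0, size=10_000)
z_phot = z_spec + 0.04 * (1.0 + z_spec) * rng.standard_normal(z_spec.size)
metrics = photoz_metrics(z_phot, z_spec)
print(metrics)
```

Median-based statistics are used here because they are robust to the small population of catastrophic outliers, which the outlier fraction then reports separately.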

Paper Structure (24 sections, 7 equations, 17 figures, 2 tables)

Figures (17)

  • Figure 1: Upper panel: The 9 OmegaCAM and VISTA passbands used in KiDS (scaled to peak at 1.0), together with 3 DECam and 2 WISE filters (scaled to peak at 0.6) used by the DESI Legacy Imaging Survey, whose photometry is used for the inference analyses in this work. Lower panel: The 26 COSMOS passbands used in the training data for the pop-cosmos generative population model (Alsing et al. 2024; Thorp et al. 2025b). Broadbands are scaled to 1.0, intermediate bands (labeled 'IA/B...') are scaled to 0.33, and narrow bands ('NB...') are scaled to 0.2.
  • Figure 2: Distributions of redshifts: spectroscopic redshift $z_{\mathrm{spec}}$ of the KiDS-1000 cosmic shear galaxies cross-matched to the DESI BGS (blue) and LRG (red) samples; and the redshifts of pop-cosmos model galaxies (grey) with the KiDS-1000 selection from B. Leistedt et al. (BL26).
  • Figure 3: Spectroscopic vs. photometric redshifts for KiDS-1000 galaxies cross-matched to DESI DR1 BGS galaxies (first and second rows), and to DESI DR1 LRG galaxies (third and fourth rows), under the KiDS photometry (first and third rows) and DECaLS + WISE photometry (second and fourth rows). Photometric redshifts and other galaxy properties are the inferred posterior medians under the pop-cosmos prior. The threshold of $|\Delta_z|>0.15$ is indicated in the panels with dashed lines (grey), where $\Delta_z=(z-z_\mathrm{spec})/(1+z_\mathrm{spec})$. The shading of bins in each column corresponds to a different quantity. First column: Galaxy count. Second column: Width of the 68% posterior credible interval on $z_{\mathrm{phot}}$. Third column: Median specific star-formation rate (sSFR). Fourth column: Median diffuse dust optical depth ($\tau_2$). Summary photometric redshift metrics can be found in Table \ref{tab:photoz_metrics}.
  • Figure 4: Inferred posterior distributions and results for a galaxy (with the smallest absolute error in photometric redshift relative to spectroscopic redshift) from our KiDS-1000 $\times$ DESI BGS cross-match, conditioned on KiDS photometry. Bottom left: 2-dimensional and 1-dimensional marginalised posterior distributions (blue contours) over a subset of SPS parameters. Grey contours show the pop-cosmos prior with the KiDS-1000 selection imposed (from L26). Top right: Blue curve shows the pointwise median and 68% credible interval on the galaxy's SED. Fluxes are in nanomaggies. Black points indicate the KiDS-1000 photometry, with horizontal bars depicting the FWHM of the KiDS passbands.
  • Figure 5: Same as Fig. 4 but for the galaxy from our KiDS-1000 $\times$ DESI LRG cross-match with the smallest absolute redshift error relative to spectroscopic redshift. Inference is conditioned on KiDS photometry, with red contours and curves showing the posterior. Grey contours are as in Fig. 4.
  • ...and 12 more figures
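
The Figure 3 caption refers to posterior medians and to the width of the 68% posterior credible interval on $z_{\mathrm{phot}}$. A minimal sketch of how such summaries can be computed from MCMC samples for one galaxy is given below. It assumes an equal-tailed interval (16th to 84th percentile); the source does not state whether an equal-tailed or highest-density interval is used. The function name `summarize_posterior` and the mock samples are hypothetical.

```python
import numpy as np

def summarize_posterior(samples, level=0.68):
    """Median and equal-tailed credible interval from 1D posterior samples."""
    samples = np.asarray(samples, dtype=float)
    # Assumption: equal-tailed interval, e.g. 16th-84th percentiles for 68%.
    lower_q = 50.0 * (1.0 - level)
    upper_q = 100.0 - lower_q
    lower, median, upper = np.percentile(samples, [lower_q, 50.0, upper_q])
    return {
        "median": float(median),
        "lower": float(lower),
        "upper": float(upper),
        "width": float(upper - lower),
    }

# Mock redshift samples for a single galaxy (illustrative, not real chains).
rng = np.random.default_rng(1)
z_samples = rng.normal(loc=0.35, scale=0.03, size=5000)
summary = summarize_posterior(z_samples)
print(summary)
```

For an approximately Gaussian posterior the 68% width is close to twice the standard deviation, so the interval width serves as a per-galaxy measure of redshift uncertainty.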