DiffstarPop: A generative physical model of galaxy star formation history
Alex Alarcon, Andrew P. Hearin, Matthew R. Becker, Gillian Beltz-Mohrmann, Andrew Benson, Sachi Weerasooriya
TL;DR
This work introduces DiffstarPop, a differentiable forward model that links dark matter halo mass assembly histories (MAH) to galaxy star formation histories (SFH) through Diffmah and Diffstar, formalized as the population PDF $P(\theta_{\rm SFH}|\theta_{\rm MAH})$. Built in JAX, it enables gradient-based optimization and rapid generation of SFHs for large galaxy populations. It is stress-tested against three representative simulations (UniverseMachine, IllustrisTNG, and Galacticus) and faithfully reproduces key PDFs such as $P(M_{\star}|M_{\rm h},z)$ and $P({\rm sSFR}|M_{\star},z)$, with typical KL divergences of about $0.01$ to $0.03$. The model combines a parametric MAH (Diffmah) with a parametric SFH (Diffstar) and represents the galaxy population with a two-component Gaussian mixture (quenched and main sequence). The mixture's means and covariances scale with halo mass and formation time, for a total of 79 parameters. DiffstarPop generates SFHs quickly ($10^{6}$ galaxies in 1.1 CPU-seconds or 0.03 GPU-seconds) and provides a physics-based, differentiable framework suitable for forward modeling, Bayesian inference, and generating synthetic catalogs for upcoming surveys. The authors have released the code publicly and outline future extensions to include ex-situ SFH, dust, and SED modeling within the Diffsky framework.
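To make the two-component picture concrete, the following is a minimal JAX sketch of drawing a population from a quenched/main-sequence Gaussian mixture whose means and quenched fraction depend on halo mass. It is not the DiffstarPop API: the function name, the single scalar parameter per galaxy, the linear and sigmoid scalings, and every numerical value are assumptions chosen for illustration only.

```python
import jax
import jax.numpy as jnp


def sample_two_component_population(key, log_mhalo, params):
    """Draw one toy SFH parameter per halo from a two-component Gaussian mixture.

    Hypothetical illustration only: the real model uses many more parameters
    and also conditions on halo formation time.
    """
    key_label, key_noise = jax.random.split(key)
    x = log_mhalo - 12.0  # halo mass relative to an assumed pivot of 10^12 Msun

    # Assumed form: quenched fraction rises with halo mass as a sigmoid.
    f_quenched = jax.nn.sigmoid(params["fq_slope"] * x + params["fq_offset"])
    is_quenched = jax.random.bernoulli(key_label, f_quenched)

    # Assumed form: each component mean scales linearly with log halo mass.
    mean_ms = params["ms_mean"] + params["ms_slope"] * x
    mean_q = params["q_mean"] + params["q_slope"] * x
    mean = jnp.where(is_quenched, mean_q, mean_ms)
    sigma = jnp.where(is_quenched, params["q_sigma"], params["ms_sigma"])

    # Gaussian draw around the mean of the selected component.
    theta = mean + sigma * jax.random.normal(key_noise, log_mhalo.shape)
    return theta, is_quenched


# Illustrative parameter values, not fitted to any simulation.
toy_params = {
    "fq_slope": 2.0, "fq_offset": 0.0,
    "ms_mean": -10.0, "ms_slope": -0.1, "ms_sigma": 0.3,
    "q_mean": -12.0, "q_slope": -0.2, "q_sigma": 0.5,
}

key = jax.random.PRNGKey(0)
log_mhalo = jnp.linspace(11.0, 14.0, 1000)
theta, is_quenched = jax.jit(sample_two_component_population)(key, log_mhalo, toy_params)
```

Because the sampler is a pure function of arrays and a PRNG key, it can be wrapped in `jax.jit` and evaluated for a whole population in one call, which is the kind of vectorized generation the quoted timings rely on.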
Abstract
We present DiffstarPop, a differentiable forward model of cosmological populations of galaxy star formation histories (SFH). In the model, the SFH of each individual galaxy is parametrized by Diffstar, whose parameters $\theta_{\rm SFH}$ have a direct interpretation in terms of galaxy formation physics, such as star formation efficiency and quenching. DiffstarPop is a model for the statistical connection between $\theta_{\rm SFH}$ and the mass assembly history (MAH) of dark matter halos. We have formulated DiffstarPop to have the minimal flexibility needed to accurately reproduce the statistical distributions of galaxy SFH predicted by a diverse range of simulations, including the IllustrisTNG hydrodynamical simulation, the Galacticus semi-analytic model, and the UniverseMachine semi-empirical model. Our publicly available code, written in JAX, includes Monte Carlo generators that supply statistical samples of galaxy assembly histories mimicking the populations seen in each simulation, and it can generate SFHs for $10^6$ galaxies in 1.1 CPU-seconds or 0.03 GPU-seconds. We conclude the paper with a discussion of applications of DiffstarPop, which we are using to generate catalogs of synthetic galaxies populating the merger trees in cosmological N-body simulations.
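As a rough illustration of what differentiability buys in practice, the sketch below fits a toy forward model to a target relation by gradient descent with `jax.value_and_grad`. The linear stellar-to-halo-mass relation, the mean-squared-error loss, the parameter names, and the learning rate are all assumptions made for this example; they are stand-ins for the actual DiffstarPop model and its fitting procedure.

```python
import jax
import jax.numpy as jnp


def predict_mean_logsm(params, log_mhalo):
    """Toy forward model (hypothetical): mean log stellar mass vs. log halo mass."""
    return params["norm"] + params["slope"] * (log_mhalo - 12.0)


def loss_fn(params, log_mhalo, target):
    """Mean squared error between the toy prediction and the target relation."""
    return jnp.mean((predict_mean_logsm(params, log_mhalo) - target) ** 2)


@jax.jit
def gradient_step(params, log_mhalo, target, learning_rate):
    # Automatic differentiation gives the loss and its exact gradient together.
    loss, grads = jax.value_and_grad(loss_fn)(params, log_mhalo, target)
    new_params = jax.tree_util.tree_map(
        lambda p, g: p - learning_rate * g, params, grads
    )
    return new_params, loss


log_mhalo = jnp.linspace(11.0, 14.0, 64)

# Synthetic target generated from known parameters, standing in for a simulation.
true_params = {"norm": 10.0, "slope": 0.5}
target = predict_mean_logsm(true_params, log_mhalo)

# Start from a deliberately wrong guess and descend the gradient.
params = {"norm": 9.0, "slope": 1.0}
for _ in range(500):
    params, loss = gradient_step(params, log_mhalo, target, 0.1)
```

The same pattern carries over to a model with many more parameters, since the gradient with respect to every entry of the parameter dictionary comes from a single reverse-mode pass.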
