Understanding Galaxy Morphology Evolution Through Cosmic Time via Redshift Conditioned Diffusion Models

Andrew Lizarraga, Eric Hanchen Jiang, Jacob Nowack, Yun Qi Li, Ying Nian Wu, Bernie Boscoe, Tuan Do

TL;DR

The paper tackles the challenge of linking galaxy morphology to cosmic time by learning the conditional distribution $p(X|z)$ of galaxy images $X$ given continuous redshift $z$ with a diffusion model. It introduces a DDPM conditioned on continuous redshift, with redshift perturbations applied during training, built on a U-Net with self-attention and sinusoidal time encoding, and evaluates it on the Hyper Suprime-Cam dataset. The key findings are that the model reproduces redshift-dependent morphological trends (ellipticity, semi-major axis, Sérsic index, isophotal area), that generated morphologies align with real data across redshifts, and that redshifts predicted from generated images correlate with the conditioning redshift, especially in well-sampled redshift ranges. This approach offers a practical route to redshift estimation from imaging and enables scalable simulation of galaxy populations across cosmic time, with potential for dynamic visualization of galaxy evolution in future work.
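As a rough illustration of what conditioning on a perturbed, continuous redshift could look like, the sketch below jitters $z$ with Gaussian noise and maps both $z$ and the diffusion timestep to sinusoidal features. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the jitter scale `sigma`, the embedding width `dim`, and the choice to combine the two embeddings by addition are all hypothetical. The summary only states that redshift is perturbed during training and that time uses a sinusoidal encoding.

```python
import numpy as np

def sinusoidal_embedding(values, dim=8, max_period=10000.0):
    """Transformer-style sinusoidal features for a batch of scalars."""
    values = np.asarray(values, dtype=np.float64).reshape(-1, 1)
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to about 1/max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = values * freqs  # shape (batch, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def perturb_redshift(z, sigma=0.01, rng=None):
    """Add Gaussian jitter to z (sigma is a hypothetical value), clipped at 0."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=np.float64)
    return np.clip(z + sigma * rng.normal(size=z.shape), 0.0, None)

rng = np.random.default_rng(0)
z = np.array([0.1, 0.5, 1.2])   # toy redshift labels
t = np.array([10, 250, 999])    # toy diffusion timesteps

z_emb = sinusoidal_embedding(perturb_redshift(z, rng=rng))
t_emb = sinusoidal_embedding(t)
cond = z_emb + t_emb            # assumed way of combining the two signals
```

Jittering $z$ means the network never sees exactly the same label twice for a given image, which is one plausible way to encourage smooth interpolation between nearby redshifts.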

Abstract

Redshift measures the distance to galaxies and underlies our understanding of the origin of the Universe and galaxy evolution. Spectroscopic redshift is the gold-standard method for measuring redshift, but it requires about $1000$ times more telescope time than broad-band imaging. That extra cost limits sky coverage and sample size and puts large spectroscopic surveys out of reach. Photometric redshift methods rely on imaging in multiple color filters and template fitting, yet they ignore the wealth of information carried by galaxy shape and structure. We demonstrate that a diffusion model conditioned on continuous redshift learns this missing joint structure and reproduces known morphology-$z$ correlations. We verify on the Hyper Suprime-Cam survey that the model captures redshift-dependent trends in ellipticity, semi-major axis, Sérsic index, and isophotal area, and that redshifts predicted from the generated images correlate closely with true redshifts on test data. To our knowledge this is the first study to establish a direct link between galaxy morphology and redshift. Our approach offers a simple and effective path to redshift estimation from imaging data and will help unlock the full potential of upcoming wide-field surveys.

Paper Structure

This paper contains 14 sections, 2 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Source density versus $r$-band depth for major imaging and spectroscopic galaxy surveys. The $r$-band depth corresponds to the $5\sigma$ detection limit in a $2''$ aperture. Each point represents a survey, with completed imaging surveys (e.g., SDSS, DES, KiDS, HSC) shown in solid markers, and projected surveys (e.g., LSST, Euclid) shown in pink outlined squares. The y-axis shows the number of detected sources per square degree on a logarithmic scale. Among publicly available datasets, the Hyper Suprime-Cam (HSC) survey stands out as the deepest complete imaging survey, with a source density of $\sim 10^5$ galaxies per deg$^2$ and an $r$-band depth of $\sim 26.5$ AB magnitudes. These properties make HSC uniquely suited for training deep learning models that require high-resolution, redshift-labeled galaxy images.
  • Figure 2: Model Architecture: Our model follows conventional DDPM implementations, but the noise-adjusted conditioning on $z$ allows the model to better interpolate between its conditioning values in nearby neighborhoods of $z$.
  • Figure 3: From left to right, the figure displays histograms comparing the frequency distribution of DDPM-generated and real galaxies in terms of 1) ellipticity, 2) semi-major axis, 3) Sérsic index, and 4) isophotal area in the log-scale of redshift $z$.
  • Figure 4: Mean morphological metrics as a function of redshift. Comparison between real test galaxies (orange) and DDPM-generated galaxies (blue) across different redshift bins, with error bars representing $95\%$ confidence intervals on the mean. The model accurately reproduces the observed evolutionary trends in average ellipticity, semi-major axis, Sérsic index, and isophotal area with redshift.
  • Figure 5: Redshift prediction quality for synthesized galaxies. (Left) CNN-predicted redshift $\hat{z}$ versus true spectroscopic redshift $z$ for real test-set galaxies, demonstrating the baseline accuracy of the independent predictor. (Middle) CNN-predicted $\hat{z}$ versus the input conditioning redshift $z$ for DDPM-generated galaxies, showing strong correlation. (Right) Mean redshift loss, $|\hat{z}-z|/(1+z)$, as a function of true/conditioned $z$, confirming good performance in well-sampled regions (e.g., $z < 1.5-2.0$) and highlighting increased scatter at higher redshifts due to training data sparsity.
  • ...and 3 more figures
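
The two summary statistics named in the captions of Figures 4 and 5 can be sketched as follows: the normalized redshift loss $|\hat{z}-z|/(1+z)$, and a per-bin mean with a $95\%$ confidence interval on the mean. This is a minimal sketch under stated assumptions. The function names, the bin edges, and the toy arrays are hypothetical, and the interval uses the normal approximation ($1.96$ times the standard error), which may differ from how the authors computed theirs.

```python
import numpy as np

def redshift_loss(z_pred, z_true):
    """Normalized absolute redshift error |z_pred - z_true| / (1 + z_true)."""
    z_pred = np.asarray(z_pred, dtype=np.float64)
    z_true = np.asarray(z_true, dtype=np.float64)
    return np.abs(z_pred - z_true) / (1.0 + z_true)

def binned_mean_ci(z, values, edges):
    """Per-bin mean and 95% CI half-width (normal approximation).

    Bins are [edges[i], edges[i+1]); bins with fewer than two
    samples are reported as NaN.
    """
    z = np.asarray(z, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    idx = np.digitize(z, edges) - 1
    rows = []
    for b in range(len(edges) - 1):
        v = values[idx == b]
        if v.size < 2:
            rows.append((np.nan, np.nan))
            continue
        sem = v.std(ddof=1) / np.sqrt(v.size)  # standard error of the mean
        rows.append((v.mean(), 1.96 * sem))
    return np.array(rows)

# Toy values for illustration only; not data from the paper.
z_true = np.array([0.10, 0.20, 0.60, 0.70])
z_pred = np.array([0.12, 0.18, 0.65, 0.66])
loss = redshift_loss(z_pred, z_true)
stats = binned_mean_ci(z_true, loss, edges=[0.0, 0.5, 1.0])
```

Each row of `stats` holds the mean loss and the interval half-width for one redshift bin, which is the quantity one would plot with error bars against bin center.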