Using Galaxy Evolution as Source of Physics-Based Ground Truth for Generative Models

Yun Qi Li, Tuan Do, Evan Jones, Bernie Boscoe, Kevin Alfaro, Zooey Nguyen

TL;DR

This work treats galaxy evolution as physics-based ground truth for evaluating generative image models. It develops two conditional generative architectures, a denoising diffusion probabilistic model (DDPM) and a conditional variational autoencoder (CVAE), both conditioned on redshift $z$, and introduces physics-inspired metrics (galaxy KL loss, galaxy-fitting loss, redshift loss) alongside the standard IS and FID scores to quantify realism. On a Hyper Suprime-Cam galaxy dataset spanning a range of redshifts, the DDPM generally outperforms the CVAE on the physics-based metrics, especially at higher redshifts, though neither model reliably recovers the conditioned redshift or fully captures low-redshift diversity. The study demonstrates that physics-grounded evaluation can reveal strengths and limitations of generative models that human perceptual judgment misses, guiding future improvements in physics-aware image generation.

Abstract

Generative models producing images have enormous potential to advance discoveries across scientific fields and require metrics capable of quantifying the high-dimensional output. We propose that astrophysics data, such as galaxy images, can test generative models with physics-motivated ground truths in addition to human judgment. For example, galaxies in the Universe form and change over billions of years, following physical laws and relationships that are both easy to characterize and difficult to encode in generative models. We build a conditional denoising diffusion probabilistic model (DDPM) and a conditional variational autoencoder (CVAE) and test their ability to generate realistic galaxies conditioned on their redshifts (galaxy ages). This is one of the first studies to probe these generative models using physically motivated metrics. We find that both models produce comparably realistic galaxies based on human evaluation, but our physics-based metrics are better able to discern the strengths and weaknesses of the generative models. Overall, the DDPM performs better than the CVAE on the majority of the physics-based metrics. Ultimately, if we can show that generative models can learn the physics of galaxy evolution, they have the potential to unlock new astrophysical discoveries.

Paper Structure

This paper contains 22 sections, 11 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Grid of images showing two example galaxies in our training dataset. The first two rows show a low-redshift galaxy (closer to Earth), while the last two rows show a higher-redshift galaxy (farther from Earth). Each galaxy is observed using 5 optical filters, shown in the first five columns. The false-color image using these five filters is shown in the last column. We show the images with both linear and log scaling to bring out different features. Note that in general galaxies closer to Earth have larger sizes and more complex features, while those farther away are more compact in appearance.
  • Figure 2: Illustration of the three galaxy metrics that we measure from the images. Left: Isophotal area. The highlighted pixels indicate the area of the image that exceeds the background threshold. Middle: Ellipticity. The elliptical frame shows the ellipticity of the best-fit Gaussian profile. Right: Sersic index. The contours show how brightness is distributed as a function of radius.
  • Figure 3: Examples of galaxy images generated by the DDPM (columns 1 to 4) and the CVAE (columns 5 to 8) compared to real images (columns 9 to 10). Each row is at a different redshift. Each galaxy is shown with both linear and log image scaling. The galaxy images produced by the CVAE tend to have artifacts and correlated pixels in the background, which are clearly visible in log scale. The CVAE images of higher-redshift ($z > 1$) galaxies also tend to show extended, irregular bright artifacts around the central galaxy that are not present in the real images.
  • Figure 4: Distribution of the three physical parameters (top: isophotal area, middle: ellipticity, bottom: Sersic index) in two redshift bins (left: $z < 0.5$, right: $1.0 < z < 1.5$) for the real images (blue), DDPM images, and CVAE images (orange). The CVAE distributions match the real data at redshifts $z < 0.5$, but the DDPM matches better at higher redshifts.
  • Figure 5: The log ratio of the KL loss between the CVAE and the DDPM as a function of redshift for isophotal area (blue), ellipticity (orange), and Sersic index (green). The CVAE performs better in the lowest redshift bin, but the DDPM performs better over a larger range of redshifts.
  • ...and 9 more figures
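
The measurements described in the captions of Figures 2 and 5 can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes NumPy only, a fixed background threshold for the isophotal area, and flux-weighted second moments as a stand-in for the best-fit Gaussian profile. Ellipticity is taken as $1 - b/a$, the KL loss is a histogram-based estimate, and the log ratio uses base 10. All function names, parameters, and the synthetic inputs are hypothetical.

```python
import numpy as np

def isophotal_area(image, threshold):
    """Number of pixels brighter than an (assumed) background threshold."""
    return int(np.sum(image > threshold))

def ellipticity(image):
    """Ellipticity 1 - b/a from flux-weighted second moments.

    Moments stand in for a Gaussian fit: for a noise-free Gaussian
    image they recover the same axis ratio.
    """
    y, x = np.indices(image.shape)
    flux = image.sum()
    xc = (x * image).sum() / flux
    yc = (y * image).sum() / flux
    qxx = ((x - xc) ** 2 * image).sum() / flux
    qyy = ((y - yc) ** 2 * image).sum() / flux
    qxy = ((x - xc) * (y - yc) * image).sum() / flux
    # Eigenvalues of the 2x2 moment matrix give the squared axis lengths.
    half_trace = 0.5 * (qxx + qyy)
    root = np.sqrt(0.25 * (qxx - qyy) ** 2 + qxy ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    return 1.0 - np.sqrt(lam_minor / lam_major)

def histogram_kl(real, generated, bins=20, eps=1e-8):
    """KL(real || generated) estimated from histograms on shared bins."""
    lo = min(real.min(), generated.min())
    hi = max(real.max(), generated.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(generated, bins=bins, range=(lo, hi))
    # Smooth empty bins so the logarithm stays finite, then renormalize.
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic elliptical Gaussian "galaxy" (not real data): sigma_x=6, sigma_y=3.
yy, xx = np.indices((64, 64))
toy = np.exp(-0.5 * (((xx - 32) / 6.0) ** 2 + ((yy - 32) / 3.0) ** 2))
toy_area = isophotal_area(toy, threshold=0.1)
toy_ellipticity = ellipticity(toy)

# Synthetic parameter samples standing in for one redshift bin.
rng = np.random.default_rng(0)
real_sample = rng.normal(0.30, 0.10, 2000)
model_a_sample = rng.normal(0.32, 0.10, 2000)  # hypothetical close match
model_b_sample = rng.normal(0.45, 0.10, 2000)  # hypothetical poor match
kl_a = histogram_kl(real_sample, model_a_sample)
kl_b = histogram_kl(real_sample, model_b_sample)
# A negative log ratio means model A is closer to the real distribution.
log_ratio = np.log10(kl_a / kl_b)
```

In a comparison like the one in Figure 5, one such log ratio would be computed per physical parameter and per redshift bin, with the two generated samples coming from the two models under test. The Sersic index is omitted here because it requires a profile fit rather than a direct pixel measurement.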