Category-based Galaxy Image Generation via Diffusion Models

Xingzhong Fan, Hongming Tang, Yue Zeng, M. B. N. Kouwenhoven, Guangquan Zeng

Abstract

Conventional galaxy generation methods rely on semi-analytical models and hydrodynamic simulations, which are highly dependent on physical assumptions and parameter tuning. In contrast, data-driven generative models do not pre-determine explicit physical parameters; instead, they learn them efficiently from observational data, making them an alternative approach to galaxy generation. Among these, diffusion models outperform Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) in quality and diversity. Incorporating physical prior knowledge into these models can further enhance their capabilities. In this work, we present GalCatDiff, the first framework in astronomy to leverage both galaxy image features and astrophysical properties in the network design of diffusion models. GalCatDiff incorporates an enhanced U-Net and a novel block entitled Astro-RAB (Residual Attention Block), which dynamically combines attention mechanisms with convolution operations to ensure global consistency and local feature fidelity. Moreover, GalCatDiff uses category embeddings for class-specific galaxy generation, avoiding the high computational costs of training separate models for each category. Our experimental results demonstrate that GalCatDiff significantly outperforms existing methods in terms of the consistency of sample color and size distributions, and the generated galaxies are both visually realistic and physically consistent. This framework will enhance the reliability of galaxy simulations and can potentially serve as a data augmentor to support future galaxy classification algorithm development.

Paper Structure

This paper contains 11 sections, 12 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Schematic diagram of the diffusion model framework for category-based galaxy generation (GalCatDiff). In the forward process, a clean image $X_0$ is progressively corrupted by adding Gaussian noise at each timestep $t$, eventually producing a pure noise image $X_T$. The reverse process, guided by the enhanced U-Net estimating $P_\theta(X_{t-1} \mid X_t)$, starts from the noise image $X_T$ and iteratively removes noise to reconstruct the target image $X_0$. The enhanced U-Net architecture begins with a Single Convolution layer that transforms the input into a richer feature representation. The model consists of two down-sampling and two up-sampling stages, each incorporating multiple Astro-RABs. During down-sampling, feature maps pass through each Residual Block, with skip connections established to retain key features and link them to the corresponding layers in the up-sampling stages. Category and time information are embedded and fed into each Astro-RAB. A schematic of the Astro-RAB is shown in the lower-left corner of this figure, and the full version is provided in Fig. 2.
  • Figure 2: Overview of the Astro-RAB structure. Each block consists of a convolutional block followed by an Attention Fusion Unit and a skip connection, with this sequence iterated twice. The Attention Fusion Unit incorporates window attention, combined with convolutional layers, to preserve essential galaxy physics properties.
  • Figure 3: A total of 311 images were excluded from the dataset to improve the quality and category accuracy of the generated results. The images were removed due to factors such as low quality, contamination, or the lack of a dominant central galaxy. Examples: (a) significant color contamination, (b) excessive noise, (c) distorted brightness of the central galaxy, (d) strong light flares in the lower-right corner, (e) multiple galaxies with no prominent central galaxy, and (f) a central galaxy that is not the main focus because another galaxy in the lower-left corner dominates.
  • Figure 4: Redshift distribution of the training set and test set for each of the six morphological categories as well as the combined sample. Overall, the distributions are broadly consistent across training and test sets, with minor shifts observed in minority categories. For instance, the peak of the test set distribution for Edge-on with Bulge appears broader compared to that of the training set. Red: training data samples; Blue: testing data samples.
  • Figure 5: Angular span distribution of the training set and test set for each of the six morphological categories as well as the combined sample. Overall, the distributions are broadly consistent across training and test sets, with minor shifts observed in minority categories. For instance, the tail of the test set distribution for Edge-on without Bulge is relatively higher compared to that of the training set. Red: training data samples; Blue: testing data samples.
  • ...and 5 more figures
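
The forward process and the conditioning inputs described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the linear noise schedule, the sinusoidal time embedding, the summation of the two embeddings, and all function names (`linear_beta_schedule`, `q_sample`, `conditioning_vector`, and so on) are assumptions chosen for illustration, and the category table is random rather than learned. The sketch uses the standard closed form for Gaussian diffusion, $X_t = \sqrt{\bar{\alpha}_t}\, X_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$, to jump from $X_0$ to any $X_t$ in one step.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the source does not state its schedule."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw X_t from X_0 in one step: sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    signal, noise = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def time_embedding(t, dim):
    """Sinusoidal timestep embedding (an assumed choice); dim must be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def make_category_table(num_categories, dim, rng):
    """Random lookup table standing in for a learned category embedding."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_categories)]


def conditioning_vector(t, category, table, dim):
    """Sum the time and category embeddings (one simple way to combine them)."""
    return [a + b for a, b in zip(time_embedding(t, dim), table[category])]


rng = random.Random(0)
T = 1000
alpha_bars = cumulative_alphas(linear_beta_schedule(T))

x0 = [0.5] * 16                                  # flattened toy "image"
x_mid = q_sample(x0, T // 2, alpha_bars, rng)    # partially noised
x_T = q_sample(x0, T - 1, alpha_bars, rng)       # almost pure noise

table = make_category_table(6, 8, rng)           # six morphological categories
cond = conditioning_vector(T // 2, 3, table, 8)  # would be fed to each block
```

In a full model, a vector like `cond` would be passed to every block of the denoising network, so that a single network can generate all categories instead of one network per category.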