From Galaxy Zoo DECaLS to BASS/MzLS: detailed galaxy morphology classification with unsupervised domain adaption

Renhao Ye, Shiyin Shen, Rafael S. de Souza, Quanfeng Xu, Mi Chen, Zhu Chen, Emille E. O. Ishida, Alberto Krone-Martins, Rupesh Durgesh

TL;DR

This work tackles cross-survey bias in detailed galaxy morphology classification by transferring a DECaLS-trained model (with GZD-5 labels) to the BMz survey via unsupervised domain adaptation (UDA). The authors implement a two-step approach: first train a source-domain model on DECaLS using a Dirichlet-multinomial formulation for 34 morphology features, then fine-tune it on unlabeled BMz data using spherical K-means pseudo-labels and a domain adaptation (DA) loss, keeping the source classifier fixed. The resulting BMz target-domain model delivers substantial performance gains over direct transfer, yielding metrics close to the DECaLS baseline and demonstrating consistency with external labels (Walmsley 2023). They release a morphology catalogue for 248,088 BMz galaxies, including Dirichlet parameters, predicted probabilities, and uncertainties, together with practical usage guidance, which could support scalable morphology labeling for upcoming surveys (CSST, Euclid, LSST). Overall, the paper provides a label-free strategy for migrating galaxy morphology classifications across surveys within the same physical domain, while noting limitations for extending to different astrophysical regimes.
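
The pseudo-labelling step mentioned above can be illustrated with a minimal sketch of spherical K-means, which clusters feature embeddings by cosine similarity rather than Euclidean distance. This is a generic version of the technique, not the authors' implementation: the function names, the toy two-dimensional features, and the choice of initial centroids are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def spherical_kmeans(features, init_centroids, n_iter=10):
    """Cluster rows of `features` by cosine similarity.

    Returns (labels, centroids), where every centroid is a unit vector.
    """
    z = l2_normalize(np.asarray(features, dtype=float))
    c = l2_normalize(np.asarray(init_centroids, dtype=float))
    for _ in range(n_iter):
        # Assignment: for unit vectors, the dot product is the cosine similarity.
        labels = np.argmax(z @ c.T, axis=1)
        # Update: sum the members of each cluster, then project back to the sphere.
        for k in range(c.shape[0]):
            members = z[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                c[k] = members.sum(axis=0)
        c = l2_normalize(c)
    labels = np.argmax(z @ c.T, axis=1)
    return labels, c


# Toy data (invented): two groups that differ in direction AND in length.
rng = np.random.default_rng(0)
group_a = 3.0 * np.array([1.0, 0.0]) + rng.normal(scale=0.05, size=(50, 2))
group_b = 1.0 * np.array([0.0, 1.0]) + rng.normal(scale=0.05, size=(50, 2))
toy_features = np.vstack([group_a, group_b])

# Hypothetical initial centroids, e.g. rough class directions from a source model.
init = np.array([[0.9, 0.1], [0.2, 0.8]])
pseudo_labels, centroids = spherical_kmeans(toy_features, init)
```

Because every embedding is normalised before clustering, only its direction matters, so the two toy groups are separated even though their vector lengths differ by a factor of three.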

Abstract

The DESI Legacy Imaging Surveys (DESI-LIS) comprise three distinct surveys: the Dark Energy Camera Legacy Survey (DECaLS), the Beijing-Arizona Sky Survey (BASS), and the Mayall z-band Legacy Survey (MzLS). The citizen science project Galaxy Zoo DECaLS 5 (GZD-5) has provided extensive and detailed morphology labels for a sample of 253,287 galaxies within the DECaLS survey. This dataset has been foundational for numerous deep learning-based galaxy morphology classification studies. However, differences in signal-to-noise ratio and resolution between the DECaLS images and those from BASS and MzLS (collectively referred to as BMz) create a distributional mismatch, so a neural network trained on DECaLS images cannot be directly applied to BMz images. In this study, we explore an unsupervised domain adaptation (UDA) method that fine-tunes a source domain model trained on DECaLS images with GZD-5 labels to BMz images, aiming to reduce bias in galaxy morphology classification within the BMz survey. Our source domain model, used as a starting point for UDA, achieves performance on the DECaLS galaxies' validation set comparable to the results of related works. For BMz galaxies, the fine-tuned target domain model significantly improves performance compared to the direct application of the source domain model, reaching a level comparable to that of the source domain. We also release a catalogue of detailed morphology classifications for 248,088 galaxies within the BMz survey, accompanied by usage recommendations.

Paper Structure

This paper contains 17 sections, 4 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Composite images (consisting of $grz$ bands) of a randomly selected spiral galaxy in DECaLS (left) and BMz (right). Both images are processed using the same arcsinh stretching method as the DESI Legacy Survey Viewer.
  • Figure 2: Schematic diagram of target domain training, including the cutout input $x_\text{s}$ from the source domain and $x_\text{t}$ from the target domain, the feature extractors $f_\text{s}(\cdot)$ and $f_\text{t}(\cdot)$, and the classifier $W_\text{s}$. Spherical K-means is used to obtain pseudo-labels. A triangle represents a feature embedding not assigned a pseudo-label, and a galaxy-like shape represents a feature embedding with an assigned morphology. Red represents source domain feature embeddings and blue represents target domain feature embeddings.
  • Figure 3: Expected probability $\hat{\rho}_q^{m_q}$ of the Dirichlet distribution of the model output, visualised on a probability simplex for the question 'Bar', where the three vertices represent the corresponding three features, namely 'Weak Bar', 'No Bar', and 'Strong Bar'. The scatter points are a 1% sample of BMz galaxies. Each data point within a triangle represents a combination of expected probabilities of the features. To read the probability of a feature, draw a line parallel to its opposing side, and the intersection at the right side indicates the probability (bottom edge: 'Strong Bar', top right edge: 'Weak Bar', top left edge: 'No Bar'). The left, middle, and right panels show the cases for 'the source model on DECaLS galaxies', 'the source model on BMz galaxies' and 'the target model on BMz galaxies', respectively. In each panel, the black contours show the number density distribution of the data points. In the middle and right panels, the red dashed contours are copies of the result for the source domain (left panel).
  • Figure 4: Examples of BMz galaxies with 'Strong Bar' (left), 'Weak Bar' (middle), and 'No Bar' (right) features. All galaxies are selected with $\hat{\rho}_\text{bar}^{m_\text{bar}} > 0.5$. The galaxies in the top row have lower variance (the top 15% in $\sigma^2$), while those in the bottom row have higher variance (the bottom 15% in $\sigma^2$).
  • Figure 5: Examples of BMz galaxies selected as having 'Strong Bar' features. The galaxies in the top row are selected by following a decision tree, requiring that $\hat{\rho}_\text{smooth or featured}^\text{featured or disc}$, $\hat{\rho}_\text{edge on}^\text{not edge on}$, and $\hat{\rho}_\text{bar}^\text{strong bar}$ are each larger than the other features (the top 30% $\sigma^2$), while the galaxies in the bottom row are selected simply by requiring that $\hat{\rho}_\text{bar}^\text{strong bar}$ is larger than the other features (the top 30% $\sigma^2$).
  • ...and 1 more figures
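
The quantities in the captions of Figures 3 to 5 can be sketched numerically. The example below computes the standard Dirichlet mean $\alpha_i / \alpha_0$ and variance $\alpha_i (\alpha_0 - \alpha_i) / (\alpha_0^2 (\alpha_0 + 1))$, with $\alpha_0 = \sum_i \alpha_i$, and then applies the two kinds of 'Strong Bar' selection described for Figures 4 and 5. It is only an illustration: the Dirichlet parameters, the answer ordering, and the function names are invented, and treating the captions' $\sigma^2$ as this per-feature Dirichlet variance is an assumption rather than something stated in the text above.

```python
import numpy as np


def dirichlet_mean_and_variance(alpha):
    """Per-feature mean and variance of Dirichlet(alpha) along the last axis."""
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = alpha.sum(axis=-1, keepdims=True)
    mean = alpha / alpha0
    var = alpha * (alpha0 - alpha) / (alpha0**2 * (alpha0 + 1.0))
    return mean, var


def is_top_answer(mean, index):
    """True where the answer at `index` has the largest expected probability."""
    return np.argmax(mean, axis=-1) == index


# Invented Dirichlet parameters for three galaxies (one row per galaxy).
# Assumed answer order per question (hypothetical):
#   smooth-or-featured: (smooth, featured or disc, artifact)
#   edge-on:            (edge on, not edge on)
#   bar:                (strong bar, weak bar, no bar)
alpha_sf = np.array([[2.0, 15.0, 1.0], [10.0, 4.0, 1.0], [3.0, 12.0, 1.0]])
alpha_edge = np.array([[1.0, 9.0], [2.0, 8.0], [1.0, 9.0]])
alpha_bar = np.array([[12.0, 3.0, 5.0], [1.2, 0.9, 0.9], [2.0, 2.0, 16.0]])

mean_sf, _ = dirichlet_mean_and_variance(alpha_sf)
mean_edge, _ = dirichlet_mean_and_variance(alpha_edge)
mean_bar, var_bar = dirichlet_mean_and_variance(alpha_bar)

# Figure 4 style cut: expected 'Strong Bar' probability above 0.5.
strong_bar_cut = mean_bar[:, 0] > 0.5

# Figure 5, bottom row style: 'Strong Bar' is simply the top answer for 'Bar'.
simple_selection = is_top_answer(mean_bar, 0)

# Figure 5, top row style: walk the decision tree before asking about the bar.
tree_selection = (
    is_top_answer(mean_sf, 1)      # 'featured or disc'
    & is_top_answer(mean_edge, 1)  # 'not edge on'
    & is_top_answer(mean_bar, 0)   # 'strong bar'
)
```

In this toy input the second galaxy passes the simple selection but fails the decision tree, because its top answer to the first question is 'smooth'; this is the kind of difference the two rows of Figure 5 are meant to contrast.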