From Galaxy Zoo DECaLS to BASS/MzLS: detailed galaxy morphology classification with unsupervised domain adaption
Renhao Ye, Shiyin Shen, Rafael S. de Souza, Quanfeng Xu, Mi Chen, Zhu Chen, Emille E. O. Ishida, Alberto Krone-Martins, Rupesh Durgesh
TL;DR
This work tackles cross-survey bias in detailed galaxy morphology classification by transferring a model trained on DECaLS images with GZD-5 labels to the BASS and MzLS surveys (collectively BMz) via unsupervised domain adaptation. The authors use a two-step approach: first, train a source-domain model on DECaLS using a Dirichlet-multinomial formulation for 34 morphology features; second, fine-tune it on unlabeled BMz images using spherical K-means pseudo-labels and a domain adaptation loss, keeping the source classifier fixed. The resulting target-domain model substantially outperforms direct transfer of the source model to BMz, with metrics close to the DECaLS baseline, and its predictions are consistent with external labels (Walmsley 2023). The authors release a morphology catalogue for 248,088 BMz galaxies that includes Dirichlet parameters, predicted probabilities, and uncertainties, together with usage guidance, and that can support scalable morphology labeling for upcoming surveys (CSST, Euclid, LSST). The paper thus offers a label-free strategy for migrating galaxy morphology classifications across surveys within the same physical domain, while noting limitations in extending it to different astrophysical regimes.
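To make the Dirichlet-multinomial formulation mentioned above concrete, the following is a minimal sketch of the log-likelihood of volunteer vote counts for a single question, given predicted Dirichlet concentrations. It is an illustration under stated assumptions, not the authors' implementation: the function name, the example vote counts, and the concentration values are all hypothetical, and the exact loss used in the paper is not specified in this summary.

```python
import math


def dirichlet_multinomial_log_prob(counts, alpha):
    """Log-probability of vote counts under a Dirichlet-multinomial model.

    counts: votes per answer for one question (hypothetical example data).
    alpha:  positive Dirichlet concentrations, one per answer, as a network
            might predict them (an assumption of this sketch).
    """
    n = sum(counts)      # total number of votes for the question
    a0 = sum(alpha)      # total concentration
    # Normalising term: Gamma(a0) * Gamma(n + 1) / Gamma(n + a0)
    log_p = math.lgamma(a0) + math.lgamma(n + 1) - math.lgamma(n + a0)
    # Per-answer terms: Gamma(k + a) / (Gamma(a) * Gamma(k + 1))
    for k, a in zip(counts, alpha):
        log_p += math.lgamma(k + a) - math.lgamma(a) - math.lgamma(k + 1)
    return log_p


# Hypothetical usage: 3 of 5 volunteers chose the first of two answers.
log_p = dirichlet_multinomial_log_prob([3, 2], [1.0, 1.0])
```

A training loss would typically be the negative of this quantity, summed over questions and galaxies. With both concentrations equal to 1 the model reduces to a uniform prior over the vote fraction, which is a convenient sanity check.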
Abstract
The DESI Legacy Imaging Surveys (DESI-LIS) comprise three distinct surveys: the Dark Energy Camera Legacy Survey (DECaLS), the Beijing-Arizona Sky Survey (BASS), and the Mayall z-band Legacy Survey (MzLS). The citizen science project Galaxy Zoo DECaLS 5 (GZD-5) has provided extensive and detailed morphology labels for a sample of 253,287 galaxies within the DECaLS survey. This dataset has been foundational for numerous deep learning-based galaxy morphology classification studies. However, because the DECaLS images differ in signal-to-noise ratio and resolution from those of BASS and MzLS (collectively referred to as BMz), the two image sets follow different distributions, and a neural network trained on DECaLS images cannot be directly applied to BMz images. In this study, we explore an unsupervised domain adaptation (UDA) method that fine-tunes a source domain model, trained on DECaLS images with GZD-5 labels, to BMz images, aiming to reduce bias in galaxy morphology classification within the BMz survey. Our source domain model, used as the starting point for UDA, achieves performance on the DECaLS validation set comparable to the results of related works. For BMz galaxies, the fine-tuned target domain model significantly improves performance compared to the direct application of the source domain model, reaching a level comparable to that of the source domain. We also release a catalogue of detailed morphology classifications for 248,088 galaxies within the BMz survey, accompanied by usage recommendations.
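One ingredient of the UDA fine-tuning, according to the summary above, is pseudo-labelling with spherical K-means. The sketch below shows the generic algorithm only: feature vectors are projected onto the unit sphere, assigned to the centroid with the highest cosine similarity, and centroids are updated as re-normalised means. How the paper initialises centroids and feeds the pseudo-labels into its loss is not stated here, so the function names, the explicit `init_centroids` argument, and the toy features are assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(features, init_centroids, n_iter=20):
    """Cluster feature vectors by cosine similarity.

    features:       list of equal-length float lists (hypothetical encoder outputs).
    init_centroids: starting centroids, supplied by the caller (an assumption
                    of this sketch).
    Returns (labels, centroids), where labels are the pseudo-labels.
    """
    points = [normalize(f) for f in features]
    centroids = [normalize(c) for c in init_centroids]
    labels = None
    for _ in range(n_iter):
        # Assignment step: pick the centroid with the largest dot product,
        # which equals cosine similarity for unit-length vectors.
        new_labels = [
            max(range(len(centroids)),
                key=lambda j: sum(a * b for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: mean of the members, projected back onto the unit sphere.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids


# Hypothetical usage with two-dimensional toy features and two clusters.
toy_features = [[1.0, 0.1], [2.0, 0.1], [0.1, 1.0], [0.2, 3.0]]
pseudo_labels, centers = spherical_kmeans(toy_features, [[1.0, 0.0], [0.0, 1.0]])
```

Because every vector is normalised first, only the direction of a feature matters, not its magnitude. This is the usual reason for preferring the spherical variant when working with learned embeddings.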
