Dual-coding contrastive learning based on ConvNeXt and ViT models for morphological classification of galaxies in COSMOS-Web
Shiwei Zhu, Guanwen Fang, Chichun Zhou, Jie Song, Zesen Lin, Yao Dai, Xu Kong
TL;DR
This work tackles scalable galaxy morphology classification in the COSMOS-Web field under limited labeled data by integrating self-supervised contrastive learning with a dual-encoder (ConvNeXt and ViT) framework, convolutional autoencoder (CAE) denoising, and Adaptive Polar Coordinate Transformation (APCT) rotational augmentation. The method first extracts compact features via a dual-encoder contrastive loss, then applies Bagging-based clustering to label 32,922 galaxies, and finally trains GoogLeNet to classify the remaining 12,366 objects, so that about 73% of the sample is labeled by unsupervised machine learning (UML) and about 27% by supervised machine learning (SML). Validation against parametric (Sérsic $n$, $r_e$) and nonparametric (G, $M_{20}$, $C$, $\Psi$, MID) morphology measures demonstrates strong concordance with galaxy evolution trends. The resulting 45,288-galaxy catalog, with five morphological classes, provides a robust, scalable resource for current and future surveys, including the China Space Station Telescope (CSST), enabling efficient morphology-driven studies at $0.5<z<6.0$.
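The split between clustering-labeled and classifier-labeled galaxies can be illustrated with a minimal voting sketch. It assumes three clustering runs whose cluster IDs have already been aligned to the same morphological classes, and a unanimous-vote rule; neither the number of runs nor the agreement threshold is stated in this summary, and `vote_labels`, `min_agreement`, and the galaxy IDs are hypothetical names used only for illustration.

```python
from collections import Counter


def vote_labels(votes_per_galaxy, min_agreement):
    """Split galaxies into confidently labeled and unlabeled sets.

    votes_per_galaxy maps a galaxy ID to the list of class labels assigned
    by each clustering run (labels assumed to be aligned across runs).
    A galaxy is labeled only if its most common label receives at least
    min_agreement votes; otherwise it is left for the supervised step.
    """
    labeled, unlabeled = {}, []
    for galaxy_id, votes in votes_per_galaxy.items():
        label, count = Counter(votes).most_common(1)[0]
        if count >= min_agreement:
            labeled[galaxy_id] = label
        else:
            unlabeled.append(galaxy_id)
    return labeled, unlabeled


# Toy votes from three hypothetical clustering runs (classes 0..4).
votes = {
    "g1": [0, 0, 0],
    "g2": [1, 1, 2],
    "g3": [3, 3, 3],
    "g4": [4, 0, 2],
}
labeled, unlabeled = vote_labels(votes, min_agreement=3)
```

In this toy case, the galaxies in `labeled` would serve as the training set for the supervised classifier, which then assigns classes to those in `unlabeled`.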
Abstract
In our previous works, we proposed a machine learning framework named \texttt{USmorph} for efficiently classifying galaxy morphology. In this study, we introduce self-supervised contrastive learning to upgrade the unsupervised machine learning (UML) part of the \texttt{USmorph} framework, aiming to improve the efficiency of feature extraction in this step. The upgraded UML method consists of three main components. (1) We employ a Convolutional Autoencoder to denoise galaxy images and the Adaptive Polar Coordinate Transformation to enhance the model's rotational invariance. (2) A pre-trained dual-encoder neural network based on ConvNeXt and ViT is used to encode the image data, and contrastive learning is then applied to reduce the dimension of the features. (3) We adopt a Bagging-based clustering model to cluster galaxies with similar features into distinct groups. By carefully dividing the redshift bins, we apply this model to the rest-frame optical images of galaxies in the COSMOS-Web field within the redshift range of $0.5 < z < 6.0$. Compared to the previous algorithm, the improved UML method successfully classifies 73\% of the galaxies. Using the GoogLeNet algorithm, we classify the morphology of the remaining 27\% of the galaxies. To validate the reliability of our updated algorithm, we compared our classification results with other galaxy morphological parameters and found good consistency with galaxy evolution. Benefiting from its higher efficiency, this updated algorithm is well-suited for application in future China Space Station Telescope missions.
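As an illustration of step (2), the following is a minimal sketch of a contrastive objective on paired embeddings. It assumes an InfoNCE-style loss in which the two encodings of the same galaxy form the positive pair and all other galaxies in the batch act as negatives; the abstract does not specify the exact loss, and `info_nce`, `temperature`, and the toy feature vectors are illustrative assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss; view_a[i] and view_b[i] are the positive pair."""
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    losses = []
    for i, a_i in enumerate(a):
        # Similarity of galaxy i in view A to every galaxy in view B.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in logits))
        # Cross-entropy with the matching index i as the target.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings standing in for the two encoders' outputs (assumed).
conv_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
vit_feats = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned_loss = info_nce(conv_feats, vit_feats)
# Shifting one view by one position breaks the pairing and raises the loss.
shuffled_loss = info_nce(conv_feats, vit_feats[1:] + vit_feats[:1])
```

Minimizing such a loss pulls the two encodings of the same galaxy together and pushes different galaxies apart, which is what would make the resulting low-dimensional features suitable for the clustering in step (3).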
