A robust morphological classification method for galaxies using dual-encoding contrastive learning and multi-clustering voting on JWST/NIRCam images
Xiaolei Yin, Guanwen Fang, Shiying Lu, Zesen Lin, Yao Dai, Chichun Zhou
TL;DR
This work addresses scalable, accurate galaxy morphology classification for the large JWST COSMOS-Web dataset. It extends the two-step USmorph framework by combining CAE denoising, APCT rotational normalization, and a dual-encoder contrastive learning scheme (ConvNeXt and ViT), followed by PCA for dimensionality reduction. Unsupervised clustering, with bagging across three algorithms, yields 17,326 reliably labeled galaxies. These train a GoogLeNet classifier that labels the remaining 28,850 galaxies with a final accuracy of about $94.6\%$. Parametric and nonparametric morphological measurements ($n$, $r_e$, $G$, $M_{20}$, $C$, $\Psi$, and the MID statistics) show consistent trends across the SPH, ETD, LTD, and IRR types, which supports the classifications and the framework’s suitability for upcoming large-sky surveys such as that of the Chinese Space Station Telescope.
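The voting step can be illustrated with a minimal sketch. It assumes a unanimous-agreement rule and a simple majority-overlap relabeling to reconcile cluster names between algorithms; neither detail is specified in the text above, and the function names `align_to_reference` and `unanimous_vote` are hypothetical. The toy label lists stand in for the outputs of three clustering algorithms.

```python
from collections import Counter


def align_to_reference(reference, labels):
    """Relabel `labels` so each cluster takes the reference label it overlaps most.

    Assumption: a many-to-one majority-overlap mapping is good enough for
    illustration; a real pipeline may use a stricter one-to-one matching.
    """
    overlap = {}
    for ref, lab in zip(reference, labels):
        overlap.setdefault(lab, Counter())[ref] += 1
    mapping = {lab: counts.most_common(1)[0][0] for lab, counts in overlap.items()}
    return [mapping[lab] for lab in labels]


def unanimous_vote(label_sets):
    """Return one label per sample, or None where the algorithms disagree."""
    reference = list(label_sets[0])
    aligned = [reference] + [align_to_reference(reference, ls) for ls in label_sets[1:]]
    voted = []
    for votes in zip(*aligned):
        # Keep a sample only if every algorithm assigns it the same label.
        voted.append(votes[0] if len(set(votes)) == 1 else None)
    return voted


# Toy outputs of three clustering algorithms on six samples.
algo_a = [0, 0, 0, 1, 1, 1]
algo_b = [5, 5, 5, 7, 7, 7]  # same partition as algo_a, different cluster names
algo_c = [2, 2, 3, 3, 3, 3]  # places the third sample in the other cluster

labels = unanimous_vote([algo_a, algo_b, algo_c])
print(labels)  # [0, 0, None, 1, 1, 1]
```

Samples marked `None` correspond to galaxies without a consistent vote; in the two-step framework such objects would be left for the supervised classifier rather than given a cluster label.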
Abstract
The two-step galaxy morphology classification framework {\tt USmorph} successfully combines unsupervised machine learning (UML) with supervised machine learning (SML) methods. To enhance the UML step, we employed a dual-encoder architecture (ConvNeXt and ViT) to effectively encode images, contrastive learning to accurately extract features, and principal component analysis to efficiently reduce dimensionality. Based on this improved framework, a sample of 46,176 galaxies at $0<z<4.2$, selected in the COSMOS-Web field, is classified into five types using the JWST near-infrared images: 33\% spherical (SPH), 25\% early-type disk (ETD), 25\% late-type disk (LTD), 7\% irregular (IRR), and 10\% unclassified (UNC) galaxies. We also performed parametric (Sérsic index, $n$, and effective radius, $r_{\rm e}$) and nonparametric measurements (Gini coefficient, $G$, the second-order moment of light, $M_{20}$, concentration, $C$, multiplicity, $\Psi$, and three other parameters from the MID statistics) for massive galaxies ($M_*>10^9 M_\odot$) to verify the validity of our galaxy morphological classification system. The analysis of morphological parameters is consistent with our classification system: SPH and ETD galaxies with higher $n$, $G$, and $C$ tend to be more bulge-dominated and more compact compared with other types of galaxies. This demonstrates the reliability of this classification system, which will be useful for a forthcoming large-sky survey from the Chinese Space Station Telescope.
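As an illustration of one of the nonparametric measurements, the Gini coefficient of a galaxy's pixel fluxes is commonly computed as $G = \frac{1}{|\bar{X}|\,N(N-1)} \sum_{i=1}^{N} (2i - N - 1)\,|X_i|$, where the $N$ pixel values $X_i$ are sorted in increasing order. The sketch below implements this standard definition; it assumes the pixels belonging to the galaxy have already been selected, and the function name `gini_coefficient` is hypothetical, not taken from the paper's pipeline.

```python
def gini_coefficient(pixel_values):
    """Gini coefficient of a list of pixel fluxes (0 = uniform, 1 = all light in one pixel)."""
    values = sorted(abs(v) for v in pixel_values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("all pixel values are zero")
    # Rank-weighted sum over the sorted absolute fluxes (ranks start at 1).
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


uniform = gini_coefficient([1.0] * 10)                 # light spread evenly
concentrated = gini_coefficient([0.0, 0.0, 0.0, 5.0])  # all light in one pixel
print(uniform, concentrated)  # 0.0 1.0
```

Higher $G$ means the light is concentrated in fewer pixels, which is consistent with the abstract's statement that SPH and ETD galaxies with higher $G$ tend to be more compact.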
