An updated efficient galaxy morphology classification model based on ConvNeXt encoding with UMAP dimensionality reduction
Guanwen Fang, Shiwei Zhu, Jun Xu, Shiying Lu, Chichun Zhou, Yao Dai, Zesen Lin, Xu Kong
TL;DR
This work addresses the scalable, unsupervised classification of galaxy morphologies in large surveys by updating the USmorph framework with a pre-trained ConvNeXt encoder and UMAP dimensionality reduction. The dual-stage approach yields 20 algorithmic clusters that are merged, through visual inspection, into five physical morphology types, classifying $50{,}056$ galaxies (about $51\%$ of the COSMOS sample) at substantially reduced computational cost. Validation against external catalogs (Galaxy Zoo: Hubble) and an extensive analysis of structural parameters show that the method recovers the expected correlations between morphology and structure and provides robust, transferable classifications suitable for future surveys such as CSST. The framework reduces reliance on labeled data, improves efficiency for cross-survey analyses, and provides a high-quality training subset for supervised or semi-supervised extensions.
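One way to picture the merge from 20 clusters to five types, and why only part of the sample ends up classified, is sketched below. This is an illustrative assumption, not the paper's procedure: the visual cluster-to-type assignment is modelled as a lookup table per clustering run, and a galaxy keeps a type only when at least `min_agree` runs agree on it. The mapping `c // 4`, the three toy runs, the integer morphology codes, `UNCLASSIFIED`, and `merge_and_vote` are all hypothetical.

```python
import numpy as np

UNCLASSIFIED = -1  # hypothetical code for galaxies left without a type


def merge_and_vote(run_labels, run_maps, min_agree):
    """Map each run's cluster ids to morphology codes, then keep only consensual galaxies.

    run_labels: one sequence of cluster ids per clustering run (same galaxy order).
    run_maps:   one dict per run, cluster id -> morphology code (stands in for
                the visual inspection step; hypothetical).
    min_agree:  minimum number of runs that must assign the same code.
    """
    # Shape (n_runs, n_galaxies): morphology code proposed by each run.
    codes_per_run = np.array(
        [[mapping[c] for c in labels] for labels, mapping in zip(run_labels, run_maps)]
    )
    result = np.full(codes_per_run.shape[1], UNCLASSIFIED, dtype=int)
    for i in range(codes_per_run.shape[1]):
        # Most frequent code for galaxy i across runs, with its vote count.
        codes, counts = np.unique(codes_per_run[:, i], return_counts=True)
        best = counts.argmax()
        if counts[best] >= min_agree:
            result[i] = codes[best]
    return result


# Hypothetical visual merge: 20 clusters -> 5 codes, four clusters per code.
merge_map = {c: c // 4 for c in range(20)}
runs = [
    [0, 5, 9, 13, 18, 2],
    [1, 6, 10, 12, 3, 7],
    [2, 7, 15, 14, 8, 19],
]
strict = merge_and_vote(runs, [merge_map] * 3, min_agree=3)
print(strict)                            # [ 0  1 -1  3 -1 -1]
print((strict != UNCLASSIFIED).mean())   # 0.5
```

Requiring unanimous agreement trades completeness for purity: loosening `min_agree` to 2 in this toy example would also classify the third galaxy.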
Abstract
We present an enhanced unsupervised machine learning (UML) module for our previously developed \texttt{USmorph} classification framework. The module has two components: (1) hierarchical feature extraction with a pre-trained ConvNeXt convolutional neural network (CNN) via transfer learning, and (2) nonlinear manifold learning with Uniform Manifold Approximation and Projection (UMAP) for topology-aware dimensionality reduction. This dual-stage design transfers knowledge efficiently from large-scale visual datasets, while UMAP's neighborhood preservation retains the geometry of morphological patterns. We apply the upgraded UML module to $I$-band images of 99,806 COSMOS galaxies with $I_{\mathrm{mag}}<25$ at redshift $0.2<z<1.2$, a range chosen so that the $I$ band traces rest-frame optical morphology. The predefined number of clusters is optimized to 20, down from 50 in the original framework, which substantially reduces the computational cost. The 20 algorithmically identified clusters are then merged into five physical morphology types, and about 51\% of the galaxies (50,056) are successfully classified. To assess the classification, we examine the morphological parameters of massive galaxies with $M_{*}>10^{9}~M_{\odot}$ and find that the results agree well with expectations from galaxy evolution theory. The improved algorithm substantially increases the efficiency of galaxy morphology classification, making it suitable for large-scale sky surveys such as those planned with the China Space Station Telescope (CSST).
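The dual-stage structure (encoder features, then dimensionality reduction, then clustering into 20 groups) can be sketched as below. This is a dependency-light stand-in, not the authors' code, and every substitution is an assumption: the encoder features are taken as precomputed (in practice one might use pooled activations of a pre-trained ConvNeXt such as torchvision's `convnext_tiny`), PCA replaces UMAP (with the `umap-learn` package, `umap.UMAP(...).fit_transform(features)` would take its place), and plain k-means with k = 20 replaces the framework's clustering step. The toy features, `n_components=10`, and the function names are hypothetical.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project features onto their leading principal components (stand-in for UMAP)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster index in [0, k) per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n_points, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def embed_and_cluster(features, n_components=10, n_clusters=20, seed=0):
    """Reduction stage followed by clustering, applied to precomputed encoder features."""
    reduced = pca_reduce(features, n_components)
    return reduced, kmeans(reduced, n_clusters, seed=seed)


# Toy stand-in for encoder output: 2000 "galaxies" x 768 features
# (768 is the pooled feature width of ConvNeXt-Tiny), drawn around 20 planted centres.
rng = np.random.default_rng(42)
planted = rng.normal(scale=5.0, size=(20, 768))
toy_features = np.repeat(planted, 100, axis=0) + rng.normal(size=(2000, 768))

reduced, labels = embed_and_cluster(toy_features)
print(reduced.shape, labels.shape, np.unique(labels).size)
```

Keeping the stages as separate functions mirrors the modular design described above: either stage can be swapped (for example PCA for UMAP, or k-means for another clustering algorithm) without touching the other.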
