Robustness Analysis of USmorph: I. Generalization Efficiency of Unsupervised Strategies and Supervised Learning in Galaxy Morphological Classification
Shiwei Zhu, Guanwen Fang, Yao Dai, Chichun Zhou, Yirui Zheng, Jie Song, Shiying Lu, Xu Kong
TL;DR
This study presents a systematic robustness analysis of the USmorph galaxy morphology framework, which integrates unsupervised feature extraction and clustering with supervised CNN classification. By tuning the convolutional autoencoder (CAE; latent dimension $d=40$, $5\times5$ kernels), applying the adaptive polar coordinate transform (APCT) for rotational invariance, and setting the bagging clustering number to $K=50$, the authors establish a stable pipeline whose labels are validated against low-dimensional structure (t-SNE) and physics-based parameter spaces. The supervised stage, based on GoogLeNet, achieves $\sim$94% accuracy with consistent performance across data partitions, indicating that the classifications are reliable enough for upcoming surveys such as CSST. Collectively, the work offers practical guidance on architectural choices and validation strategies for robust, scalable galaxy morphology analysis in large, unlabeled astronomical datasets.
Abstract
We conduct a systematic robustness analysis of the hybrid machine learning framework \texttt{USmorph}, which integrates unsupervised and supervised learning for galaxy morphological classification. Although \texttt{USmorph} has already been applied to nearly 100,000 $I$-band galaxy images in the COSMOS field ($0.2 < z < 1.2$, $I_{\mathrm{mag}} < 25$), the stability of its core modules has not been quantitatively assessed. Our tests show that the convolutional autoencoder (CAE) achieves the best performance in preserving structural information when adopting an intermediate network depth, $5\times5$ convolutional kernels, and a 40-dimensional latent representation. The adaptive polar coordinate transform (APCT) effectively enhances rotational invariance and improves the robustness of downstream tasks. In the unsupervised stage, a bagging clustering number of $K=50$ provides the optimal trade-off between classification granularity and labeling efficiency. For supervised learning, we employ GoogLeNet, which exhibits stable performance without overfitting. We validate the reliability of the final classifications through two independent tests: (1) the t-distributed stochastic neighbor embedding (t-SNE) visualization reveals clear clustering boundaries in the low-dimensional space; and (2) the morphological classifications are consistent with theoretical expectations of galaxy evolution, with both true and false positives showing unbiased distributions in the parameter space. These results demonstrate the strong robustness of the \texttt{USmorph} algorithm, providing guidance for its future application to the China Space Station Telescope (CSST) mission.
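To illustrate why a polar representation helps with rotational invariance, the following minimal sketch resamples an image onto an $(r, \theta)$ grid and checks that a rotation of the input becomes a circular shift along the $\theta$ axis. It is a plain nearest-neighbour polar transform, not the APCT itself, whose adaptive step is not described in this abstract. The function name `to_polar`, the grid sizes `n_r` and `n_theta`, and the random test image are assumptions made for illustration only.

```python
import numpy as np


def to_polar(image, n_r=16, n_theta=64):
    """Resample a square image onto an (r, theta) grid (nearest-neighbour)."""
    n = image.shape[0]
    c = (n - 1) / 2.0  # geometric centre of the pixel grid
    radii = np.linspace(0.0, c, n_r)
    thetas = 2.0 * np.pi * np.arange(n_theta) / n_theta
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    rows = np.rint(c + rr * np.sin(tt)).astype(int)
    cols = np.rint(c + rr * np.cos(tt)).astype(int)
    return image[rows, cols]  # shape (n_r, n_theta)


rng = np.random.default_rng(0)
image = rng.random((33, 33))   # hypothetical stand-in for a galaxy cutout
rotated = np.rot90(image)      # quarter turn about the image centre

polar = to_polar(image)
polar_rot = to_polar(rotated)

# A quarter turn of the input should appear as a shift of n_theta / 4 columns.
shift = polar.shape[1] // 4
match = np.mean(polar_rot == np.roll(polar, -shift, axis=1))
print(f"fraction of matching polar pixels after undoing the shift: {match:.3f}")
```

Because a rotation reduces to a shift in this representation, a downstream step only has to be insensitive to (or to align) a one-dimensional offset along $\theta$, which is a simpler problem than handling arbitrary image rotations. Agreement can fall slightly short of exact where rounding lands on a pixel boundary.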
