Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
Yu Liu, Yaqi Cai, Qi Jia, Binglin Qiu, Weimin Wang, Nan Pu
TL;DR
This work addresses ultra-fine-grained visual categorization in a semi-supervised setting by introducing Ultra-Fine-Grained Novel Class Discovery (UFG-NCD), a task that uses partially labeled Ultra-FGVC data to discover new categories among unlabeled images. It proposes Region-Aligned Proxy Learning (RAPL), which combines Channel-wise Region Alignment (CRA), for extracting discriminative local features, with a Semi-Supervised Proxy Learning (SemiPL) framework that uses class proxies for proxy-guided supervised and contrastive learning. The approach yields state-of-the-art results on five SoyAgeing Ultra-FGVC datasets under both task-agnostic and task-aware protocols, indicating that knowledge transfers from labeled to unlabeled ultra-fine-grained classes. Both regional feature modeling and proxy-based distribution learning contribute to these gains, with practical implications for scaling Ultra-FGVC in real-world domains such as precision agriculture.
Abstract
Ultra-fine-grained visual categorization (Ultra-FGVC) aims at distinguishing highly similar sub-categories within fine-grained objects, such as different soybean cultivars. Compared to traditional fine-grained visual categorization, Ultra-FGVC encounters more hurdles due to the small inter-class and large intra-class variation. Given these challenges, relying on human annotation for Ultra-FGVC is impractical. To this end, our work introduces a novel task termed Ultra-Fine-Grained Novel Class Discovery (UFG-NCD), which leverages partially annotated data to identify new categories of unlabeled images for Ultra-FGVC. To tackle this problem, we devise a Region-Aligned Proxy Learning (RAPL) framework, which comprises a Channel-wise Region Alignment (CRA) module and a Semi-Supervised Proxy Learning (SemiPL) strategy. The CRA module is designed to extract and utilize discriminative features from local regions, facilitating knowledge transfer from labeled to unlabeled classes. Furthermore, SemiPL strengthens representation learning and knowledge transfer with proxy-guided supervised learning and proxy-guided contrastive learning. Such techniques leverage class distribution information in the embedding space, improving the mining of subtle differences between labeled and unlabeled ultra-fine-grained classes. Extensive experiments demonstrate that RAPL significantly outperforms baselines across various datasets, indicating its effectiveness in handling the challenges of UFG-NCD. Code is available at https://github.com/SSDUT-Caiyq/UFG-NCD.
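The abstract does not spell out the loss used for proxy-guided supervised learning, so the following is only a minimal sketch of a generic proxy-based softmax loss, not the authors' formulation. It assumes one learnable proxy vector per labeled class, cosine similarity between a feature and each proxy, and a temperature of 0.1; the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def proxy_probabilities(feature, proxies, temperature=0.1):
    """Softmax over cosine similarities between one feature and every class proxy."""
    f = l2_normalize(feature)
    sims = [sum(a * b for a, b in zip(f, l2_normalize(p))) for p in proxies]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_supervised_loss(feature, label, proxies, temperature=0.1):
    """Cross-entropy that pulls a labeled feature toward its own class proxy."""
    return -math.log(proxy_probabilities(feature, proxies, temperature)[label])


# Toy example (assumed values): two class proxies in a 2-D embedding space.
proxies = [[1.0, 0.0], [0.0, 1.0]]
feature = [0.9, 0.1]  # lies close to the proxy of class 0

probs = proxy_probabilities(feature, proxies)
loss_correct = proxy_supervised_loss(feature, 0, proxies)
loss_wrong = proxy_supervised_loss(feature, 1, proxies)
```

In this toy setup the loss is small when the label matches the nearest proxy and large otherwise, which is the behavior a proxy-guided objective relies on to shape the class distribution in the embedding space.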
