PP-SSL : Priority-Perception Self-Supervised Learning for Fine-Grained Recognition

ShuaiHeng Li; Qing Cai; Fan Zhang; Menghuan Zhang; Yangyang Shu; Zhi Liu; Huafeng Li; Lingqiao Liu

TL;DR

PP-SSL addresses the granularity gap in self-supervised fine-grained visual recognition (FGVR) with two components: the Anti-Interference Strategy (AIS) and the Image-Aided Distinction Module (IADM). AIS uses a fine-grained text corpus and CLIP-based knowledge distillation to filter out irrelevant features, while IADM uses GradCAM computed from the original image to highlight subtle cues. The total loss combines the contrastive objective with the AIS and IADM terms, and inference uses only the image encoder for efficiency. Experiments on seven FGVR datasets show consistent retrieval and classification gains over state-of-the-art SSL methods.

Abstract

Self-supervised learning is emerging in fine-grained visual recognition (FGVR) with promising results. However, existing self-supervised methods are often susceptible to irrelevant patterns in their pretext tasks and cannot represent the subtle differences inherent in FGVR, which generally degrades their performance. To address this, we propose a novel Priority-Perception Self-Supervised Learning framework, denoted PP-SSL, which filters out interference from irrelevant features and extracts more subtle discriminative features throughout training. Specifically, it consists of two main parts: the Anti-Interference Strategy (AIS) and the Image-Aided Distinction Module (IADM). In AIS, a fine-grained textual description corpus is established, and a knowledge distillation strategy is devised to guide the model in eliminating irrelevant features while enhancing the learning of more discriminative, high-quality features. IADM builds on the observation that extracting GradCAM from the original image effectively exposes subtle differences between fine-grained categories. Compared to features extracted from intermediate or output layers, the original image retains more detail, allowing a deeper exploration of the subtle distinctions among fine-grained classes. Extensive experimental results indicate that PP-SSL significantly outperforms existing methods across various datasets, highlighting its effectiveness in fine-grained recognition tasks. Our code will be made publicly available upon publication.
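
As a rough illustration of how a contrastive objective might be combined with a distillation term and an attention-based term, the following is a minimal NumPy sketch. It is not the authors' implementation: the InfoNCE form, the cosine-distance distillation term, the function names, and the weights `w_ais` and `w_iadm` are all assumptions chosen for illustration, and the IADM term is left as a scalar placeholder because its exact form is not specified here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(z1, z2, temperature=0.1):
    # Generic InfoNCE contrastive loss (assumed form): row i of z1 is the
    # positive for row i of z2, and all other rows act as negatives.
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def cosine_distill(student, teacher):
    # Hypothetical distillation term: mean cosine distance between student
    # image embeddings and teacher embeddings (e.g. from a frozen CLIP-like model).
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

def total_loss(l_con, l_ais, l_iadm, w_ais=1.0, w_iadm=1.0):
    # Weighted sum of the three terms; the weights are illustrative assumptions.
    return l_con + w_ais * l_ais + w_iadm * l_iadm

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(4, 8))                   # embeddings of augmented view A
    view_b = view_a + 0.05 * rng.normal(size=(4, 8))   # embeddings of augmented view B
    teacher = rng.normal(size=(4, 8))                  # stand-in teacher embeddings
    l_con = info_nce(view_a, view_b)
    l_ais = cosine_distill(view_a, teacher)
    l_iadm = 0.0  # placeholder: the IADM term is not modelled in this sketch
    print(total_loss(l_con, l_ais, l_iadm, w_ais=0.5, w_iadm=0.5))
```

In this sketch only the student embeddings would be needed at test time, which mirrors the statement that inference uses only the image encoder; the teacher and the auxiliary terms appear solely in the training loss.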
