Exploration of Class Center for Fine-Grained Visual Classification

Hang Yao; Qiguang Miao; Peipei Zhao; Chaoneng Li; Xin Li; Guanwen Feng; Ruyi Liu

TL;DR

This work tackles fine-grained visual classification (FGVC) by addressing both intra-class variance and subtle inter-class differences under limited training data. It introduces Exploration of Class Center (ECC), a loss that combines a Multiple Class-Center Constraint (MCC) on features with Class-Center Label Generation (CLG) on distributions, yielding $L_{final}=L_{CE}+\lambda_1 L_{MCC}+\lambda_2 L_{CLG}$. MCC pulls samples toward their target class centers while pushing them away from the most similar non-target centers, using cosine-based distances weighted by class similarities. CLG derives soft labels from class-center distributions and is trained with a KL-divergence loss against them, which mitigates overfitting. Experiments on AIR, CUB, CAR, NAB (and iNat2018) show consistent improvements and state-of-the-art performance on FGVC-Aircraft and CUB-200-2011, with only slight training overhead and strong compatibility with existing FGVC methods.

Abstract

Unlike large-scale classification tasks, fine-grained visual classification is challenging because of two critical problems: 1) evident intra-class variances and subtle inter-class differences, and 2) overfitting owing to fewer training samples in datasets. Most existing methods extract key features to reduce intra-class variances, but pay no attention to subtle inter-class differences in fine-grained visual classification. To address this issue, we propose a loss function named exploration of class center, which consists of a multiple class-center constraint and a class-center label generation. This loss function fully utilizes the information of the class center from the perspective of features and labels. From the feature perspective, the multiple class-center constraint pulls samples closer to the target class center and pushes samples away from the most similar nontarget class center. Thus, the constraint reduces intra-class variances and enlarges inter-class differences. From the label perspective, the class-center label generation utilizes class-center distributions to generate soft labels to alleviate overfitting. Our method can be easily integrated with existing fine-grained visual classification approaches as a loss function, further improving their performance at only a slight training cost. Extensive experiments demonstrate consistent improvements achieved by our method on four widely used fine-grained visual classification datasets. In particular, our method achieves state-of-the-art performance on the FGVC-Aircraft and CUB-200-2011 datasets.
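The two components described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact form of the similarity weighting in the MCC, the clipping of negative cosines, the running-mean center update, and the default values of `lam1` and `lam2` are all assumptions, and every function name here is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def update_center(center, count, sample):
    """Running mean of a class center, tracked with a per-class counter
    (assumed update rule)."""
    new_count = count + 1
    new_center = [c + (x - c) / new_count for c, x in zip(center, sample)]
    return new_center, new_count

def mcc_loss(feature, label, centers):
    """Pull toward the target center; push away from the most similar
    non-target center."""
    target = centers[label]
    others = [k for k in range(len(centers)) if k != label]
    # Most similar non-target class, chosen by center-to-center similarity.
    nearest = max(others, key=lambda k: cosine(target, centers[k]))
    # Assumed weighting: the clipped similarity between the two class centers.
    weight = max(0.0, cosine(target, centers[nearest]))
    pull = 1.0 - cosine(feature, target)
    push = weight * max(0.0, cosine(feature, centers[nearest]))
    return pull + push

def clg_loss(logits, label, center_logits):
    """KL(soft label || prediction), with the soft label generated from the
    target class-center distribution via softmax."""
    soft = softmax(center_logits[label])
    pred = softmax(logits)
    return sum(s * math.log(s / p) for s, p in zip(soft, pred) if s > 0)

def ecc_total(logits, feature, label, centers, center_logits,
              lam1=1.0, lam2=1.0):
    """L_final = L_CE + lam1 * L_MCC + lam2 * L_CLG (weights are placeholders)."""
    ce = -math.log(softmax(logits)[label])
    return (ce
            + lam1 * mcc_loss(feature, label, centers)
            + lam2 * clg_loss(logits, label, center_logits))
```

In this sketch the push term is nonzero only when the target class has a non-target center that resembles it, which mirrors the idea that visually similar classes are the ones that need larger margins. A real training setup would compute these terms on batched tensors in a deep learning framework rather than on Python lists.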

Paper Structure (22 sections, 9 equations, 9 figures, 9 tables)

Introduction
Related Work
Fine-grained Image Classification
Class Center
Soft Labels
Exploration of Class Center
Multiple Class-Center Constraint
Class-Center Label Generation
Exploration of Class Center
Experiments
Implementation details
Integration with existing FGVC methods and different backbones
Comparison with different loss functions
Comparison with SoTA methods
Ablation studies
...and 7 more sections

Figures (9)

Figure 1: The t-SNE results of (a) cross entropy loss, (b) contrastive loss, (c) center loss and (d) our method on 18 categories of warblers. The improvements in (b) contrastive loss and (c) center loss are limited by inter-class differences and intra-class variances, such as classes masked in boxes. Compared with other methods, (d) our method compresses samples of the same class into a compact cluster and significantly enlarges the margins between different clusters, especially for the classes masked in the boxes. Thus, our method effectively reduces intra-class variances and enlarges inter-class differences.
Figure 2: Four examples of soft labels, each corresponding to the image in the first column. The columns from left to right show (a) images, (b) images from similar categories of (a), (c) smooth labels of LS, (d) predictions of the trained model and (e) our soft labels from CLG. Columns (a) and (b) are visually similar samples but belong to different categories. In Column (c), LS assigns the same confidence to all nontarget classes. Such soft labels do not reflect relationships between classes. The confidence of nontarget classes should be positively related to the similarity between the target class and nontarget classes. Other methods utilize the predictions of trained models as soft labels. However, the predictions may be incorrect, as shown in Column (d). Some samples can easily be predicted as similar nontarget classes, whose samples are shown in Column (b). Different from the smooth labels of LS and the predictions of the trained model, our labels in Column (e) reflect the similarity between classes and ensure correct labelling.
Figure 3: Overview of ECC. First, class-center features and class-center distributions are updated with sample features and sample distributions from the backbone with a counter. For intra-class variances, the MCC reduces the cosine distance between the sample feature and the target class-center feature. For inter-class differences, the MCC enlarges the cosine distance between the sample feature and the similar nontarget class-center feature, which is determined by the similarity matrix of class-center features. In addition, class-center distributions are employed to generate soft labels with the softmax function. The KL divergence between the soft labels and the sample probability distributions is calculated as the CLG loss. The MCC and CLG are summed with hyperparameters $\lambda_1$ and $\lambda_2$ as the ECC loss. Finally, the ECC loss is combined with the CE loss to supervise the model.
Figure 4: The performances of different $\lambda_1$ for the MCC and $\lambda_2$ for the CLG. Baselines are represented by dashed grey lines. The blue curves correspond to the changes in the MCC weight $\lambda_1$. The red curves correspond to the changes in the CLG weight $\lambda_2$.
Figure 5: The visualizations of t-SNE on 18 species of visually similar warblers from the CUB dataset. The left image represents the result of the CE loss. The right image is the result of the MCC. Points with the same colour belong to one class.
...and 4 more figures
