Enhancing Fine-grained Image Classification through Attentive Batch Training

Duy M. Le; Bao Q. Bui; Anh Tran; Cong Tran; Cuong Pham

TL;DR

This work tackles fine-grained image classification by introducing Relationship Batch Integration (RBI), a batch-aware framework that exploits inter-image relationships within a training batch. RBI combines a Relationship Position Encoding (RPE) module, which encodes pairwise image similarities based on normalized PSNR-derived metrics, with Residual Relationship Attention (RRA) to fuse batch features and preserve original representations via a residual pathway. Empirical results across multiple backbones and datasets (including CUB-200-2011, Stanford Dogs, and NABirds) show consistent accuracy gains, with state-of-the-art performance on Stanford Dogs and notable improvements on others, while enabling smaller backbones to outperform larger baselines in some configurations. The approach is presented as a versatile plug-in refinement that can be integrated with existing networks to boost fine-grained recognition without substantial computational overhead.
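The RPE idea summarized above, encoding pairwise image similarities from a normalized PSNR-derived metric, can be illustrated with a minimal sketch. The summary does not state the exact normalization, so the min-max scaling, the function names `pairwise_psnr` and `normalized_relationship`, and the NumPy formulation are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def pairwise_psnr(batch, max_val=1.0, eps=1e-10):
    """Return a (B, B) matrix of PSNR values in dB.

    batch: array of shape (B, H, W, C) with pixel values in [0, max_val].
    eps keeps the logarithm finite when two images are identical.
    """
    flat = batch.reshape(batch.shape[0], -1).astype(np.float64)
    # Mean squared error between every pair of images in the batch.
    diff = flat[:, None, :] - flat[None, :, :]
    mse = np.mean(diff ** 2, axis=-1)
    return 10.0 * np.log10(max_val ** 2 / (mse + eps))


def normalized_relationship(batch, max_val=1.0):
    """Min-max scale off-diagonal PSNR to [0, 1] (assumed normalization).

    The diagonal (each image compared with itself) is set to 1.
    """
    n = batch.shape[0]
    if n < 2:
        return np.ones((n, n))
    psnr = pairwise_psnr(batch, max_val)
    off_diag = ~np.eye(n, dtype=bool)
    lo, hi = psnr[off_diag].min(), psnr[off_diag].max()
    rel = np.clip((psnr - lo) / (hi - lo + 1e-12), 0.0, 1.0)
    np.fill_diagonal(rel, 1.0)
    return rel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((4, 8, 8, 3))  # hypothetical batch of 4 tiny images
    print(normalized_relationship(images).round(3))
```

A higher entry means the two images are closer at the pixel level; how such a matrix is actually turned into a position encoding is described in the paper's RPE section and is not reproduced here.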

Abstract

Fine-grained image classification, a challenging task in computer vision, requires precise differentiation among visually similar object categories. In this paper, we propose 1) a novel module called Residual Relationship Attention (RRA), which leverages the relationships between images within each training batch to integrate the visual feature vectors of the batch images, and 2) a novel technique called Relationship Position Encoding (RPE), which encodes the positions of relationships between the original images in a batch and thereby preserves the relationship information among them. Additionally, we design a novel framework, Relationship Batch Integration (RBI), which uses RRA in conjunction with RPE to discern vital visual features that may remain elusive when examining a single image of a particular class. Through extensive experiments, our proposed method demonstrates significant improvements in the accuracy of different fine-grained classifiers, with average increases of $+2.78\%$ and $+3.83\%$ on the CUB-200-2011 and Stanford Dogs datasets, respectively, while achieving a state-of-the-art result ($95.79\%$) on the Stanford Dogs dataset. Although the improvement is smaller than in fine-grained image classification, our method also proves effective for general image classification, attaining a state-of-the-art result of $93.71\%$ on the Tiny-ImageNet dataset. Furthermore, our method serves as a plug-in refinement module and can be easily integrated into different networks.
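As a rough illustration of how a module like RRA might fuse feature vectors across a batch while keeping the original representations through a residual path, consider the sketch below. The abstract does not specify the attention formulation, so the scaled dot-product attention, the additive relationship bias `rel`, the projection matrices `w_q`, `w_k`, `w_v`, and the function name `residual_batch_attention` are all assumptions, not the authors' method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def residual_batch_attention(feats, rel, w_q, w_k, w_v):
    """Fuse features across a batch and add them back to the originals.

    feats: (B, D) backbone feature vectors, one per image in the batch.
    rel:   (B, B) pairwise relationship scores, used here as an additive
           bias on the attention logits (an assumed design choice).
    w_q, w_k: (D, D_k) projections; w_v: (D, D) so the residual sum is valid.
    Returns the refined features (B, D) and the attention weights (B, B).
    """
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1]) + rel
    attn = softmax(scores, axis=-1)  # each row sums to 1 over the batch
    fused = attn @ v                 # mix information from other images
    return feats + fused, attn       # residual path keeps the originals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b, d = 4, 6  # hypothetical batch size and feature width
    feats = rng.standard_normal((b, d))
    rel = np.eye(b)
    w_q, w_k, w_v = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
    refined, attn = residual_batch_attention(feats, rel, w_q, w_k, w_v)
    print(refined.shape, attn.sum(axis=1))
```

With the residual connection, a value projection of all zeros returns the input features unchanged, which is one way to see why the original representations are preserved.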

Paper Structure (15 sections, 13 equations, 8 figures, 1 table)

Introduction
Related work
Attention-based methods
Fine-grained image classification
Proposed Approach
Relationship Batch Integration (RBI) Framework
Relationship Position Encoding (RPE)
Residual Relationship Attention (RRA)
Experiments
Datasets and Experimental Settings
Comparison to Existing Methods
The Impact of Batch Configurations
Feature Extracted by Conventional DNN and RBI
RRA Similarity matrix
Conclusion

Figures (8)

Figure 1: Example of intra-batch feature fusion to enhance predictivity for target images.
Figure 2: Relationship Batch Integration (RBI) Framework
Figure 3: Performance comparison of RBI models using various batch sizes on the Stanford Dogs dataset (left) and the CUB-200-2011 dataset (right). Note that experiments with large batch sizes on DenseNet201-RBI, SwinT-Small-RBI, and ConvNeXtBase-RBI are omitted due to GPU memory constraints.
Figure 4: Comparison between features extracted by ConvNeXt-Large, ConvNeXt-Large-RBI, HERB-SwinT, and HERB-SwinT-RBI on the Stanford Dogs dataset, illustrated by GradCAM.
Figure 5: The flow chart illustrates the GradCAM visualizations of features extracted by ConvNeXt-Large-RBI within a batch containing 8 images.
...and 3 more figures
