MAFE R-CNN: Selecting More Samples to Learn Category-aware Features for Small Object Detection

Yichen Li, Qiankun Liu, Zhenchao Jin, Jiuzhe Wei, Jing Nie, Ying Fu

TL;DR

This work tackles the persistent challenge of small object detection by identifying imbalanced positive samples and blurred discriminative features as core bottlenecks. It introduces MAFE R-CNN, which combines Multi-Clue Sample Selection (MCSS) for balanced, high-quality positives with a Category-aware Feature Enhancement Mechanism (CFEM) that leverages a memory of category features and cross-attention to enrich small-object representations. MCSS integrates IoU distance, predicted category confidence, and ground-truth region sizes with a dynamic threshold to maintain sample diversity across sizes, while CFEM maintains and updates a category-aware memory and fuses it with candidate-box features to improve both classification and regression. Experiments on the SODA dataset show state-of-the-art AP/AR for small objects, validating that jointly refining sample quality and feature representations yields meaningful gains in challenging small-object scenarios. The approach offers practical benefits for real-world detection tasks with dense, small-scale objects, and provides a scalable framework for handling imbalanced data without heavy loss terms or bespoke architectures.

Abstract

Small object detection in intricate environments has consistently represented a major challenge in the field of object detection. In this paper, we identify that this difficulty stems from the detectors' inability to effectively learn discriminative features for objects of small size, compounded by the complexity of selecting high-quality small object samples during training, which motivates the proposal of the Multi-Clue Assignment and Feature Enhancement R-CNN. Specifically, MAFE R-CNN integrates two pivotal components. The first is the Multi-Clue Sample Selection (MCSS) strategy, in which the Intersection over Union (IoU) distance, predicted category confidence, and ground truth region sizes are leveraged as informative clues in the sample selection process. This methodology facilitates the selection of diverse positive samples and ensures a balanced distribution of object sizes during training, thereby promoting effective model learning. The second is the Category-aware Feature Enhancement Mechanism (CFEM), where we propose a simple yet effective category-aware memory module to explore the relationships among object features. Subsequently, we enhance the object feature representation by facilitating the interaction between category-aware features and candidate box features. Comprehensive experiments conducted on the large-scale small object dataset SODA validate the effectiveness of the proposed method. The code will be made publicly available.
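
To make the idea of multi-clue sample selection more concrete, the following is a minimal sketch and not the authors' implementation. The abstract names three clues (IoU distance, predicted category confidence, ground truth region size) but does not give the formula, so everything specific here is an assumption: plain IoU stands in for the IoU distance, the size-dependent weight `w` and the reference size `ref_size` are hypothetical, and the dynamic threshold borrows the mean-plus-standard-deviation rule used by ATSS-style assigners. The function names `iou` and `select_positives` are illustrative only.

```python
import math
from statistics import mean, pstdev


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_positives(gt_box, candidates, confidences, ref_size=32.0):
    """Return indices of candidates chosen as positives for one ground truth box.

    Assumed scoring rule (not from the paper): the smaller the object, the
    more the score leans on predicted confidence instead of box overlap.
    """
    # Absolute object size: square root of the ground truth area.
    gt_size = math.sqrt((gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1]))
    # Hypothetical weight on the overlap clue, growing with object size.
    w = min(1.0, gt_size / ref_size)
    scores = [
        w * iou(gt_box, box) + (1.0 - w) * conf
        for box, conf in zip(candidates, confidences)
    ]
    # Dynamic, per-object threshold: mean + std of this object's scores
    # (an ATSS-style rule, swapped in here as an assumption).
    threshold = mean(scores) + pstdev(scores)
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    gt = (0.0, 0.0, 8.0, 8.0)
    boxes = [(0, 0, 8, 8), (1, 1, 9, 9), (20, 20, 28, 28), (4, 4, 12, 12)]
    confs = [0.9, 0.6, 0.1, 0.3]
    print(select_positives(gt, boxes, confs))
```

Because the threshold is computed per ground truth box, a small object with uniformly low overlaps is still compared against its own score distribution rather than a fixed global IoU cutoff, which is the kind of behavior a size-balanced assigner needs.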

Paper Structure

This paper contains 29 sections, 11 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: (a) The average number of samples assigned by different assigners for objects of various sizes, where absolute size corresponds to the square root of the object area. (b) Feature map visualization. Brighter colors indicate higher model attention to the region. Traditional methods exhibit low attention to small objects, making it difficult for the model to acquire small object samples and high-quality small object features. The proposed MAFE R-CNN shows a more balanced sample allocation and higher-quality small object features, contributing to more effective small object prediction.
  • Figure 2: Illustration of the proposed MAFE R-CNN. Our method integrates Multi-Clue Sample Selection (MCSS) and Category-aware Feature Enhancement Mechanism (CFEM) into the MAFE R-CNN for training. During inference, MCSS is not used for sample selection, and the category-aware memory remains fixed. The three-stage prediction process based on the input features of small objects produces the final prediction results.
  • Figure 3: Illustration of the CFEM training pipeline. Ground truth features update the category-aware memory, which is then used for feature enhancement.
  • Figure 4: Visualization on the SODA test set. Only small objects with a confidence score larger than 0.3 are shown. Best viewed in color and zoomed in.
  • Figure 5: Comparison of proposal feature distributions before (a) and after (b) feature enhancement. Different colors represent different categories.
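
The captions of Figures 2 and 3 describe a category-aware memory that is updated from ground truth features during training, kept fixed at inference, and used to enhance candidate box features. The sketch below illustrates that flow under stated assumptions and is not the paper's module: the exponential moving average update, the `momentum` value, the single-head dot-product attention without learned projections, and the residual addition are all hypothetical choices, as are the names `CategoryMemory`, `update`, and `enhance`.

```python
import math


class CategoryMemory:
    """One feature slot per category (illustrative, not the authors' CFEM)."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.dim = dim
        self.momentum = momentum
        self.slots = [[0.0] * dim for _ in range(num_classes)]
        self.seen = [False] * num_classes

    def update(self, label, feature):
        """Training only: fold a ground truth feature into its category slot."""
        if not self.seen[label]:
            # First sample of a category initializes the slot directly.
            self.slots[label] = list(feature)
            self.seen[label] = True
            return
        m = self.momentum
        # Assumed update rule: exponential moving average.
        self.slots[label] = [
            m * old + (1.0 - m) * new
            for old, new in zip(self.slots[label], feature)
        ]

    def enhance(self, query):
        """Attend from one candidate box feature to all category slots.

        Returns the enhanced feature and the attention weights per category.
        """
        scale = math.sqrt(self.dim)
        logits = [
            sum(q * k for q, k in zip(query, slot)) / scale
            for slot in self.slots
        ]
        # Numerically stable softmax over categories.
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        attended = [
            sum(w * slot[d] for w, slot in zip(weights, self.slots))
            for d in range(self.dim)
        ]
        # Residual connection keeps the original candidate feature.
        enhanced = [q + a for q, a in zip(query, attended)]
        return enhanced, weights
```

At inference time only `enhance` would be called, so the memory stays fixed, matching the behavior stated in the Figure 2 caption.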