Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies
Nieves Crasto
TL;DR
This paper addresses foreground-foreground class imbalance in single-stage object detection, focusing on YOLOv5s and edge-deployed scenarios. It introduces COCO-ZIPF, a 10-class long-tailed subset of COCO, and a PyTorch-based benchmarking framework to evaluate imbalance mitigation strategies, specifically sampling, loss weighting, and augmentation. The study finds that sampling and loss reweighting offer limited or negative benefits for YOLOv5 on COCO-ZIPF, while mosaic and mixup augmentations consistently improve mean Average Precision (mAP), with mosaic+mixup providing the strongest gains. The work provides practical guidance for handling class imbalance in lightweight detectors and releases code to support reproducibility and further research.
Abstract
Object detection, a pivotal task in computer vision, is frequently hindered by dataset imbalances, particularly the under-explored issue of foreground-foreground class imbalance. This gap is even more pronounced for single-stage detectors. This study introduces a benchmarking framework that uses the YOLOv5 single-stage detector to study foreground-foreground class imbalance. We construct a novel 10-class long-tailed dataset from the COCO dataset, termed COCO-ZIPF, tailored to reflect common real-world detection scenarios with a limited number of object classes. On this dataset, we evaluate three established techniques: sampling, loss weighting, and data augmentation. Our comparative analysis reveals that sampling and loss reweighting methods, while shown to be beneficial in two-stage detector settings, do not translate as effectively to YOLOv5 on the COCO-ZIPF dataset. In contrast, data augmentation methods, specifically mosaic and mixup, significantly enhance the model's mean Average Precision (mAP) by introducing more variability and complexity into the training data. (Code available: https://github.com/craston/object_detection_cib)
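To make the notion of a long-tailed class distribution and of loss weighting concrete, the following is a minimal sketch, not the construction used for COCO-ZIPF or the weighting scheme evaluated in the paper. It assumes that per-class instance counts follow a Zipf law (count proportional to 1/rank) and that loss weights are taken inversely proportional to class frequency. The function names, the head-class count of 1000, and the exponent of 1.0 are hypothetical values chosen for illustration only.

```python
def zipf_counts(num_classes, head_count, exponent=1.0):
    """Hypothetical per-class instance counts following a Zipf law.

    The class at rank r (1-based) receives about head_count / r**exponent
    instances, so the first class is the most frequent ("head") and the
    last classes form the "tail".
    """
    return [
        max(1, round(head_count / (rank ** exponent)))
        for rank in range(1, num_classes + 1)
    ]


def inverse_frequency_weights(counts):
    """Per-class loss weights proportional to 1 / count.

    The weights are rescaled so that their mean is 1, which keeps the
    overall loss magnitude comparable to the unweighted case.
    """
    inverse = [1.0 / count for count in counts]
    mean_inverse = sum(inverse) / len(inverse)
    return [value / mean_inverse for value in inverse]


# Assumed illustrative setting: 10 classes, 1000 instances of the head class.
counts = zipf_counts(num_classes=10, head_count=1000)
weights = inverse_frequency_weights(counts)

for rank, (count, weight) in enumerate(zip(counts, weights), start=1):
    print(f"class rank {rank:2d}: {count:5d} instances, loss weight {weight:.3f}")
```

Under these assumptions the rarest class has one tenth as many instances as the most frequent one and therefore receives a loss weight ten times larger. Weights of this kind could, for example, be passed to a per-class weighted classification loss; whether that helps a given detector is an empirical question of the sort the study examines.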
