Bag of Freebies for Training Object Detection Neural Networks
Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
TL;DR
This work tackles the variability of training pipelines in object detection by proposing a Bag of Freebies (BoF): a set of training-time tweaks that improve accuracy without changing model architectures or inference cost. It introduces a visually coherent mixup for detection, label smoothing for classification heads, and pragmatic choices in data preprocessing, learning rate scheduling, synchronized BatchNorm, and random-shape training, then demonstrates consistent gains on Pascal VOC and MS COCO for both single-stage and multi-stage detectors. The improvements are additive across techniques, with YOLOv3 and Faster R-CNN gaining up to 5% absolute precision over their baselines, which supports the approach as a practical, deployment-friendly enhancement. The authors provide open-source implementations in GluonCV to facilitate adoption and replication in real-world pipelines.
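As a concrete illustration of one of the listed tweaks, the following is a minimal sketch of label smoothing for a single classification target. It is not taken from the authors' GluonCV code: the function name `smooth_labels`, the value `epsilon=0.1`, and the choice to spread `epsilon` uniformly over all classes are assumptions made for illustration, and other formulations distribute the mass differently.

```python
def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for one ground-truth class.

    Assumed variant: a fraction (1 - epsilon) stays on the true class and
    the remaining epsilon is spread uniformly over all num_classes entries.
    """
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    uniform = epsilon / num_classes
    target = [uniform] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


# Pascal VOC has 20 object classes; class index 3 is an arbitrary example.
target = smooth_labels(class_index=3, num_classes=20, epsilon=0.1)
```

The smoothed target still sums to one, so it can stand in for the one-hot vector in a cross-entropy loss during training. The network and its inference path are untouched, which is why a tweak of this kind adds no cost at deployment.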
Abstract
Training heuristics greatly improve the accuracy of various image classification models~\cite{he2018bag}. Object detection models, however, have more complex network structures and optimization targets, and their training strategies and pipelines vary dramatically from model to model. In this work, we explore training tweaks that apply to various models, including Faster R-CNN and YOLOv3. These tweaks do not change the model architectures, so the inference costs remain the same. Our empirical results demonstrate that these freebies can improve absolute precision by up to 5% compared to state-of-the-art baselines.
