Bag of Freebies for Training Object Detection Neural Networks

Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

TL;DR

This work tackles the variability of training pipelines in object detection by proposing a Bag of Freebies (BoF): a set of training-time tweaks that improve accuracy without changing model architectures or inference cost. It introduces a visually coherent mixup for detection, label smoothing for classification heads, and pragmatic choices in data preprocessing, learning-rate scheduling, synchronized BatchNorm, and random-shape training, then demonstrates consistent gains on Pascal VOC and MS COCO for both single-stage and multi-stage detectors. The improvements are additive across techniques, with YOLOv3 and Faster R-CNN gaining up to 5% absolute precision over their baselines, which supports the approach as a practical, deployment-friendly enhancement. The authors provide open-source implementations in GluonCV to facilitate adoption and replication in real-world pipelines.

Abstract

Training heuristics greatly improve the accuracy of various image classification models \cite{he2018bag}. Object detection models, however, have more complex neural network structures and optimization targets, and their training strategies and pipelines vary dramatically among models. In this work, we explore training tweaks that apply to various models, including Faster R-CNN and YOLOv3. These tweaks do not change the model architectures, so the inference costs remain the same. Our empirical results demonstrate that these freebies can improve absolute precision by up to 5% compared to state-of-the-art baselines.

Paper Structure

This paper contains 16 sections, 3 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: The Bag of Freebies improves object detector performance. There is no extra inference cost since the models are not changed.
  • Figure 2: Mixup visualization for image classification with a typical mixup ratio of $0.1:0.9$. Two images are mixed uniformly across all pixels, and the image label is the weighted sum of the original one-hot label vectors.
  • Figure 3: Geometry-preserved alignment of mixed images for object detection. Image pixels are mixed up, and object labels are merged into a new array.
  • Figure 4: Comparison of different random weighted mixup sampling distributions. The red curve $\mathbf{B}(0.2,0.2)$ indicates the typical mixup ratio used in image classification. The blue curve is the special case $\mathbf{B}(1,1)$, equivalent to the uniform distribution. The orange curve represents our choice $\mathbf{B}(1.5,1.5)$ for object detection after preliminary experiments.
  • Figure 5: Elephant in the room example. The model trained with geometry-preserved mixup (bottom) is more robust against alien objects than the baseline (top).
  • ...and 3 more figures
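
The captions for Figures 3 and 4 describe the detection mixup only in words, so a minimal NumPy sketch may help make it concrete. It is not the authors' GluonCV implementation. The function name `detection_mixup`, the `[xmin, ymin, xmax, ymax, class_id]` box layout, the top-left alignment on a shared canvas, and the extra per-box weight column are all assumptions made for illustration. Only the $\mathbf{B}(1.5,1.5)$ sampling, the pixel blending without geometric distortion, and the merging of labels into one array come from the captions.

```python
import numpy as np


def detection_mixup(img1, boxes1, img2, boxes2, alpha=1.5, beta=1.5, rng=None):
    """Blend two HWC images on a shared canvas and merge their box arrays.

    boxes1 / boxes2: arrays of shape (N, 5) holding
    [xmin, ymin, xmax, ymax, class_id] (layout assumed for this sketch).
    Returns (mixed_image, mixed_boxes, lam). mixed_boxes has a sixth column
    with the blending weight of the image each box came from (an assumption,
    not something stated in the captions).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Mixing ratio drawn from Beta(1.5, 1.5), the choice shown in Figure 4.
    lam = float(rng.beta(alpha, beta))

    # The canvas is large enough for both images, so neither is resized or
    # cropped and every box keeps its original pixel coordinates.
    height = max(img1.shape[0], img2.shape[0])
    width = max(img1.shape[1], img2.shape[1])
    mixed = np.zeros((height, width, img1.shape[2]), dtype=np.float32)
    mixed[: img1.shape[0], : img1.shape[1]] += lam * img1.astype(np.float32)
    mixed[: img2.shape[0], : img2.shape[1]] += (1.0 - lam) * img2.astype(np.float32)

    # Merge the labels: keep every box and record its image's weight.
    w1 = np.full((boxes1.shape[0], 1), lam, dtype=np.float32)
    w2 = np.full((boxes2.shape[0], 1), 1.0 - lam, dtype=np.float32)
    mixed_boxes = np.vstack([np.hstack([boxes1, w1]), np.hstack([boxes2, w2])])
    return mixed, mixed_boxes, lam


# Toy usage with two constant-color images of different sizes.
rng = np.random.default_rng(0)
img_a = np.full((4, 6, 3), 100, dtype=np.uint8)
img_b = np.full((5, 3, 3), 200, dtype=np.uint8)
boxes_a = np.array([[1, 1, 5, 3, 0]], dtype=np.float32)
boxes_b = np.array([[0, 0, 2, 4, 1], [1, 2, 3, 5, 2]], dtype=np.float32)
mixed, boxes, lam = detection_mixup(img_a, boxes_a, img_b, boxes_b, rng=rng)
print(mixed.shape, boxes.shape)
```

Because both images are anchored at the same corner and never rescaled, the box coordinates from either source remain valid on the mixed canvas, which is the geometry-preserving property that Figure 3 illustrates. How the per-box weight is used downstream (for example, to scale each box's loss) is left open here.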