Simple In-place Data Augmentation for Surveillance Object Detection
Munkh-Erdene Otgonbold, Ganzorig Batnasan, Munkhjargal Gochoo
TL;DR
This paper tackles data efficiency for surveillance object detection under stationary cameras by introducing an in-place data augmentation method that pastes object instances into their original positions within the same frame. By increasing object counts rather than the number of images and enforcing non-overlap, the approach preserves realistic viewpoints while boosting diversity, and includes an Assembled variant that combines top-performing per-class augmentations. Evaluations on FishEye8K and UA-DETRAC show that, with only 8.5% of the full training data, the method can match or approach full-dataset performance. On FishEye8K, $mAP@.5$ rises from $0.4798$ to $0.5025$ and $mAP@.5:.95$ from $0.29$ to $0.3138$; on UA-DETRAC, results are strong under blur conditions (up to $mAP@.5$ of $0.7149$). The work highlights a practical, low-overhead augmentation strategy for traffic surveillance, while noting that it is limited to stationary-camera settings and suggesting future segmentation-based augmentation to further reduce potential label leakage.
Abstract
Motivated by the need to improve model performance in traffic monitoring tasks with limited labeled samples, we propose a straightforward augmentation technique for object detection datasets, designed specifically for stationary camera-based applications. Our approach places augmented objects in the same positions as the originals, which keeps them consistent with the camera's viewpoint. Because the in-place augmentation uses objects from the same camera's input images, we explicitly handle overlap with the original objects and with previously selected objects. Through extensive testing on two traffic monitoring datasets, we illustrate the efficacy of our augmentation strategy in improving model performance, particularly in scenarios with limited labeled samples and imbalanced class distributions. Notably, our method achieves performance comparable to models trained on the entire dataset while using only 8.5 percent of the original data. Moreover, we report significant improvements, with mAP@.5 increasing from 0.4798 to 0.5025 and mAP@.5:.95 rising from 0.29 to 0.3138 on the FishEye8K dataset. These results highlight the potential of our augmentation approach for enhancing object detection models in traffic monitoring applications.
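
The non-overlap idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that donor objects come from another frame of the same stationary camera, that boxes are stored as `(x1, y1, x2, y2)` pixel tuples, and that images are plain lists of pixel rows. The names `boxes_overlap`, `select_in_place`, and `paste_in_place` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixels, x2/y2 exclusive.
Box = Tuple[int, int, int, int]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_in_place(target_boxes: List[Box], donor_boxes: List[Box]) -> List[Box]:
    """Greedily keep donor boxes that overlap neither the target frame's
    original objects nor any donor box accepted earlier."""
    occupied = list(target_boxes)
    accepted = []
    for box in donor_boxes:
        if not any(boxes_overlap(box, other) for other in occupied):
            accepted.append(box)
            occupied.append(box)
    return accepted


def paste_in_place(target: list, donor: list, box: Box) -> None:
    """Copy the donor pixels inside `box` to the same coordinates in `target`.

    Both images are lists of rows with identical dimensions (an assumption
    that holds when both frames come from the same fixed camera).
    """
    x1, y1, x2, y2 = box
    for y in range(y1, y2):
        target[y][x1:x2] = donor[y][x1:x2]
```

In this sketch the accepted donor boxes would be appended to the target frame's labels after pasting, so the number of objects per image grows while the number of images stays fixed. The greedy first-come order is an illustrative choice; the paper may select or rank candidates differently.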
