Simple In-place Data Augmentation for Surveillance Object Detection

Munkh-Erdene Otgonbold, Ganzorig Batnasan, Munkhjargal Gochoo

TL;DR

This paper tackles data efficiency for surveillance object detection under stationary cameras by introducing an in-place data augmentation method that pastes object instances, taken from other frames of the same camera, at their original positions. By increasing the number of objects per image rather than the number of images, and by rejecting pastes that would overlap existing objects, the approach preserves realistic viewpoints while boosting diversity. It also includes an Assembled variant that combines the top-performing per-class augmentations. Evaluations on FishEye8K and UA-DETRAC show that, with only 8.5% of the full training data, the method can match or approach full-dataset performance. On FishEye8K, $mAP@.5$ rises from $0.4798$ to $0.5025$ and $mAP@.5:.95$ rises from $0.29$ to $0.3138$; on UA-DETRAC under blur conditions, $mAP@.5$ reaches up to $0.7149$. The work highlights a practical, low-overhead augmentation strategy for traffic surveillance, while noting that it is limited to stationary-camera settings and suggesting segmentation-based augmentation as future work to further reduce potential label leakage.

Abstract

Motivated by the need to improve model performance in traffic monitoring tasks with limited labeled samples, we propose a straightforward augmentation technique tailored for object detection datasets, specifically designed for stationary camera-based applications. Our approach focuses on placing objects in the same positions as the originals to ensure its effectiveness. By applying in-place augmentation on objects from the same camera input image, we address the challenge of overlapping with original and previously selected objects. Through extensive testing on two traffic monitoring datasets, we illustrate the efficacy of our augmentation strategy in improving model performance, particularly in scenarios with limited labeled samples and imbalanced class distributions. Notably, our method achieves comparable performance to models trained on the entire dataset while utilizing only 8.5 percent of the original data. Moreover, we report significant improvements, with mAP@.5 increasing from 0.4798 to 0.5025, and the mAP@.5:.95 rising from 0.29 to 0.3138 on the FishEye8K dataset. These results highlight the potential of our augmentation approach in enhancing object detection models for traffic monitoring applications.
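
The core idea in the abstract, pasting objects at their original positions while avoiding overlap with original and previously selected objects, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(cls, x1, y1, x2, y2)` pixel-box format, and the strict "no shared area" overlap rule are assumptions made for illustration, and the sketch ignores how donor frames and objects are selected.

```python
import numpy as np

def boxes_overlap(a, b):
    """Return True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def inplace_augment(target_img, target_boxes, donor_img, donor_boxes):
    """Paste donor objects into the target frame at their original coordinates.

    Both frames are assumed to come from the same stationary camera, so a
    donor crop already has a plausible position, scale, and viewpoint.
    Boxes are (cls, x1, y1, x2, y2) tuples in integer pixel coordinates
    (a hypothetical format chosen for this sketch).
    """
    out = target_img.copy()
    kept = list(target_boxes)  # originals plus every accepted paste
    for box in donor_boxes:
        _, x1, y1, x2, y2 = box
        # Skip a candidate that overlaps an original or an earlier paste.
        if any(boxes_overlap((x1, y1, x2, y2), k[1:]) for k in kept):
            continue
        out[y1:y2, x1:x2] = donor_img[y1:y2, x1:x2]
        kept.append(box)
    return out, kept

# Tiny synthetic example: a black target frame and a white donor frame.
target = np.zeros((10, 10, 3), dtype=np.uint8)
donor = np.full((10, 10, 3), 255, dtype=np.uint8)
target_boxes = [(0, 0, 0, 4, 4)]
donor_boxes = [(1, 2, 2, 6, 6),   # overlaps the original object: rejected
               (1, 6, 6, 9, 9)]   # free region: pasted
aug_img, aug_boxes = inplace_augment(target, target_boxes, donor, donor_boxes)
```

In this sketch the image count stays the same while the label list grows, which mirrors the stated goal of increasing object counts rather than the number of images.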

Paper Structure (9 sections, 6 figures, 15 tables)


Figures (6)

  • Figure 1: Comparison between original and augmented sample images from the FishEye8K and UA-DETRAC datasets. Both augmented samples contain more objects than the originals because of the in-place augmentation. The UA-DETRAC sample additionally has blurred areas, which are the regions of non-interest determined by subtracting the polygonal area of the bounding boxes of all the objects.
  • Figure 2: In-place object augmentation method on the FishEye8K dataset. An augmented output image contains multiple objects that appear in other frames of the same surveillance camera video.
  • Figure 3: Number of objects in (a) the full dataset and (b) the sampled small dataset, which is 8.5 percent of the full dataset.
  • Figure 4: Determining the region of interest, the polygonal area drawn in red, from all object labels in the specific camera video.
  • Figure 5: Comparison between an original image and an augmented image in the FishEye8K dataset.
  • ...and 1 more figure
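
The captions for Figures 1 and 4 describe a polygonal region of interest derived from all object labels of one camera, with everything outside it treated as a region of non-interest. The captions do not say how the polygon is constructed, so the sketch below assumes, purely for illustration, that it is the convex hull of all bounding-box corners. The helper names `convex_hull` and `roi_mask` are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` in pixel coordinates.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def roi_mask(boxes, height, width):
    """Boolean mask that is True inside the hull of all box corners.

    Assumption: the region of interest is the convex hull of the corners of
    every labeled box seen by one camera. Pixels where the mask is False
    would be the candidates for blurring.
    """
    corners = [(x, y) for x1, y1, x2, y2 in boxes
               for x in (x1, x2) for y in (y1, y2)]
    hull = convex_hull(corners)
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.ones((height, width), dtype=bool)
    n = len(hull)
    for i in range(n):
        (ax, ay), (bx, by) = hull[i], hull[(i + 1) % n]
        # A point is inside a convex polygon if it lies on the same side
        # (or on the boundary) of every edge.
        mask &= (bx - ax) * (ys - ay) - (by - ay) * (xs - ax) >= 0
    return mask

# Two labeled boxes from the same (hypothetical) camera, on a 10x10 frame.
boxes = [(2, 2, 4, 4), (6, 6, 8, 8)]
mask = roi_mask(boxes, height=10, width=10)
```

The resulting mask could then be passed to any blur routine so that only pixels outside the region of interest are smoothed; that step is left out here because the source does not specify the blur used.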