Exploring the Impact of Synthetic Data for Aerial-view Human Detection

Hyungtae Lee; Yan Zhang; Yi-Ting Shen; Heesung Kwon; Shuvra S. Bhattacharyya

TL;DR

This work investigates how synthetic data can bolster aerial-view human detection by systematically analyzing three interacting factors: the real reference data used to measure the domain gap, the synthetic data selected for training, and the synthetic data pool from which samples are drawn. The authors model the detector’s representation as a multivariate Gaussian in the feature space and define the distribution gap as a normalized sum of Mahalanobis distances, enabling a quantitative link between domain discrepancy and post-training performance. They introduce Progressive Transformation Learning (PTL) to progressively augment the training set with synthetic images while preserving sim2real quality, using a CycleGAN to adapt selected samples toward the current data distribution and a time-saving strategy that tunes from the previous iteration. Across extensive experiments on five real aerial datasets and a large synthetic pool, they show that synthetic data can significantly improve learning and generalization, especially in data-scarce regimes, but that the benefits depend on real-data availability, sim2real transformation quality, and the diversity and domain-gap characteristics of the synthetic pool. The study offers practical guidance for designing synthetic-data workflows that maximize learning gains and domain generalization in aerial perception tasks and beyond.

Abstract

Aerial-view human detection demands more large-scale data than ground-view human detection because it must capture a more diverse range of human appearances. Synthetic data is therefore a promising resource for expanding training data, but its domain gap with real-world data is the biggest obstacle to its use in training. A common remedy for the domain gap is the sim2real transformation, whose quality is affected by three factors: i) the real data serving as a reference when calculating the domain gap, ii) the synthetic data chosen to avoid degrading the transformation quality, and iii) the synthetic data pool from which the synthetic data is selected. In this paper, we investigate how these factors affect the effectiveness of synthetic data in training, in terms of the two main benefits expected of synthetic data: improved learning performance and domain generalization ability. As an evaluation metric for the second benefit, we introduce a method for measuring the distribution gap between two datasets, derived as the normalized sum of the Mahalanobis distances of all test data. Our study yields several important findings on aspects that have either never been investigated or have been used without an accurate understanding. We expect these findings to break the current trend of either using synthetic data naively or hesitating to use it in machine learning for lack of understanding, leading to more appropriate use in future research.
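To make the distribution-gap metric more concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that features have already been extracted into NumPy arrays, that the reference dataset is summarized by a single Gaussian (mean and covariance), and that "normalized" means dividing the summed Mahalanobis distances by the number of test samples. The function names `fit_gaussian` and `distribution_gap`, the `eps` regularizer, and the random stand-in features are all hypothetical.

```python
import numpy as np


def fit_gaussian(features, eps=1e-6):
    """Fit a single multivariate Gaussian to an (n_samples, n_dims) array.

    Returns the mean vector and the inverse covariance (precision) matrix.
    eps is an assumed ridge term that keeps the covariance invertible.
    """
    mean = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    cov = cov + eps * np.eye(cov.shape[0])
    return mean, np.linalg.inv(cov)


def distribution_gap(reference, test, eps=1e-6):
    """Sum of Mahalanobis distances from each test sample to the reference
    Gaussian, normalized here by the number of test samples (an assumption).
    """
    mean, precision = fit_gaussian(reference, eps)
    diff = test - mean
    # Squared Mahalanobis distance for every test row: d_i^T P d_i
    sq_dist = np.einsum("ij,jk,ik->i", diff, precision, diff)
    dist = np.sqrt(np.maximum(sq_dist, 0.0))
    return float(dist.sum() / len(test))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(500, 4))   # stand-in for reference features
    same_domain = rng.normal(size=(200, 4))  # drawn from the same distribution
    shifted = same_domain + 5.0              # stand-in for a shifted domain
    print("same-domain gap:", distribution_gap(reference, same_domain))
    print("shifted gap:    ", distribution_gap(reference, shifted))
```

Under these assumptions, a test set drawn from the same distribution as the reference yields a smaller gap than a shifted one, which is the behavior a domain-generalization metric of this kind relies on.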

Paper Structure (19 sections, 11 equations, 6 figures, 7 tables)

Introduction
Related Works
Methodology
Measuring Distribution Gap
Leveraging Synthetic Images in Training
Experimental Settings
Results and Analysis
A Study on the Impact of Real Data
A Study on the Impact of Synthetic Data
A Study on the Impact of the Synthetic Data Pool
Discussions
Preliminaries
Modeling Representation Space of Sigmoid-based Detector
Cross-entropy with Mixture of Delta Distributions and Multivariate Gaussian Distribution
Implementation Details
...and 4 more sections

Figures (6)

Figure 1: Sim2real transformation mechanism. Three datasets (real data, synthetic data, and the synthetic data pool) can influence the impact of synthetic data used in training.
Figure 2: Accuracy versus the size of the real dataset. (b) and (c) show the accuracy when synthetic images are used in training.
Figure 3: Change in distribution gap when synthetic images are used for each image of the Okutama-Action dataset under the Vis-50 setup.
Figure 4: Accuracy versus the size of the synthetic dataset. Plots in the top and bottom rows show APs in same-domain and cross-domain tasks, respectively.
Figure 5: Detection accuracy-distribution gap scatter plot with various numbers of synthetic images. The left and right plots are made with PTL and a random selection, respectively, using the Okutama-Action dataset under the Vis-50 setup. Darker dots represent test data when using more synthetic images for training.
...and 1 more figures
