Robust Roadside Perception: an Automated Data Synthesis Pipeline Minimizing Human Annotation

Rusheng Zhang, Depu Meng, Lance Bassett, Shengyin Shen, Zhengxia Zou, Henry X. Liu

TL;DR

The paper tackles data insufficiency in infrastructure-based roadside perception for cooperative driving by introducing an automated AR-based data synthesis pipeline paired with a GAN-based reality enhancer to generate photo-realistic, annotated roadside data. This synthetic data can be used to train or fine-tune detectors (e.g., YOLOX) to achieve robustness across diverse weather and lighting conditions and across deployment locations. The approach is validated at two Michigan sites (the Mcity intersection and the State St./Ellsworth Rd roundabout), where models trained on synthesized data outperform baselines and show notable gains in harsh conditions, with additional improvements when combined with real data. The work demonstrates a practical, scalable deployment strategy that minimizes human labeling while improving the transferability and performance of roadside perception systems for autonomous driving.

Abstract

Recently, advancements in vehicle-to-infrastructure communication technologies have elevated the significance of infrastructure-based roadside perception systems for cooperative driving. This paper delves into one of their most pivotal challenges: data insufficiency. The lack of high-quality, diverse labeled roadside sensor data leads to low robustness and low transferability in current roadside perception systems. In this paper, a novel solution is proposed to address this problem by creating synthesized training data using Augmented Reality. A Generative Adversarial Network is then applied to further enhance realism, producing a photo-realistic synthesized dataset capable of training or fine-tuning a roadside perception detector that is robust to different weather and lighting conditions. Our approach was rigorously tested at two key intersections in Michigan, USA: the Mcity intersection and the State St./Ellsworth Rd roundabout. The Mcity intersection is located within the Mcity test field, a controlled testing environment. In contrast, the State St./Ellsworth Rd intersection is a bustling roundabout notorious for its high traffic flow and a significant number of accidents annually. Experimental results demonstrate that detectors trained solely on synthesized data exhibit commendable performance across all conditions. Furthermore, when integrated with labeled data, the synthesized data can notably bolster the performance of pre-existing detectors, especially in adverse conditions.

Paper Structure

This paper contains 26 sections, 1 equation, 9 figures, 5 tables.

Figures (9)

  • Figure 1: An illustration of the issues current roadside perception systems face due to data insufficiency.
  • Figure 2: Two sets of cameras are used at two different locations in this paper. Four cameras are installed at an intersection in Mcity, each facing one of the four approaches (first subfigure), and another four cameras are installed at the four corners of a two-lane roundabout (second subfigure).
  • Figure 3: Data synthesis pipeline for generating realistic data: an AR renderer renders 3D vehicle models onto the real background using traffic simulation data, and a GAN-based reality enhancer is then applied to make the rendered vehicles photo-realistic.
  • Figure 4: Illustration of pose estimation.
  • Figure 5: Pose estimation landmark selection on one of the cameras (westbound approach of the intersection).
  • ...and 4 more figures