When the City Teaches the Car: Label-Free 3D Perception from Infrastructure

Zhen Xu; Jinsu Yoo; Cristian Bautista; Zanming Huang; Tai-Yu Pan; Zhenzhen Liu; Katie Z Luo; Mark Campbell; Bharath Hariharan; Wei-Lun Chao

Abstract

Building robust 3D perception for self-driving still relies heavily on large-scale data collection and manual annotation, yet this paradigm becomes impractical as deployment expands across diverse cities and regions. Meanwhile, modern cities are increasingly instrumented with roadside units (RSUs), static sensors deployed along roads and at intersections to monitor traffic. This raises a natural question: can the city itself help train the vehicle? We propose infrastructure-taught, label-free 3D perception, a paradigm in which RSUs act as stationary, unsupervised teachers for ego vehicles. Leveraging their fixed viewpoints and repeated observations, RSUs learn local 3D detectors from unlabeled data and broadcast predictions to passing vehicles, which are aggregated as pseudo-label supervision for training a standalone ego detector. The resulting model requires no infrastructure or communication at test time. We instantiate this idea as a fully label-free three-stage pipeline and conduct a concept-and-feasibility study in a CARLA-based multi-agent environment. With CenterPoint, our pipeline achieves 82.3% AP for detecting vehicles, compared to a fully supervised ego upper bound of 94.4%. We further systematically analyze each stage, evaluate its scalability, and demonstrate complementarity with existing ego-centric label-free methods. Together, these results suggest that city infrastructure itself can potentially provide a scalable supervisory signal for autonomous vehicles, positioning infrastructure-taught learning as a promising orthogonal paradigm for reducing annotation cost in 3D perception.

Paper Structure (30 sections, 20 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Infrastructure-Taught, Label-Free 3D Perception
Problem Setup
Overall Pipeline
CIVET Dataset
Experiments
Data Collection
Experimental Setups
Main Results and Analysis (Qualitative Results in Suppl. Sec. D.3)
Further Analysis and Insights
Conclusion and Discussion
Disclosure of LLM Usage
Additional Details on CIVET
Data Curation
...and 15 more sections

Figures (20)

Figure 1: Can city infrastructure teach vehicles to perceive? We explore a new paradigm where roadside infrastructure acts as distributed teachers, providing supervision to train ego perception models without manual annotations.
Figure 2: Overview of infrastructure-taught, label-free 3D perception. Stage 1: each RSU learns a location-specialized detector in an unsupervised manner by exploiting temporal consistency from its stationary viewpoint. Stage 2: trained RSUs broadcast their predicted 3D bounding boxes to nearby ego vehicles when their fields of view overlap. Stage 3: the ego vehicle aggregates these predictions as pseudo-labels to train its own detector offline, producing a standalone ego model that no longer requires infrastructure at deployment time.
Figure 3: Sample from the CIVET dataset used in Stage 2 RSU-to-ego broadcasting. The ego vehicle and RSU observe the same traffic scene from different viewpoints with overlapping fields of view.
Figure 4: Effectiveness of PP scores for RSU. (a) Discriminative distribution allows a clear separation between static background and objects. (b) Pseudo-labels exhibit high localization quality.
Figure 5: Effect of tracking refinement (zhang2023oyster). Incorporating tracking improves pseudo-label recall, yielding stronger supervision for unsupervised RSU training.
...and 15 more figures
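
The Figure 2 caption describes RSUs broadcasting predicted 3D boxes (Stage 2) and the ego vehicle aggregating them as pseudo-labels (Stage 3). The sketch below illustrates one way that handoff could look. It is not the authors' implementation. The box layout `[x, y, z, l, w, h, yaw]`, the 4x4 sensor-to-world poses, the flat-ground (yaw-only) assumption, and the `min_score` and `max_range` thresholds are all assumptions made for illustration.

```python
import numpy as np


def yaw_pose(x, y, z, yaw):
    """4x4 sensor-to-world transform for a sensor at (x, y, z) rotated by yaw about z."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [x, y, z]
    return T


def rsu_boxes_to_ego(boxes, rsu_to_world, ego_to_world):
    """Re-express (N, 7) boxes [x, y, z, l, w, h, yaw] from the RSU frame in the ego frame."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 7)
    T = np.linalg.inv(ego_to_world) @ rsu_to_world  # RSU frame -> ego frame
    centers = np.hstack([boxes[:, :3], np.ones((len(boxes), 1))])
    out = boxes.copy()
    out[:, :3] = (centers @ T.T)[:, :3]
    # Assumption: the two frames differ only by a rotation about z (flat ground).
    dyaw = np.arctan2(T[1, 0], T[0, 0])
    yaw = boxes[:, 6] + dyaw
    out[:, 6] = np.arctan2(np.sin(yaw), np.cos(yaw))  # wrap the heading to [-pi, pi]
    return out


def make_pseudo_labels(boxes_ego, scores, min_score=0.5, max_range=50.0):
    """Keep confident boxes within a hypothetical ego sensing range (both thresholds assumed)."""
    boxes_ego = np.asarray(boxes_ego, dtype=float).reshape(-1, 7)
    scores = np.asarray(scores, dtype=float)
    dist = np.hypot(boxes_ego[:, 0], boxes_ego[:, 1])
    return boxes_ego[(scores >= min_score) & (dist <= max_range)]


# Hypothetical scene: an RSU mounted 5 m high at world (10, 0), ego at the origin facing +y.
rsu_to_world = yaw_pose(10.0, 0.0, 5.0, 0.0)
ego_to_world = yaw_pose(0.0, 0.0, 0.0, np.pi / 2)
rsu_boxes = [
    [0.0, 0.0, -5.0, 4.5, 2.0, 1.6, 0.0],    # confident, nearby
    [3.0, 1.0, -5.0, 4.5, 2.0, 1.6, 0.3],    # low score, dropped
    [200.0, 0.0, -5.0, 4.5, 2.0, 1.6, 0.0],  # outside the assumed range, dropped
]
rsu_scores = [0.9, 0.2, 0.95]
pseudo_labels = make_pseudo_labels(
    rsu_boxes_to_ego(rsu_boxes, rsu_to_world, ego_to_world), rsu_scores
)
```

In this sketch the retained boxes would serve as training targets for the ego detector in place of manual annotations. How the paper actually filters or merges predictions from multiple RSUs is not specified in this excerpt.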
