Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene

Tai-Yu Pan; Sooyoung Jeon; Mengdi Fan; Jinsu Yoo; Zhenyang Feng; Mark Campbell; Kilian Q. Weinberger; Bharath Hariharan; Wei-Lun Chao

TL;DR

The paper addresses occlusion and limited sensing range in ego-centric autonomous driving by proposing Transfer Your Perspective (TYP), a conditional diffusion framework that synthesizes LiDAR data from a reference viewpoint, conditioned on the ego car's data. Training has two stages. The first learns $P(x|y)$ from real ego-centric data with semantic conditioning. The second uses an adapter to ground generation to a reference viewpoint, realizing $P(x_r|x_e,y_r)$, where the subscripts $e$ and $r$ denote the ego and reference views; domain adaptation bridges the simulated and real domains. The authors show that generated reference data can substitute for real collaborative data. This supports scalable CAV development through datasets such as ColWaymo and pre-training of collaborative perception backbones across real and semi-synthetic domains. The approach could substantially reduce data collection effort while broadening the scope and robustness of multi-agent perception systems, with gains reported in both synthetic and real-world settings.
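The two stages can be read as two denoising objectives. What follows is a sketch under assumptions that this summary does not state: a standard noise-prediction (DDPM-style) loss, a base denoiser $\epsilon_\theta$, adapter parameters $\phi$, a noised sample $x_t$ at timestep $t$, and base weights $\theta$ held fixed during the second stage. The symbols $\theta$, $\phi$, $t$, and $\epsilon$ are illustrative and are not taken from the paper.

$$
\mathcal{L}_{\text{stage 1}}(\theta) = \mathbb{E}_{x,\, y,\, t,\, \epsilon}\left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t, y) \right\rVert_2^2 \right]
$$

$$
\mathcal{L}_{\text{stage 2}}(\phi) = \mathbb{E}_{x_r,\, x_e,\, y_r,\, t,\, \epsilon}\left[ \left\lVert \epsilon - \epsilon_{\theta,\phi}(x_{r,t}, t, y_r, x_e) \right\rVert_2^2 \right]
$$

Under this reading, minimizing the first loss fits a model of $P(x|y)$, and minimizing the second fits a model of $P(x_r|x_e,y_r)$ by adding the ego data $x_e$ as an extra conditioning input.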

Abstract

Self-driving cars relying solely on ego-centric perception face limitations in sensing, often failing to detect occluded or faraway objects. Collaborative autonomous driving (CAV) seems like a promising direction, but collecting data for development is non-trivial. It requires placing multiple sensor-equipped agents in a real-world driving scene, simultaneously! As such, existing datasets are limited in locations and agents. We introduce a novel surrogate to the rescue, which is to generate realistic perception from different viewpoints in a driving scene, conditioned on a real-world sample: the ego-car's sensory data. This surrogate has huge potential: it could turn any ego-car dataset into a collaborative driving one to scale up the development of CAV. We present the very first solution, using a combination of simulated collaborative data and real ego-car data. Our method, Transfer Your Perspective (TYP), learns a conditioned diffusion model whose output samples are not only realistic but also consistent in both semantics and layouts with the given ego-car data. Empirical results demonstrate TYP's effectiveness in a CAV setting. In particular, TYP enables us to (pre-)train collaborative perception algorithms like early and late fusion with little or no real-world collaborative data, greatly facilitating downstream CAV applications.
