Cooperative Infrastructure Perception

Fawad Ahmad, Christina Suyong Shin, Weiwu Pang, Branden Leong, Pradipta Ghosh, Ramesh Govindan

TL;DR

The paper addresses line-of-sight limitations in autonomous perception by introducing Cooperative Infrastructure Perception (CIP), which fuses outputs from multiple roadside LiDARs to produce 3D bounding boxes and tracks in real time. The design centers on a bounding-box abstraction and an accurate initial alignment method, augmented by fast, GPU-accelerated heading estimation and several system optimizations that keep end-to-end latency under 100 ms on commodity edge hardware. Results from real-world traces and CARLA simulations show that CIP achieves accuracy comparable to the state of the art at low latency, improves safety when it augments vehicle perception, and increases intersection throughput when planning is offloaded to the edge. The work highlights a practical path toward real-time, infrastructure-supported perception that enables safe, traffic-efficient operation in complex environments and mixed reality applications.

Abstract

Recent works have considered two qualitatively different approaches to overcome line-of-sight limitations of 3D sensors used for perception: cooperative perception and infrastructure-augmented perception. In this paper, motivated by increasing deployments of infrastructure LiDARs, we explore a third approach, cooperative infrastructure perception. This approach generates perception outputs by fusing outputs of multiple infrastructure sensors, but, to be useful, must do so quickly and accurately. We describe the design, implementation and evaluation of Cooperative Infrastructure Perception (CIP), which uses a combination of novel algorithms and systems optimizations. It produces perception outputs within 100 ms using modest computing resources and with accuracy comparable to the state-of-the-art. CIP, when used to augment vehicle perception, can improve safety. When used in conjunction with offloaded planning, CIP can increase traffic throughput at intersections.

Paper Structure (19 sections, 15 figures, 8 tables, 1 algorithm)

This paper contains 19 sections, 15 figures, 8 tables, and 1 algorithm.

Figures (15)

  • Figure 1: CIP deployment at an intersection with multiple LiDARs and nearby edge compute.
  • Figure 2: CIP deployment at a busy intersection with heavy vehicular and pedestrian traffic in a large metropolitan city. (a) A top-down view of the intersection (taken from Google Maps). We mounted four LiDARs, one near each of the traffic light poles situated at the four corners of the intersection. (b) An individual frame from each of the four LiDARs. (c) A fused frame. (d) Point clouds of traffic participants (dynamic objects) at the intersection. (e) Bounding boxes and motion vectors for traffic participants, calculated over successive frames.
  • Figure 3: Perception stages. Bold sub-stages described in detail.
  • Figure 4: An illustration of CIP's point cloud alignment algorithm. (a) A top-down view of a parking lot with two LiDARs shown by red ($L_{1}$) and blue ($L_{2}$) icons. (b) The inputs to the initial guess algorithm are the point clouds ($C_{1}$ and $C_{2}$), each in its own LiDAR's coordinate system, along with the ground distance $d$ between the two LiDARs. (c) Panel (b) with the background removed. (d) To fix the base coordinates, CIP displaces $C_{2}$ by the ground distance $d$. (e) CIP rotates both $C_{1}$ and $C_{2}$ by small yaw increments to find the combination with the least distance between the point clouds.
  • Figure 5: The figure shows the points belonging to a vehicle in two successive frames $t$ and $t+1$. (a) Strawman approach for heading determination. (b) CIP's approach.
  • ...and 10 more figures
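
The initial-guess steps in the Figure 4 caption (displace $C_{2}$ by $d$, then sweep yaw for both clouds and keep the combination with the least distance) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation. Several details are assumptions that the caption does not state: both LiDARs are taken to be level, so only yaw is unknown; $L_{2}$ is placed at distance $d$ along the +x axis of the shared frame; the "distance between the point clouds" is approximated by a brute-force mean nearest-neighbor distance; and the names `yaw_matrix`, `mean_nn_distance`, `initial_guess`, and `step_deg` are hypothetical.

```python
import numpy as np


def yaw_matrix(deg):
    """Rotation about the z (up) axis by `deg` degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def mean_nn_distance(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`.

    Brute force (O(len(a) * len(b))), which is only practical for small
    or downsampled clouds. Assumed metric, not taken from the paper.
    """
    diff = a[:, None, :] - b[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    return dists.min(axis=1).mean()


def initial_guess(c1, c2, d, step_deg=10):
    """Sweep yaw for both clouds and return (cost, yaw1_deg, yaw2_deg).

    c1, c2: (N, 3) arrays in each LiDAR's own frame, background removed.
    d: ground distance between the two LiDARs.
    Assumption: L1 sits at the origin and L2 at (d, 0, 0).
    """
    offset = np.array([d, 0.0, 0.0])
    yaws = np.arange(-180, 180, step_deg)
    best = (np.inf, 0, 0)
    for yaw1 in yaws:
        r1 = c1 @ yaw_matrix(yaw1).T
        for yaw2 in yaws:
            # Rotate C2 in place, then displace it by d to L2's position.
            r2 = c2 @ yaw_matrix(yaw2).T + offset
            cost = mean_nn_distance(r1, r2)
            if cost < best[0]:
                best = (cost, int(yaw1), int(yaw2))
    return best
```

The sweep costs one cloud-to-cloud comparison per yaw pair, so a coarse `step_deg` keeps it cheap. A result like this would typically serve only as a starting point for a finer registration step.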