
Panoptic Perception for Autonomous Driving: A Survey

Yunge Li, Lanyu Xu

TL;DR

Panoptic perception addresses the need to unify multiple autonomous driving perception tasks (detection, segmentation, lane and drivable-area recognition, depth estimation) within a single framework. The paper surveys image-, LiDAR-, and fusion-based models organized around a backbone–neck–head architecture, covering backbones (CNNs and transformers), fusion strategies (notably mid-term, BEV-based fusion), and task-specific heads for object detection (OD), instance segmentation (IS), semantic segmentation (SS), lane detection (LD), and depth estimation (DE). It highlights that multi-task networks can achieve performance competitive with or superior to single-task models while reducing latency and resource use through shared representations, with BEV fusion and LiDAR-based approaches showing particular strength in 3D understanding. The survey also discusses challenges such as weight balance, task relevance, and negative transfer, and proposes future directions including adaptive weighting, task correlation metrics, mixture-of-experts, prompt-based techniques, data augmentation, and model compression to enable robust, real-time panoptic perception in diverse driving environments.
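The shared-representation argument can be made concrete with a minimal sketch. Everything below is hypothetical and uses plain Python lists in place of tensors: `PanopticModel`, its toy backbone, neck, and heads, and the `relative_cost` helper are illustrative names, not code from any surveyed model. The point is structural: the backbone and neck run once per input and every task head reuses their output, so each additional task costs only one more head.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tensor = List[float]  # hypothetical stand-in for a real feature tensor


@dataclass
class PanopticModel:
    """Hypothetical backbone-neck-head multi-task model (illustration only)."""
    backbone: Callable[[Tensor], Tensor]          # shared extractor (a CNN or transformer in practice)
    neck: Callable[[Tensor], Tensor]              # shared aggregation (e.g. FPN or BEV fusion in practice)
    heads: Dict[str, Callable[[Tensor], object]]  # one task-specific head per output
    backbone_calls: int = field(default=0, init=False)

    def __call__(self, x: Tensor) -> Dict[str, object]:
        self.backbone_calls += 1                  # the backbone runs exactly once per forward pass
        shared = self.neck(self.backbone(x))      # shared features computed once ...
        return {task: head(shared) for task, head in self.heads.items()}  # ... reused by every head


def relative_cost(backbone_cost: float, neck_cost: float, head_costs: List[float]) -> float:
    """Cost of one shared model divided by the cost of one single-task model per head."""
    shared = backbone_cost + neck_cost + sum(head_costs)
    separate = sum(backbone_cost + neck_cost + h for h in head_costs)
    return shared / separate


# Toy usage: the "heads" are trivial functions, not real OD/SS/DE decoders.
model = PanopticModel(
    backbone=lambda x: [2 * v for v in x],
    neck=lambda f: [v + 1 for v in f],
    heads={
        "OD": lambda f: max(f),
        "SS": lambda f: [v > 2 for v in f],
        "DE": lambda f: sum(f) / len(f),
    },
)
out = model([0.5, 1.0, 1.5])
ratio = relative_cost(backbone_cost=8, neck_cost=1, head_costs=[1, 1, 1])
```

With a backbone costing 8 units, a neck costing 1, and three heads costing 1 each (arbitrary illustrative numbers), the shared model needs 12 units against 30 for three separate single-task models, a ratio of 0.4. Actual savings depend on how much of the compute sits in the shared trunk rather than in the heads.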

Abstract

Panoptic perception represents a forefront advancement in autonomous driving technology, unifying multiple perception tasks into a single, cohesive framework to provide a thorough understanding of the vehicle's surroundings. This survey reviews typical panoptic perception models according to their inputs and architectures and compares them in terms of performance, responsiveness, and resource utilization. It also examines the prevailing challenges in panoptic perception and explores potential directions for future research. Our goal is to give researchers in autonomous driving a detailed synopsis of panoptic perception, positioning this survey as a pivotal reference in the ever-evolving landscape of autonomous driving technologies.
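One of the challenges a multi-task network faces is weight balance: per-task losses differ in scale, so a fixed weighted sum can let one task dominate training. A well-known adaptive remedy, shown here as a minimal sketch and not necessarily a method the survey itself prescribes, is the homoscedastic-uncertainty weighting of Kendall et al. (CVPR 2018), which minimizes the sum over tasks of L_i / (2 sigma_i^2) + log sigma_i with a learnable sigma_i per task. The sketch parameterizes s_i = log(sigma_i^2); the task names, fixed loss values, learning rate, and step count are all hypothetical.

```python
import math
from typing import Dict


def uncertainty_weighted_loss(losses: Dict[str, float], log_vars: Dict[str, float]) -> float:
    """Total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2)."""
    return sum(0.5 * math.exp(-log_vars[t]) * losses[t] + 0.5 * log_vars[t] for t in losses)


def step_log_vars(losses: Dict[str, float], log_vars: Dict[str, float], lr: float = 0.1) -> Dict[str, float]:
    """One gradient-descent step on each s_i; dTotal/ds_i = 0.5 - 0.5 * exp(-s_i) * L_i."""
    return {t: s - lr * (0.5 - 0.5 * math.exp(-s) * losses[t]) for t, s in log_vars.items()}


def task_weights(log_vars: Dict[str, float]) -> Dict[str, float]:
    """Effective weight on each task loss: 1 / (2 * sigma_i^2) = 0.5 * exp(-s_i)."""
    return {t: 0.5 * math.exp(-s) for t, s in log_vars.items()}


# Hypothetical task losses, held fixed so the adaptation is easy to follow.
losses = {"detection": 4.0, "segmentation": 1.0, "depth": 0.25}
log_vars = {t: 0.0 for t in losses}  # sigma_i = 1 for every task at the start
initial_total = uncertainty_weighted_loss(losses, log_vars)
for _ in range(2000):
    log_vars = step_log_vars(losses, log_vars)
weights = task_weights(log_vars)
```

With the losses held fixed, each s_i settles at log L_i, so the effective weight becomes 1 / (2 L_i): about 0.125 for detection, 0.5 for segmentation, and 2.0 for depth. The large-loss task is down-weighted and the small-loss task up-weighted, while the 0.5 s_i term keeps the weights from collapsing to zero. In a real network the s_i are trained jointly with the model parameters, and the task losses change at every step.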

Paper Structure

This paper contains 35 sections, 15 equations, 11 figures, and 12 tables.

Figures (11)

  • Figure 1: Overview of multi-task perception model for autonomous driving
  • Figure 2: Cameras
  • Figure 3: LiDARs
  • Figure 4: Sensor fusion
  • Figure 5: Overview of anchor-based object detection models.
  • ...and 6 more figures