
Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review

Yining Shi, Kun Jiang, Jiusi Li, Zelin Qian, Junze Wen, Mengmeng Yang, Ke Wang, Diange Yang

TL;DR

The paper surveys grid-centric perception for autonomous driving, arguing that occupancy grids offer robust, geometry-first representations that handle open-world scenarios and occlusions better than object-centric pipelines. It comprehensively covers data pipelines, 3D/4D occupancy networks, temporal fusion, label-efficient learning, and planning integration, highlighting advances from 2D BEV to 4D forecasting and world models. Key contributions include a hierarchical taxonomy, a synthesis of datasets and benchmarks, and a discussion of deployment considerations, efficiency, and future directions such as open-vocabulary occupancy and scalable pre-training. The work demonstrates that grid-centric perception can enhance robustness, interpretability, and safety in driving systems, while also outlining challenges like labeling cost, long-range forecasting, and adapters for planning. Overall, this survey guides researchers and practitioners toward scalable, multi-modal occupancy frameworks that integrate perception, prediction, and planning in autonomous vehicles.

Abstract

Grid-centric perception is a crucial field for mobile robot perception and navigation. Nonetheless, grid-centric perception is less prevalent than object-centric perception in autonomous driving, because autonomous vehicles need to accurately perceive highly dynamic, large-scale traffic scenarios, and the complexity and computational cost of grid-centric perception are high. In recent years, the rapid development of deep learning techniques and hardware has provided fresh insights into the evolution of grid-centric perception. The fundamental difference between grid-centric and object-centric pipelines is that grid-centric perception follows a geometry-first paradigm, which is more robust in open-world driving scenarios with endless long-tailed, semantically unknown obstacles. Recent research demonstrates the great advantages of grid-centric perception, such as a comprehensive, fine-grained environmental representation, greater robustness to occlusion and irregularly shaped objects, better ground estimation, and safer planning policies. There is also a growing trend of expanding the capacity of occupancy networks to 4D scene perception and prediction, and the latest techniques are closely related to emerging research topics such as 4D occupancy forecasting, generative AI, and world models for autonomous driving. Given the lack of current surveys in this rapidly expanding field, we present a hierarchically structured review of grid-centric perception for autonomous vehicles. We organize previous and current knowledge of occupancy grid techniques along the main line of development from 2D BEV grids to 3D occupancy to 4D occupancy forecasting. We additionally summarize label-efficient occupancy learning and the role of grid-centric perception in driving systems. Lastly, we summarize current research trends and provide future outlooks.
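The "geometry-first" idea can be illustrated with the simplest grid-centric output, a 2D BEV occupancy grid built directly from LiDAR points. This is a minimal sketch and not the survey's method. The function name `bev_occupancy`, the grid extent, the 0.5 m cell size, and the height band used to discard ground returns are all hypothetical choices. The sketch also assumes an ego frame with the ground near z = 0. A cell is marked occupied whenever a point falls inside it, regardless of what object class produced that point.

```python
import math

def bev_occupancy(points, x_range=(-10.0, 10.0), y_range=(-10.0, 10.0),
                  cell_size=0.5, z_min=0.2, z_max=3.0):
    """Rasterize 3D points (x, y, z) in the ego frame into a binary BEV grid.

    Hypothetical helper: a cell is set to 1 if any point with z in
    [z_min, z_max] falls inside it. No semantic label is needed, so an
    unknown obstacle is marked exactly like a known one (geometry first).
    Assumes the ground plane lies near z = 0.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))
    grid = [[0] * ny for _ in range(nx)]
    for x, y, z in points:
        if not (z_min <= z <= z_max):
            continue  # skip likely ground returns and overhanging structures
        i = math.floor((x - x_range[0]) / cell_size)
        j = math.floor((y - y_range[0]) / cell_size)
        if 0 <= i < nx and 0 <= j < ny:  # ignore points outside the grid
            grid[i][j] = 1
    return grid
```

For example, `bev_occupancy([(1.2, -0.3, 0.8), (1.3, -0.2, 1.5), (5.0, 5.0, 0.05)])` returns a 40 x 40 grid. In that grid only cell `[22][19]` is occupied. The two elevated points share that cell, and the third point is treated as ground. A binary grid like this cannot separate "free" from "unobserved" cells. Probabilistic occupancy grids keep that distinction.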

Paper Structure (60 sections, 11 equations, 13 figures, 4 tables)

Figures (13)

  • Figure 1: Hierarchically-structured taxonomy of grid-centric perception for autonomous driving.
  • Figure 2: An illustration of grid-centric perception in autonomous driving scenarios. (Left): The autonomous vehicle is equipped with different sensors such as LiDARs, radars and cameras, as well as GPS and IMU for localization. (Middle): The raw data, point clouds and images, are processed by spatio-temporal networks supervised by bounding-box or occupancy annotations. (Right): Different perception outputs of grid-centric perception. Images sourced from MonoScene, MT-DOGM, Cam4DOcc and MotionNet.
  • Figure 3: Local-DIFs: Local-DIFs generate a continuous representation of 3D semantic occupancy, avoiding the quantization artefacts that discretization into voxels produces on slanted surfaces (e.g., the road plane) and at edges between objects.
  • Figure 4: Tesla FSD [tesla_ai_day_2022]: the Tesla FSD Beta perception system, built on top of the Occupancy Network.
  • Figure 5: An illustration of an inverse sensor model of a single LiDAR reflection, showing the side view (left) and bird's-eye view (right) [inverse_sensor_model].
  • ...and 8 more figures
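The inverse sensor model in Figure 5 can be sketched along a single beam. The sketch combines it with the standard Bayesian log-odds update used by classical occupancy grids. It is a simplified 1D illustration, not the formulation from the survey or the cited work. The probabilities `P_OCC`, `P_FREE` and `P_PRIOR` are hypothetical values, and so are the helper names. A real implementation would trace rays through a 2D or 3D grid rather than along a single line of cells.

```python
import math

# Hypothetical sensor-model parameters (assumed for illustration only).
P_OCC = 0.7    # occupancy probability of the cell containing the reflection
P_FREE = 0.3   # occupancy probability of cells the beam passed through
P_PRIOR = 0.5  # prior occupancy of an unobserved cell

def prob_to_logodds(p):
    return math.log(p / (1.0 - p))

def logodds_to_prob(l):
    return 1.0 - 1.0 / (1.0 + math.exp(l))

def inverse_sensor_model(n_cells, cell_size, hit_range):
    """Per-cell occupancy probability along one beam (sensor at range 0).

    Cells in front of the reflection are likely free. The cell containing
    the reflection is likely occupied. Cells behind it keep the prior,
    because the beam says nothing about occluded space. If hit_range lies
    beyond the last cell, every cell is treated as traversed (free).
    """
    hit_cell = int(hit_range // cell_size)
    probs = []
    for k in range(n_cells):
        if k < hit_cell:
            probs.append(P_FREE)
        elif k == hit_cell:
            probs.append(P_OCC)
        else:
            probs.append(P_PRIOR)
    return probs

def update_logodds(grid_logodds, probs):
    """Bayesian fusion per cell: L_t = L_{t-1} + logit(p_inv) - logit(p_prior)."""
    l0 = prob_to_logodds(P_PRIOR)
    return [l + prob_to_logodds(p) - l0 for l, p in zip(grid_logodds, probs)]
```

In the example below, two identical scans hit a surface at 2.1 m, using 0.5 m cells. Fusing them in log-odds space reduces the occupancy of the four traversed cells to 9/58 (about 0.16) and raises the hit cell to 49/58 (about 0.84). The occluded cell behind the hit stays at 0.5. Repeated observations therefore sharpen the estimate, while unobserved space remains explicitly uncertain.

```python
L = [0.0] * 6
for _ in range(2):
    L = update_logodds(L, inverse_sensor_model(6, 0.5, 2.1))
probs = [logodds_to_prob(l) for l in L]
```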