LFP: Efficient and Accurate End-to-End Lane-Level Planning via Camera-LiDAR Fusion

Guoliang You, Xiaomeng Chu, Yifan Duan, Xingchen Li, Sha Zhang, Jianmin Ji, Yanyong Zhang

TL;DR

This work addresses depth and efficiency limitations in end-to-end autonomous driving by introducing LFP, a lane-level camera-LiDAR fusion planning framework. It uses image-derived lane priors to guide sparse LiDAR sampling and to drive lane-level cross-modal queries, enabling efficient feature fusion and depth enrichment. The approach comprises four modules that generate lane priors, perform lane-focused LiDAR feature extraction, integrate cross-modal queries, and plan at the lane level, all within a PHP-based double-edge data structure. Experiments on CARLA Town05 demonstrate state-of-the-art driving and infraction scores and a significant data-efficiency gain, achieving 19.27 FPS and substantial reductions in LiDAR feature counts, underscoring practical impact for real-time autonomous driving.

Abstract

Multi-modal systems enhance performance in autonomous driving but face inefficiencies due to indiscriminate processing within each modality. Additionally, each modality learns its features independently, without interaction, so the extracted features lack complementary characteristics. These issues increase the cost of fusing redundant information across modalities. To address these challenges, we propose targeting driving-relevant elements, which reduces the volume of LiDAR features while preserving critical information. This approach enhances lane-level interaction between the image and LiDAR branches, allowing for the extraction and fusion of their respective advantageous features. Building upon the camera-only framework PHP, we introduce the Lane-level camera-LiDAR Fusion Planning (LFP) method, which balances efficiency with performance by using lanes as the unit for sensor fusion. Specifically, we design three modules to enhance efficiency and performance. For efficiency, we propose an image-guided coarse lane prior generation module that forecasts the region of interest (ROI) for lanes and assigns a confidence score, guiding LiDAR processing. The LiDAR feature extraction module leverages lane-aware priors from the image branch to guide pillar sampling, retaining only the essential pillars. For performance, the lane-level cross-modal query integration and feature enhancement module uses the confidence score from the ROI to combine low-confidence image queries with LiDAR queries, extracting complementary depth features. These features enhance the low-confidence image features, compensating for the lack of depth. Experiments on the CARLA benchmarks show that our method achieves state-of-the-art performance in both driving score and infraction score, with maximum improvements of 15% and 14% over existing algorithms, respectively, while maintaining a high frame rate of 19.27 FPS.
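The lane-guided pillar sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lane_guided_pillars`, the axis-aligned box representation of a lane ROI, and the 0.5 m pillar size are all assumptions made for illustration. The sketch groups LiDAR points into bird's-eye-view pillars and keeps only those whose centre falls inside a lane ROI.

```python
from collections import defaultdict


def lane_guided_pillars(points, lane_rois, pillar_size=0.5):
    """Group LiDAR points into BEV pillars and keep only pillars inside a lane ROI.

    points:    iterable of (x, y, z) tuples in the ego frame (metres).
    lane_rois: list of axis-aligned boxes (x_min, y_min, x_max, y_max).
               This box format is a hypothetical stand-in for the
               image-derived lane ROI, not the paper's representation.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        # A pillar is identified by the ground-plane grid cell of the point.
        key = (int(x // pillar_size), int(y // pillar_size))
        pillars[key].append((x, y, z))

    def in_any_roi(key):
        # Test the pillar centre against every lane ROI box.
        cx = (key[0] + 0.5) * pillar_size
        cy = (key[1] + 0.5) * pillar_size
        return any(x0 <= cx <= x1 and y0 <= cy <= y1
                   for x0, y0, x1, y1 in lane_rois)

    # Pillars outside every lane ROI are dropped before feature extraction.
    return {k: v for k, v in pillars.items() if in_any_roi(k)}


points = [(1.2, 0.1, 0.0), (1.3, 0.2, 0.1), (5.0, 8.0, 0.3), (2.6, -0.4, 0.0)]
lane_rois = [(0.0, -1.0, 10.0, 1.0)]  # one straight lane ahead of the ego vehicle
kept = lane_guided_pillars(points, lane_rois)
print(sorted(kept))  # pillar indices that survive the lane filter
```

In this toy input, the point at (5.0, 8.0) lies far from the lane and its pillar is discarded, which is the mechanism by which the volume of LiDAR features shrinks while lane-relevant points are preserved.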

Paper Structure (17 sections, 8 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: An illustration comparing (a) the Camera-Only end-to-end planning with (b) our proposed lane-level Camera-LiDAR fusion end-to-end planning, where we utilize the geometric lane priors from images to guide the LiDAR branch in efficiently extracting depth features that the image branch lacks.
  • Figure 2: The LFP integrates image and LiDAR through four modules: (a) The image-guided coarse lane prior generation module, which extracts lane-level image features and generates coarse lane priors (Lane ROI and Lane Weight); (b) The lane-level LiDAR feature extraction module, which performs pillar-based sampling guided by lane priors to focus on lane areas and extracts lane-level LiDAR features; (c) The lane-level cross-modal query integration and feature enhancement module, which integrates queries and features from both image and LiDAR at the query and feature levels; (d) The lane-level planning module, which processes the lane-level enhanced features, outputs lane-level perception and planning results, and converts them into vehicle control signals.
  • Figure 3: Visualizing the Transformation from LiDAR to Lane-Level Pillars.
  • Figure 4: Visualization of Lane-Level Perception and Planning in LFP.
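
Module (c) in Figure 2 uses the lane confidence score to decide which image features need LiDAR support. The following is a rough sketch of that gating idea only. The paper integrates queries with a learned module, whereas this sketch swaps in a plain weighted residual sum; the function name `enhance_low_confidence`, the threshold of 0.5, and the weight `1 - confidence` are assumptions for illustration.

```python
def enhance_low_confidence(image_feats, lidar_feats, confidences, threshold=0.5):
    """Add LiDAR features to image features for low-confidence lanes only.

    image_feats, lidar_feats: lists of per-lane feature vectors (lists of floats).
    confidences:              per-lane confidence scores in [0, 1].
    The weighted residual sum below is an assumed stand-in for the
    learned cross-modal integration described in the paper.
    """
    enhanced = []
    for img, lid, conf in zip(image_feats, lidar_feats, confidences):
        if conf < threshold:
            # Low confidence: lean on LiDAR more as confidence drops.
            weight = 1.0 - conf
            enhanced.append([a + weight * b for a, b in zip(img, lid)])
        else:
            # High confidence: the image feature passes through unchanged.
            enhanced.append(list(img))
    return enhanced


image_feats = [[1.0, 2.0], [3.0, 4.0]]
lidar_feats = [[10.0, 20.0], [30.0, 40.0]]
confidences = [0.9, 0.25]
result = enhance_low_confidence(image_feats, lidar_feats, confidences)
print(result)  # [[1.0, 2.0], [25.5, 34.0]]
```

Only the second lane, whose confidence is below the threshold, receives LiDAR features. Skipping fusion for confident lanes is what keeps the cross-modal step cheap.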