QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

Yuxin Li; Yiheng Li; Xulei Yang; Mengying Yu; Zihang Huang; Xiaojun Wu; Chai Kiat Yeo

TL;DR

This work proposes QuadBEV, an efficient multitask perception framework that exploits the spatial and contextual information shared across four key tasks (3D object detection, lane detection, map segmentation, and occupancy prediction) and reduces redundant computation to improve system efficiency.

Abstract

Bird's-Eye-View (BEV) perception has become a vital component of autonomous driving systems because it integrates multiple sensor inputs into a unified representation that benefits a range of downstream tasks. However, the computational demands of BEV models make real-world deployment difficult on vehicles with limited resources. To address these limitations, we propose QuadBEV, an efficient multitask perception framework that exploits the spatial and contextual information shared across four key tasks: 3D object detection, lane detection, map segmentation, and occupancy prediction. QuadBEV not only streamlines the integration of these tasks through a shared backbone and task-specific heads but also addresses common multitask learning challenges such as learning-rate sensitivity and conflicting task objectives. By reducing redundant computation, the framework improves system efficiency, which makes it particularly suited to embedded systems. We present comprehensive experiments that validate the effectiveness and robustness of QuadBEV and demonstrate its suitability for real-world applications.
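The abstract attributes QuadBEV's efficiency to running one shared feature extractor for all four tasks. As a minimal sketch of why sharing saves computation, the code below counts extractor invocations for a shared multitask pass versus one pass per task. The `SharedExtractor` class, the `HEADS` table, and their toy arithmetic are hypothetical stand-ins, not the authors' modules.

```python
from typing import Callable, Dict, List

Feature = List[float]  # hypothetical stand-in for a BEV feature map


class SharedExtractor:
    """Hypothetical stand-in for the shared stages (backbone, depth estimator,
    view projector, temporal fusor, BEV encoder). Counts its own invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, sensor_input: Feature) -> Feature:
        self.calls += 1
        return [2.0 * x for x in sensor_input]  # placeholder computation


# Hypothetical task heads: each reduces the shared feature to a toy scalar.
HEADS: Dict[str, Callable[[Feature], float]] = {
    "detection_3d": lambda bev: max(bev),
    "lane": lambda bev: min(bev),
    "map_seg": lambda bev: sum(bev) / len(bev),
    "occupancy": lambda bev: float(sum(1 for v in bev if v > 0)),
}


def multitask_forward(extractor: SharedExtractor, x: Feature) -> Dict[str, float]:
    bev = extractor(x)  # shared stages run once per input
    return {name: head(bev) for name, head in HEADS.items()}


def separate_models_forward(extractor: SharedExtractor, x: Feature) -> Dict[str, float]:
    # Baseline: every task recomputes the feature extractor. One instance is
    # reused here only so its counter tallies all four recomputations.
    return {name: head(extractor(x)) for name, head in HEADS.items()}


shared = SharedExtractor()
multitask_forward(shared, [1.0, -2.0, 3.0])
print(shared.calls)  # 1

separate = SharedExtractor()
separate_models_forward(separate, [1.0, -2.0, 3.0])
print(separate.calls)  # 4
```

With four heads, the shared design runs the extractor stages once per input instead of four times; the real saving depends on how much of the network those stages represent, which this sketch does not model.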

Paper Structure

This paper contains 15 sections, 3 equations, 3 figures, and 6 tables.

Introduction
Related Work
Methodology
Model Architecture
Training Method
Loss Design
Experiments
Dataset
Implementation Details
Task-Specific Results
Latency Results
Ablation Study
Performance Variations against Pretraining Task
Comparison of Loss Profile against Progressive Training Strategy
Conclusion

Figures (3)

Figure 1: Overall Architecture. The components fall into two groups: shared feature extractors and task-specific heads. The shared feature extractors comprise five modules: the backbone, depth estimator, view projector, temporal fusor, and BEV encoder. The task-specific heads cover 3D object detection, map segmentation, lane detection, and occupancy prediction.
Figure 2: Architecture of Quadruple Head on Shared BEV Feature. Four independent heads are attached to the BEV feature map in a round-robin manner.
Figure 3: Comparison of Different Learning Rate and Weight Schedules
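Figure 2 describes the four heads as attached to the shared BEV feature "in a round-robin manner", and Figure 3 compares learning-rate and weight schedules, but this summary gives no further detail. One possible reading, assumed here and not confirmed by the source, is that training cycles through the tasks one step at a time. The sketch below pairs such a cycle with a linear-warmup, cosine-decay learning rate, a common schedule swapped in purely for illustration; `TASKS`, `schedule`, and the `base_lr` and `warmup` values are hypothetical.

```python
import itertools
import math
from typing import Iterator, List, Tuple

# Hypothetical task identifiers for the four heads.
TASKS: List[str] = ["detection_3d", "lane", "map_seg", "occupancy"]


def round_robin(tasks: List[str]) -> Iterator[str]:
    """Yield task names in a fixed cyclic order: t0, t1, t2, t3, t0, ..."""
    return itertools.cycle(tasks)


def warmup_cosine_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to base_lr, then cosine decay toward 0 (assumed schedule)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def schedule(num_steps: int, base_lr: float = 2e-4, warmup: int = 4) -> List[Tuple[int, str, float]]:
    """Return (step, task trained at that step, learning rate) triples."""
    order = round_robin(TASKS)
    return [
        (step, next(order), warmup_cosine_lr(step, base_lr, warmup, num_steps))
        for step in range(num_steps)
    ]


plan = schedule(8)
print([task for _, task, _ in plan[:5]])
# ['detection_3d', 'lane', 'map_seg', 'occupancy', 'detection_3d']
```

Cycling tasks gives each head an equal share of updates; a weighted or progressive scheme would replace `round_robin` without changing the learning-rate logic.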