BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment

Junjie Huang; Guan Huang

TL;DR

The paper tackles the view-transformation bottleneck in multi-camera 3D detection for autonomous driving. It introduces BEVPoolv2, which avoids computing and storing the large frustum feature by relying on precomputed indices; this cuts both latency and memory use and eases deployment to other backends. The dev2.0 branch also adds deployment-oriented changes: a TensorRT example, ego-centered receptive field alignment, BEVDepth-style depth supervision, and long-term temporal fusion. As an example configuration, BEVDet4D-R50-Depth-CBGS scores 52.3 NDS on the nuScenes validation set at 16.4 FPS with the PyTorch backend.

Abstract

We release a new version of the BEVDet codebase, dubbed branch dev2.0. With dev2.0, we propose BEVPoolv2, which upgrades the view transformation process from the perspective of engineering optimization and frees it from a huge burden in both computation and storage. It achieves this by omitting the calculation and preprocessing of the large frustum feature. As a result, view transformation can be processed within 0.82 ms even with a large input resolution of 640x1600, which is 15.1 times as fast as the previous fastest implementation. It also consumes less memory than the previous implementation, since it no longer needs to store the large frustum feature. Last but not least, this makes deployment to other backends straightforward. We offer an example of deployment to the TensorRT backend in branch dev2.0 and show how fast the BEVDet paradigm can be processed on it. Besides BEVPoolv2, we also select and integrate some substantial advances proposed in the past year. As an example configuration, BEVDet4D-R50-Depth-CBGS scores 52.3 NDS on the nuScenes validation set and runs at 16.4 FPS with the PyTorch backend. The code has been released to facilitate further study at https://github.com/HuangJunJie2017/BEVDet/tree/dev2.0.
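To make the idea of "omitting the frustum feature" concrete, the following is a minimal NumPy sketch, not the released CUDA kernel. It assumes a depth distribution of shape (N, D, H, W), an image feature of shape (N, H, W, C), and a hypothetical integer map `voxel_idx` that assigns each frustum point a flat BEV cell index (or -1 if it falls outside the grid). All function and variable names are illustrative assumptions. The baseline builds the full (N, D, H, W, C) frustum tensor before pooling; the index-based variant looks up one depth score and one feature vector per point and accumulates directly.

```python
import numpy as np


def pool_via_frustum(depth, feat, voxel_idx, num_voxels):
    """Baseline: materialize the frustum feature, then scatter-add into BEV cells."""
    # Outer product over depth bins and channels: shape (N, D, H, W, C).
    frustum = depth[..., None] * feat[:, None]
    bev = np.zeros((num_voxels, feat.shape[-1]), dtype=feat.dtype)
    valid = voxel_idx >= 0
    # Unbuffered add so that repeated cell indices accumulate correctly.
    np.add.at(bev, voxel_idx[valid], frustum[valid])
    return bev


def precompute_indices(voxel_idx):
    """Offline step: depends only on camera geometry, not on image content."""
    N, D, H, W = voxel_idx.shape
    flat = voxel_idx.ravel()
    depth_ids = np.flatnonzero(flat >= 0)           # flat index into (N, D, H, W)
    n, _, h, w = np.unravel_index(depth_ids, (N, D, H, W))
    feat_ids = (n * H + h) * W + w                  # flat index into (N, H, W)
    return flat[depth_ids], depth_ids, feat_ids


def pool_via_indices(depth, feat, voxel_ids, depth_ids, feat_ids, num_voxels):
    """Index-based pooling: no (N, D, H, W, C) tensor is ever created."""
    C = feat.shape[-1]
    depth_flat = depth.ravel()
    feat_flat = feat.reshape(-1, C)
    bev = np.zeros((num_voxels, C), dtype=feat.dtype)
    # A plain loop stands in for the per-thread accumulation of a GPU kernel.
    for v, di, fi in zip(voxel_ids, depth_ids, feat_ids):
        bev[v] += depth_flat[di] * feat_flat[fi]
    return bev
```

In this sketch, `precompute_indices` would be called once per camera setup, and only `pool_via_indices` would run per frame. The Python loop is slow and is meant only to show that the two paths produce the same BEV feature while the second never stores the frustum tensor.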

Paper Structure (10 sections, 3 figures, 1 table)

This paper contains 10 sections, 3 figures, 1 table.

Introduction
Modification
BEVPoolv2
TensorRT
Receptive Field
Other Modifications
BEVDepth
Temporal Fusion
Stereo Depth Estimation
BEVDet4D-R50-Depth-CBGS

Figures (3)

Figure 1: Illustration of BEVPool from BEVFusion and BEVPoolv2 from BEVDet-dev2.0. The part in low transparency can be pre-computed offline.
Figure 2: Inference speed of different implementations of the Lift-Splat-Shoot view transformation. With the number of depth classes set to $D=59$, the implementation in branch dev2.0, dubbed BEVPoolv2, is 3.1 times as fast as the previous fastest implementation at a low input resolution of $256\times704$ and 8.2 times as fast at a high resolution of $640\times1760$.
Figure 3: Memory requirement of different implementations of the Lift-Splat-Shoot view transformation. With the number of depth classes set to $D=59$, the implementation in branch dev2.0, dubbed BEVPoolv2, needs just 5.7% of the memory required by the previous fastest implementation at a low input resolution of $256\times704$ and 2.0% at a high resolution of $640\times1760$.
