Depth as Points: Center Point-based Depth Estimation
Zhiheng Tu, Xinjian Huang, Yong He, Ruiyang Zhou, Bo Du, Weitao Wu
TL;DR
This work tackles real-time monocular depth perception for urban autonomous driving by introducing CenterDepth, a center-point regression framework that couples object detection with localized depth prediction. Central to the approach are Center Point Regression, which detects object centers, and Center FC-CRFs, which propagate depth efficiently using global information anchored at those centers. Together they enable accurate depth estimation at distances up to $200$ meters without computing full-scene depth maps. To support training and evaluation, the authors build VirDepth, a large virtual dataset generated with CARLA and UE4 that provides synchronized RGB, depth, and semantic labels across diverse urban scenes. Empirical results show that CenterDepth achieves high depth accuracy (e.g., $\delta_1$ approaching $0.989$ on VirDepth) and favorable efficiency across backbones, outperforming state-of-the-art methods on VirDepth, Virtual KITTI 2, KITTI-Depth, and KITTI-3D, while maintaining strong generalization and utility for BEV path planning. These findings suggest a practical, scalable path to robust monocular depth perception for real-time autonomous driving applications.
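For readers unfamiliar with the $\delta_1$ figure quoted above, the following is a minimal sketch of how threshold accuracy is conventionally computed in the depth estimation literature. It assumes the usual threshold of $1.25$, which this summary does not state, and the function name `delta_accuracy` and the sample depths are hypothetical illustrations rather than the authors' evaluation code.

```python
def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid samples whose ratio max(pred/gt, gt/pred) is below threshold.

    The default threshold of 1.25 is the conventional choice for delta_1
    (an assumption here; the summary does not specify it).
    """
    # Keep only pairs where both depths are positive, so the ratio is defined.
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not ratios:
        raise ValueError("no valid depth pairs to evaluate")
    return sum(r < threshold for r in ratios) / len(ratios)


# Hypothetical per-object depths in meters (not taken from the paper).
gt_depths = [10.0, 50.0, 120.0, 200.0]
pred_depths = [11.0, 48.0, 160.0, 195.0]

# Three of the four predictions fall within a factor of 1.25 of ground truth.
print(delta_accuracy(pred_depths, gt_depths))  # 0.75
```

Under this definition, a $\delta_1$ near $0.989$ would mean that roughly 99% of evaluated depths lie within a factor of $1.25$ of the ground truth.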
Abstract
The perception of vehicles and pedestrians in urban scenarios is crucial for autonomous driving. This process typically involves complicated data collection and imposes high computational and hardware demands. To address these limitations, we first develop a highly efficient method for generating virtual datasets, which enables the creation of task- and scenario-specific datasets in a short time. Leveraging this method, we construct the virtual depth estimation dataset VirDepth, a large-scale, multi-task autonomous driving dataset. Subsequently, we propose CenterDepth, a lightweight architecture for monocular depth estimation that ensures high operational efficiency and exhibits superior performance in depth estimation tasks with highly imbalanced height-scale distributions. CenterDepth integrates global semantic information through the innovative Center FC-CRFs algorithm, aggregates multi-scale features based on object key points, and enables detection-based depth estimation of targets. Experiments demonstrate that our proposed method achieves superior performance in terms of both computational speed and prediction accuracy.
