Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, Yufei Wang, Zhenyu Zhang, Jun Li, Jian Yang
TL;DR
This work tackles depth completion under sparse measurements by introducing Tri-Perspective View Decomposition (TPVD), which explicitly models 3D geometry by projecting a point cloud into three 2D TPVs (top, side, front) and densifying depth through a recurrent 2D-3D-2D TPV Fusion with Distance-Aware Spherical Convolution (DASC). A plug-and-play Geometric Spatial Propagation Network (GSPN) then refines geometry across TPV and 3D spaces to ensure geometric consistency. The authors also introduce the TOFDC dataset, a smartphone-based depth completion dataset captured with a TOF sensor and color camera, to evaluate real-world, edge-device depth sensing. Across KITTI, NYUv2, SUN RGBD, and TOFDC, TPVD achieves state-of-the-art results and demonstrates strong generalization, including depth-only and cross-dataset performance, highlighting its practical impact for autonomous driving and human-computer interaction scenarios.
Abstract
Depth completion is a vital task for autonomous driving, as it involves reconstructing the precise 3D geometry of a scene from sparse and noisy depth measurements. However, most existing methods either rely only on 2D depth representations or directly incorporate raw 3D point clouds for compensation, both of which remain insufficient to capture the fine-grained 3D geometry of the scene. To address this challenge, we introduce Tri-Perspective View Decomposition (TPVD), a novel framework that can explicitly model 3D geometry. In particular, (1) TPVD decomposes the original point cloud into three 2D views, one of which corresponds to the sparse depth input. (2) We design TPV Fusion to update the 2D TPV features through recurrent 2D-3D-2D aggregation, where a Distance-Aware Spherical Convolution (DASC) is applied. (3) By adaptively choosing TPV affinitive neighbors, the newly proposed Geometric Spatial Propagation Network (GSPN) further improves the geometric consistency. As a result, our TPVD outperforms existing methods on KITTI, NYUv2, and SUN RGBD. Furthermore, we build a novel depth completion dataset named TOFDC, which is acquired with the time-of-flight (TOF) sensor and the color camera on smartphones. Project page: https://yanzq95.github.io/projectpage/TOFDC/index.html
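
To make step (1) more concrete, the following is a minimal sketch of one way a sparse depth map could be lifted to a point cloud and decomposed into front, top, and side views. It is not the authors' implementation. The function names (backproject, project_view, tri_perspective_views), the pinhole intrinsics (fx, fy, cx, cy), the orthographic binning, the grid bounds, and the rule of keeping the smallest value per cell are all assumptions made for illustration. The abstract states only that three 2D views are produced and that one of them corresponds to the sparse depth input.

```python
# Illustrative sketch only: all names, bounds, and binning rules are assumptions.

X, Y, Z = 0, 1, 2  # axis indices in the camera frame


def backproject(depth, fx, fy, cx, cy):
    """Lift valid pixels of a sparse depth map to 3D points (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # assumption: 0 marks a missing measurement
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def project_view(points, row_axis, col_axis, value_axis, bounds, size):
    """Orthographically bin points onto a size x size grid.

    Each cell keeps the smallest coordinate along value_axis among the
    points that fall into it; cells that receive no point stay None.
    """
    grid = [[None] * size for _ in range(size)]
    for p in points:
        cell = []
        for axis in (row_axis, col_axis):
            lo, hi = bounds[axis]
            if not lo <= p[axis] < hi:
                break  # point lies outside the grid: skip it
            cell.append(int((p[axis] - lo) / (hi - lo) * size))
        else:
            r, c = cell
            value = p[value_axis]
            if grid[r][c] is None or value < grid[r][c]:
                grid[r][c] = value
    return grid


def tri_perspective_views(points, bounds, size):
    """Decompose a point cloud into three sparse 2D views."""
    return {
        "front": project_view(points, Y, X, Z, bounds, size),  # stores depth z
        "top": project_view(points, Z, X, Y, bounds, size),    # stores height y
        "side": project_view(points, Y, Z, X, bounds, size),   # stores lateral x
    }


# Toy input: a 4 x 4 depth map with two valid measurements (in meters).
depth = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
points = backproject(depth, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
bounds = {X: (-4.0, 4.0), Y: (-4.0, 4.0), Z: (0.0, 8.0)}
views = tri_perspective_views(points, bounds, size=4)
```

In this toy setup each view is as sparse as the input, which suggests why a subsequent densification stage (TPV Fusion in the paper) is needed. A closer match to the paper would likely keep the front view as the original perspective depth map rather than an orthographic re-binning.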
