Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

Zhiqiang Yan; Yuankai Lin; Kun Wang; Yupeng Zheng; Yufei Wang; Zhenyu Zhang; Jun Li; Jian Yang

TL;DR

This work tackles depth completion from sparse measurements by introducing Tri-Perspective View Decomposition (TPVD), which explicitly models 3D geometry by projecting a point cloud into three 2D TPVs (top, side, and front) and densifying depth through a recurrent 2D-3D-2D TPV Fusion with Distance-Aware Spherical Convolution (DASC). A plug-and-play Geometric Spatial Propagation Network (GSPN) then refines the result across TPV and 3D spaces to enforce geometric consistency. The authors also introduce TOFDC, a smartphone-based depth completion dataset captured with a TOF sensor and a color camera, for evaluating real-world depth sensing on edge devices. Across KITTI, NYUv2, SUN RGBD, and TOFDC, TPVD achieves state-of-the-art results and generalizes well, including in depth-only and cross-dataset settings, which highlights its practical value for autonomous driving and human-computer interaction.
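The projection step can be pictured with a small NumPy sketch. It assumes a pinhole camera with intrinsics matrix K, back-projects the valid pixels of the sparse depth map into camera-frame 3D points, keeps the sparse depth itself as the front view, and scatters the points onto a bird's-eye (z-x) grid and a side (z-y) grid. The helper names (backproject, scatter_count, tpv_project), the grid ranges, and the choice to store a point count in each cell are illustrative assumptions, not TPVD's actual projection.

```python
import numpy as np

def backproject(depth, K):
    """Lift the valid (non-zero) pixels of a sparse depth map (H, W) to camera-frame points (N, 3)."""
    v, u = np.nonzero(depth > 0)                 # row (v) and column (u) of every measured pixel
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]              # pinhole model: x = (u - cx) * z / fx
    y = (v - K[1, 2]) * z / K[1, 1]              # pinhole model: y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def scatter_count(points, row_axis, col_axis, row_range, col_range, shape):
    """Count points per cell of a 2D grid spanned by two coordinate axes; out-of-range points are dropped."""
    rows, cols = shape
    r = np.floor((points[:, row_axis] - row_range[0]) / (row_range[1] - row_range[0]) * rows).astype(int)
    c = np.floor((points[:, col_axis] - col_range[0]) / (col_range[1] - col_range[0]) * cols).astype(int)
    keep = (r >= 0) & (r < rows) & (c >= 0) & (c < cols)
    grid = np.zeros(shape, dtype=np.int64)
    np.add.at(grid, (r[keep], c[keep]), 1)       # unbuffered add, so points sharing a cell accumulate
    return grid

def tpv_project(depth, K, x_range, y_range, z_range, grid=(64, 64)):
    """Hypothetical tri-perspective projection: returns (front, top, side) views."""
    points = backproject(depth, K)
    front = depth                                               # front view: the sparse depth input itself
    top = scatter_count(points, 2, 0, z_range, x_range, grid)   # bird's-eye view: rows = depth z, cols = lateral x
    side = scatter_count(points, 2, 1, z_range, y_range, grid)  # side view: rows = depth z, cols = vertical y
    return front, top, side
```

Because the front view is the input depth map, only the top and side views carry information that the image plane hides, namely how the measured points are arranged along the depth axis.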

Abstract

Depth completion is a vital task for autonomous driving, as it involves reconstructing the precise 3D geometry of a scene from sparse and noisy depth measurements. However, most existing methods either rely only on 2D depth representations or directly incorporate raw 3D point clouds for compensation, and both remain insufficient to capture the fine-grained 3D geometry of the scene. To address this challenge, we introduce Tri-Perspective View Decomposition (TPVD), a novel framework that can explicitly model 3D geometry. In particular, (1) TPVD ingeniously decomposes the original point cloud into three 2D views, one of which corresponds to the sparse depth input. (2) We design TPV Fusion to update the 2D TPV features through recurrent 2D-3D-2D aggregation, where a Distance-Aware Spherical Convolution (DASC) is applied. (3) By adaptively choosing TPV affinitive neighbors, the newly proposed Geometric Spatial Propagation Network (GSPN) further improves the geometric consistency. As a result, our TPVD outperforms existing methods on KITTI, NYUv2, and SUN RGBD. Furthermore, we build a novel depth completion dataset named TOFDC, which is acquired by a time-of-flight (TOF) sensor and a color camera on smartphones. Project page: https://yanzq95.github.io/projectpage/TOFDC/index.html
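The recurrent 2D-3D-2D aggregation in step (2) can be sketched as one gather-and-scatter round: each 3D point collects the features of the cells it falls into in the three views (2D to 3D), and the summed point features are averaged back into every view (3D to 2D). This is a minimal sketch, assuming precomputed per-point cell indices for each view. The function names are hypothetical, and the 3D stage is a plain sum with no DASC or learned weights, so it only illustrates the data flow.

```python
import numpy as np

def gather(view_feat, rows, cols):
    """Sample per-point features (N, C) from a (C, H, W) view at integer cell indices."""
    return view_feat[:, rows, cols].T

def scatter_mean(point_feat, rows, cols, shape):
    """Average per-point features (N, C) back into a (C, H, W) view; cells with no point stay 0."""
    H, W = shape
    N, C = point_feat.shape
    flat = rows * W + cols                       # linear cell index of every point
    acc = np.zeros((H * W, C))
    cnt = np.zeros(H * W)
    np.add.at(acc, flat, point_feat)             # sum features of points sharing a cell
    np.add.at(cnt, flat, 1.0)
    out = acc / np.maximum(cnt, 1.0)[:, None]    # mean, avoiding division by zero in empty cells
    return out.T.reshape(C, H, W)

def tpv_fusion_step(views, indices):
    """One hypothetical 2D-3D-2D round.

    views:   dict view name -> (C, H, W) feature map
    indices: dict view name -> (rows, cols) integer arrays giving each point's cell in that view
    """
    # 2D -> 3D: every point sums the features it sees in the three views
    point_feat = sum(gather(views[k], *indices[k]) for k in views)
    # 3D -> 2D: push the fused point features back into each view
    return {k: scatter_mean(point_feat, *indices[k], views[k].shape[1:]) for k in views}
```

In a learned model, one option is to combine the returned maps with the original view features (for example residually) before the next recurrence, so cells that no point reaches keep their previous values.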

Paper Structure (22 sections, 11 equations, 15 figures, 9 tables)

Introduction
Related Work
TPVD
Overview
TPV Projection
TPV Interaction
Geometry-Aware Refinement
TOFDC
Experiments
Datasets
Comparison with State-of-the-arts
Generalization Capability
Ablation Studies
Conclusion
Distance-Aware Spherical Convolution
...and 7 more sections

Figures (15)

Figure 1: Framework comparison. (a) Previous 2D methods focus on 2D space to recover dense depth, and (b) recent 2D-3D joint approaches introduce 3D point clouds for assistance. Differently, (c) our TPVD decomposes the 3D point clouds into three 2D views to densify the sparse input while preserving the 3D geometry.
Figure 2: Pipeline of TPVD. The 3D point cloud is first projected into top, side, and front views, where the raw 2D sparse depth input is included in the front view. Then the three views are fed into 2D UNets to produce TPV features that are aggregated by the 2D-3D-2D TPV Fusion, obtaining denser depth with richer geometry. Finally, on the output side, the plug-and-play geometric spatial propagation network (GSPN) generates refined depth results with consistent geometry. DASC refers to the distance-aware spherical convolution.
Figure 3: Percentage of non-empty units at different distances for the cubic transformation versus our spherical transformation.
Figure 4: Comparison of SPNs [2018Learning, park2020nonlocal] with different neighbor sets. 'aggr.' refers to aggregation, while 'prop.' indicates propagation.
Figure 5: Acquisition system (left) and data comparison (right).
...and 10 more figures
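The trend in Figure 3 can be illustrated qualitatively with a toy 2D analogue. A sensor with evenly spaced rays observes a quarter-circle surface at several ranges, and the sketch measures what fraction of the surface's units receive at least one return, for a square (cubic-like) grid versus an angular (spherical-like) grid. All parameters (64 rays, 0.5 m cells, 48 angular bins) are assumptions, and this is not the paper's experiment; it only shows why fixed-size units empty out with distance while angular units do not.

```python
import numpy as np

def arc(radius, n):
    """n points at mid-step angles on a quarter circle of the given radius (toy 2D surface)."""
    theta = (np.arange(n) + 0.5) * (np.pi / 2) / n
    return radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)

def square_cells(points, cell):
    """Set of square-grid cells (i, j) that contain at least one point."""
    return set(map(tuple, np.floor(points / cell).astype(int).tolist()))

def polar_bins(points, n_bins):
    """Set of angular bins over [0, pi/2) that contain at least one point.
    Radial binning is omitted because all points of one surface share the same range."""
    theta = np.arctan2(points[:, 1], points[:, 0])
    return set(np.floor(theta / (np.pi / 2) * n_bins).astype(int).tolist())

def occupancy_by_distance(distances=(5, 10, 20, 40), n_rays=64, cell=0.5, n_bins=48, dense=20000):
    """Per distance: (square-grid fraction, polar-grid fraction) of surface units hit by a ray."""
    out = {}
    for d in distances:
        rays, surface = arc(d, n_rays), arc(d, dense)   # sparse sensor returns vs densely sampled surface
        hit_sq = square_cells(rays, cell)
        all_sq = square_cells(surface, cell) | hit_sq   # every square cell the surface passes through
        hit_po = polar_bins(rays, n_bins)
        all_po = polar_bins(surface, n_bins) | hit_po   # every angular bin the surface spans
        out[d] = (len(hit_sq) / len(all_sq), len(hit_po) / len(all_po))
    return out

for d, (square, polar) in occupancy_by_distance().items():
    print(f"d={d:>2} m  square grid: {square:.2f}  polar grid: {polar:.2f}")
```

The spacing between neighboring rays on the surface grows linearly with range while square cells keep a fixed size, so at long range the square-grid fraction falls roughly as 1/d; an angular bin, by contrast, always covers the same set of rays.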
