MobiFuse: A High-Precision On-device Depth Perception System with Multi-Data Fusion
Jinrui Zhang, Deyu Zhang, Tingting Long, Wenxin Chen, Ju Ren, Yunxin Liu, Yudong Zhao, Yaoxue Zhang, Youngki Lee
TL;DR
MobiFuse tackles the challenges of mobile depth perception by fusing ToF and dual RGB stereo through a physics-informed Depth Error Indication (DEI) modality and a two-stage TSFuseNet with progressive fusion and backward connections. It introduces RealToF, a real-world ToF-Stereo depth dataset with precise pixel-level ground truth, enabling robust training and evaluation. The approach achieves substantial improvements over baselines in MAE/RMSE and downstream tasks (3D reconstruction and RGB-D segmentation) while remaining efficient on commodity mobile hardware. The work advances practical on-device depth sensing with strong generalization and real-world applicability for AR, 3D perception, and related tasks.
Abstract
We present MobiFuse, a high-precision depth perception system for mobile devices that combines dual RGB and Time-of-Flight (ToF) cameras. To achieve this, we leverage the physical principles underlying various environmental factors to propose the Depth Error Indication (DEI) modality, which characterizes the depth errors of ToF and stereo matching. Furthermore, we employ a progressive fusion strategy that merges geometric features from the ToF and stereo depth maps with depth error features from the DEI modality to create precise depth maps. Additionally, we create a new ToF-Stereo depth dataset, RealToF, to train and validate our model. Our experiments demonstrate that MobiFuse outperforms the baselines, reducing depth measurement errors by up to 77.7%. It also generalizes well across diverse datasets and proves effective in two downstream tasks: 3D reconstruction and 3D segmentation. A demo video of MobiFuse in real-life scenarios is available at the de-identified YouTube link (https://youtu.be/jy-Sp7T1LVs).
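To make the idea of error-guided fusion concrete, the following is a minimal sketch and not the TSFuseNet model described above. It assumes that a per-pixel error estimate is already available for each sensor (standing in for what the DEI modality provides) and replaces the learned progressive fusion with a simple inverse-error weighted average. The function names (`fuse_depth`, `mae`, `rmse`, `error_reduction_pct`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def fuse_depth(tof, stereo, tof_err, stereo_err, eps=1e-6):
    """Fuse two depth maps (flat lists, in metres) pixel by pixel.

    Each pixel is a weighted average of the ToF and stereo depths, with
    weights inversely proportional to the estimated error of each source.
    This is a hand-written stand-in for a learned fusion network.
    """
    fused = []
    for d_t, d_s, e_t, e_s in zip(tof, stereo, tof_err, stereo_err):
        w_t = 1.0 / (e_t + eps)  # low estimated error -> high weight
        w_s = 1.0 / (e_s + eps)
        fused.append((w_t * d_t + w_s * d_s) / (w_t + w_s))
    return fused


def mae(pred, gt):
    """Mean absolute error between predicted and ground-truth depth."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depth."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def error_reduction_pct(baseline_err, new_err):
    """Relative error reduction over a baseline, in percent."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Toy three-pixel example (hypothetical values, not taken from RealToF).
gt = [1.0, 2.0, 3.0]
tof = [1.0, 2.6, 3.0]          # ToF is wrong at pixel 2
stereo = [1.3, 2.0, 3.2]       # stereo is wrong at pixels 1 and 3
tof_err = [0.01, 0.5, 0.02]    # assumed per-pixel error estimates
stereo_err = [0.3, 0.01, 0.2]

fused = fuse_depth(tof, stereo, tof_err, stereo_err)
print("MAE  ToF / stereo / fused:", mae(tof, gt), mae(stereo, gt), mae(fused, gt))
print("RMSE fused:", rmse(fused, gt))
print("MAE reduction vs ToF (%):", error_reduction_pct(mae(tof, gt), mae(fused, gt)))
```

In this toy setting the fused map follows whichever sensor has the lower estimated error at each pixel, so its MAE is below that of either input. A percentage such as the "up to 77.7%" figure above is a relative reduction of the kind `error_reduction_pct` computes, here applied to made-up numbers.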
