MobiFuse: A High-Precision On-device Depth Perception System with Multi-Data Fusion

Jinrui Zhang, Deyu Zhang, Tingting Long, Wenxin Chen, Ju Ren, Yunxin Liu, Yudong Zhao, Yaoxue Zhang, Youngki Lee

TL;DR

MobiFuse tackles the challenges of mobile depth perception by fusing ToF and dual RGB stereo through a physics-informed Depth Error Indication (DEI) modality and a two-stage TSFuseNet with progressive fusion and backward connections. It introduces RealToF, a real-world ToF-Stereo depth dataset with precise pixel-level ground truth, enabling robust training and evaluation. The approach achieves substantial improvements over baselines in MAE/RMSE and downstream tasks (3D reconstruction and RGB-D segmentation) while remaining efficient on commodity mobile hardware. The work advances practical on-device depth sensing with strong generalization and real-world applicability for AR, 3D perception, and related tasks.

Abstract

We present MobiFuse, a high-precision depth perception system for mobile devices that combines dual RGB and Time-of-Flight (ToF) cameras. To achieve this, we draw on the physical principles behind various environmental factors to propose the Depth Error Indication (DEI) modality, which characterizes the depth error of ToF and stereo matching. We then employ a progressive fusion strategy that merges geometric features from the ToF and stereo depth maps with depth error features from the DEI modality to produce precise depth maps. We also create a new ToF-Stereo depth dataset, RealToF, to train and validate our model. Our experiments show that MobiFuse outperforms baselines, reducing depth measurement errors by up to 77.7%. It also generalizes well across diverse datasets and proves effective in two downstream tasks: 3D reconstruction and 3D segmentation. A demo video of MobiFuse in real-life scenarios is available at the de-identified YouTube link (https://youtu.be/jy-Sp7T1LVs).
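
For readers unfamiliar with the error metrics named in the TL;DR (MAE and RMSE), the sketch below shows how they are commonly computed for a depth map, and how a relative error reduction such as the "up to 77.7%" figure can be derived from two error values. It is an illustration only, not the authors' evaluation code: the function names, the convention that a ground-truth value of 0 marks an invalid pixel, and the toy depth values are all assumptions made here.

```python
import math


def depth_errors(pred, gt):
    """Return (MAE, RMSE) over pixels that have valid ground truth.

    pred, gt: 2D lists of depth values in metres.
    Assumption of this sketch: gt <= 0 marks a pixel without ground truth.
    """
    abs_sum = 0.0
    sq_sum = 0.0
    n = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if g <= 0:
                continue  # skip pixels with no ground truth
            diff = p - g
            abs_sum += abs(diff)
            sq_sum += diff * diff
            n += 1
    if n == 0:
        raise ValueError("no valid ground-truth pixels")
    return abs_sum / n, math.sqrt(sq_sum / n)


def error_reduction(baseline_err, fused_err):
    """Relative reduction of fused_err with respect to baseline_err, in percent."""
    return 100.0 * (baseline_err - fused_err) / baseline_err


# Toy 2x2 depth maps (hypothetical values, not taken from the paper).
gt = [[1.0, 2.0], [0.0, 4.0]]        # 0.0 = no ground truth at that pixel
baseline = [[1.4, 2.4], [3.0, 3.6]]  # e.g. a single-sensor depth estimate
fused = [[1.1, 2.1], [3.0, 3.9]]     # e.g. a fused depth estimate

baseline_mae, baseline_rmse = depth_errors(baseline, gt)
fused_mae, fused_rmse = depth_errors(fused, gt)
print(f"MAE reduction: {error_reduction(baseline_mae, fused_mae):.1f}%")
```

With these toy values every valid baseline pixel is off by 0.4 m and every valid fused pixel by 0.1 m, so the MAE reduction comes out at about 75%. The paper's reported numbers come from its own datasets and baselines, which this sketch does not reproduce.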

Paper Structure

This paper contains 33 sections, 5 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Simplified illustration of ToF measurement.
  • Figure 2: (a) Per-pixel depth error distribution of mobile ToF and stereo matching. (b) Correlation between existing mobile ToF confidence and depth error. Darker colors indicate higher pixel counts.
  • Figure 3: The system architecture of MobiFuse.
  • Figure 4: Influence of different factors on mobile ToF depth error in real-life scenarios.
  • Figure 5: Examples in RealToF dataset.
  • ...and 5 more figures