Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data
Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li
TL;DR
The paper tackles real-time omnidirectional depth estimation for edge devices while ensuring cross-scene generalization. It introduces Rt-OmniMVS, a lightweight framework built around Combined Spherical Sweeping and 2D CNN-based cost aggregation, coupled with a teacher-student training regime that leverages unlabeled real-world data through pseudo-labels from a state-of-the-art stereo model. To support real-world validation, the authors present HexaMODE, a six-fisheye camera system on an edge computer, and Hexa360Depth, a large hybrid dataset with synthetic and real data. Experiments show Rt-OmniMVS achieves accuracy competitive with state-of-the-art approaches at significantly lower resource cost, running at 15 fps on edge hardware, along with strong generalization across indoor and outdoor scenarios. This work advances practical real-time 360° depth perception for autonomous driving and robotics by combining algorithmic efficiency with unlabeled-data training and real-world data collection.
Abstract
Omnidirectional depth estimation enables efficient 3D perception over a full 360-degree range. However, in real-world applications such as autonomous driving and robotics, achieving real-time performance and robust cross-scene generalization remains a significant challenge for existing algorithms. In this paper, we propose Rt-OmniMVS, a real-time omnidirectional depth estimation method for edge computing platforms, which introduces the Combined Spherical Sweeping method and uses a lightweight network structure to run in real time on such platforms. To achieve high accuracy, robustness, and generalization in real-world environments, we introduce a teacher-student learning strategy. We use a high-precision stereo matching method as the teacher model to predict pseudo labels for unlabeled real-world data, and apply data and model augmentation techniques during training to enhance the performance of the student model, Rt-OmniMVS. We also propose HexaMODE, an omnidirectional depth sensing system based on multi-view fisheye cameras and an edge computing device. A large-scale hybrid dataset containing both unlabeled real-world data and synthetic data is collected for model training. Experiments on public datasets demonstrate that the proposed method achieves results comparable to state-of-the-art approaches while consuming significantly fewer resources. The proposed system and algorithm also demonstrate high accuracy in various complex real-world scenarios, both indoors and outdoors, achieving an inference speed of 15 frames per second on edge computing platforms.
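The teacher-student idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the teacher outputs a disparity map for a rectified stereo pair (so depth follows from the standard relation depth = focal length × baseline / disparity), and that pixels without a usable pseudo label are simply excluded from the loss. The function names `disparity_to_depth` and `pseudo_label_l1`, the `min_disp` threshold, and the toy numbers are all hypothetical.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-3):
    """Convert a rectified-stereo disparity map (pixels) to depth (meters).

    Pixels whose disparity is not above min_disp are marked invalid (NaN).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


def pseudo_label_l1(student_depth, teacher_depth):
    """Mean absolute error over pixels where the teacher pseudo label is valid."""
    student_depth = np.asarray(student_depth, dtype=np.float64)
    teacher_depth = np.asarray(teacher_depth, dtype=np.float64)
    mask = np.isfinite(teacher_depth)
    if not mask.any():
        return 0.0  # no supervision available for this sample
    return float(np.abs(student_depth[mask] - teacher_depth[mask]).mean())


# Toy 2x2 example: one pixel has zero disparity and is therefore unlabeled.
teacher_disp = np.array([[10.0, 0.0], [5.0, 20.0]])
pseudo_depth = disparity_to_depth(teacher_disp, focal_px=500.0, baseline_m=0.1)
student_pred = np.array([[5.5, 3.0], [9.0, 2.5]])
loss = pseudo_label_l1(student_pred, pseudo_depth)
print(loss)  # 0.5
```

In a real training loop this loss would be computed on the student's omnidirectional output and backpropagated through the student only; the teacher stays fixed and serves purely as a label source.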
