FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera

Guoyang Zhao, Yuxuan Liu, Weiqing Qi, Fulong Ma, Ming Liu, Jun Ma

TL;DR

FisheyeDepth targets true-scale, self-supervised depth estimation for fisheye cameras by (1) incorporating a fisheye projection based on the Mei unified camera model during training to handle distortion, (2) replacing PoseNet with real-scale poses from sensor fusion to remove the depth scale ambiguity, and (3) using a multi-channel output with adaptive feature fusion for robustness. The approach extends Monodepth2 with fisheye-aware projection, real-world pose information, and multi-scale predictions. It achieves state-of-the-art results on KITTI-360 among monocular self-supervised methods and performs strongly in real-world scenarios. Removing the pose network reduces training and inference complexity, and the overall design improves depth accuracy and reliability in distortion-heavy outdoor environments, making the method practical for robotics and autonomous navigation. Overall, FisheyeDepth provides a scalable framework for depth estimation with omnidirectional cameras and may extend to other omnidirectional sensing systems.
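The Mei (unified) camera model named in the TL;DR maps a 3D point to a pixel in three steps: project the point onto the unit sphere, view that sphere from a centre shifted by ξ along the optical axis, then apply pinhole intrinsics. The following is a minimal NumPy sketch of that forward projection and its inverse (lifting a pixel back to 3D given its range), the two operations that projection and reprojection during training rely on. It is not the FisheyeDepth code. Assumptions: distortion terms are omitted, ξ < 1, depth is expressed as Euclidean range along the ray, and the function names and parameter values are hypothetical.

```python
import numpy as np


def mei_project(points, xi, fx, fy, cx, cy):
    """Project N x 3 camera-frame points to N x 2 pixels with the Mei unified
    camera model. Distortion terms are omitted in this sketch."""
    points = np.asarray(points, dtype=float)
    xs = points / np.linalg.norm(points, axis=1, keepdims=True)  # 1) onto the unit sphere
    denom = xs[:, 2] + xi            # 2) view the sphere from a centre shifted by xi
    # Points with denom <= 0 fall outside the model's valid field of view.
    mx = xs[:, 0] / denom
    my = xs[:, 1] / denom
    u = fx * mx + cx                 # 3) pinhole intrinsics on the normalised plane
    v = fy * my + cy
    return np.stack([u, v], axis=1)


def mei_unproject(pixels, ranges, xi, fx, fy, cx, cy):
    """Lift N x 2 pixels to N x 3 points, given each pixel's range along its ray."""
    pixels = np.asarray(pixels, dtype=float)
    mx = (pixels[:, 0] - cx) / fx
    my = (pixels[:, 1] - cy) / fy
    r2 = mx ** 2 + my ** 2
    # Positive root of |(lam*mx, lam*my, lam - xi)| = 1; for xi < 1 the sqrt argument stays positive.
    lam = (xi + np.sqrt(1.0 + (1.0 - xi ** 2) * r2)) / (r2 + 1.0)
    rays = np.stack([lam * mx, lam * my, lam - xi], axis=1)  # unit-length rays
    return rays * np.asarray(ranges, dtype=float)[:, None]


if __name__ == "__main__":
    params = dict(xi=0.9, fx=300.0, fy=300.0, cx=640.0, cy=480.0)  # illustrative values only
    pts = np.array([[1.0, 0.5, 3.0], [-2.0, 1.0, 0.5], [0.3, -0.4, 1.0]])
    px = mei_project(pts, **params)
    back = mei_unproject(px, np.linalg.norm(pts, axis=1), **params)
    print(np.allclose(back, pts))  # True: projecting then lifting recovers the points
```

Using range rather than z-depth keeps points more than 90° off-axis well defined, which matters for wide fisheye lenses; whether FisheyeDepth predicts range or z-depth is an assumption of this sketch.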

Abstract

Accurate depth estimation is crucial for 3D scene understanding in robotics and autonomous vehicles. Fisheye cameras offer a wide field of view, which gives them inherent geometric advantages. However, their use in depth estimation is limited by scarce ground-truth data and strong image distortion. We present FisheyeDepth, a self-supervised depth estimation model tailored to fisheye cameras. We incorporate a fisheye camera model into the projection and reprojection stages of training to handle image distortion, which improves depth estimation accuracy and training stability. Furthermore, we feed real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by a conventional pose network. This yields depth at physical (metric) scale, which robotic tasks require, and also streamlines training and inference. Additionally, we design a multi-channel output strategy that adaptively fuses features at multiple scales, improving robustness to noise in the real pose data. Evaluations on public datasets and in real-world scenarios demonstrate the superior performance and robustness of our model for fisheye depth estimation. The project website is available at: https://github.com/guoyangzhao/FisheyeDepth.
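Why real-scale pose removes the scale ambiguity can be checked with a small experiment. Any central camera model, the Mei model included, maps a 3D point to a pixel through its ray direction alone, so the warp between consecutive frames can be compared through unit bearing vectors. If depth and translation are multiplied by the same factor s, the warped bearings do not change, so a network-estimated pose leaves scale undetermined; with a fixed metric pose, only correctly scaled depth reproduces the warp. This is a sketch, not the paper's pipeline. It assumes poses arrive as 4×4 camera-to-world matrices (for example from odometry or sensor fusion, with camera extrinsics already applied) and that depth is range along the ray; all function names are hypothetical.

```python
import numpy as np


def make_pose(yaw, t):
    """Hypothetical helper: 4x4 camera-to-world pose, rotation about the camera y-axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T[:3, 3] = t
    return T


def relative_pose(T_world_tgt, T_world_src):
    """Transform taking target-camera coordinates into source-camera coordinates."""
    return np.linalg.inv(T_world_src) @ T_world_tgt


def warp_bearings(rays_tgt, ranges, T_src_tgt):
    """Lift target rays to 3D with the given ranges, move them into the source
    camera with a known pose, and return unit bearing vectors there.
    A fisheye model (e.g. Mei) would turn these bearings into source pixels."""
    pts = rays_tgt * ranges[:, None]                     # N x 3, target frame
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])     # N x 4, homogeneous
    pts_src = (T_src_tgt @ pts_h.T).T[:, :3]             # N x 3, source frame
    return pts_src / np.linalg.norm(pts_src, axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rays = rng.normal(size=(6, 3))
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)  # unit rays, some may point behind the camera
    ranges = rng.uniform(2.0, 10.0, size=6)               # "true" metric ranges
    T = relative_pose(make_pose(0.0, [0.0, 0.0, 0.0]), make_pose(0.05, [0.1, 0.0, 1.0]))

    s = 3.0
    T_scaled = T.copy()
    T_scaled[:3, 3] *= s                                  # PoseNet-style translation of unknown scale

    ref = warp_bearings(rays, ranges, T)
    # Scaling depth and translation together leaves the warp unchanged: scale ambiguity.
    print(np.allclose(ref, warp_bearings(rays, s * ranges, T_scaled)))  # True
    # With the metric pose held fixed, wrongly scaled depth changes the warp.
    print(np.allclose(ref, warp_bearings(rays, s * ranges, T)))         # False
```

In a photometric training loop, the second case is what produces a reprojection error, so a metric pose pushes the depth network toward metric output.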

Paper Structure

This paper contains 20 sections, 11 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Structure of the FisheyeDepth model. (1) We introduce a fisheye camera model during training to reduce projection distortion. (2) Real-scale poses from the robot are incorporated into the training process. (3) A multi-channel output is proposed to ensure stable training through feature fusion.
  • Figure 2: Depth estimation visualization in the KITTI-360 dataset. The image background contains various elements such as roads, vegetation, buildings, and cars in urban road scenes.
  • Figure 3: Setup of the real scene experiment. We use four calibrated fisheye cameras on the UGV platform for data collection.
  • Figure 4: Depth estimation visualization in the real scene. The image background contains complex roads, narrow passages, and different obstacles.
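Figure 1 describes a multi-channel output that stabilises training through fusion across scales. The summary does not specify the fusion rule, so the following is only one plausible pattern, not the paper's design: per-scale maps are upsampled to the finest resolution and combined with per-pixel softmax weights computed from learned logits. For brevity it fuses depth predictions rather than intermediate features and uses nearest-neighbour upsampling; it assumes the first map is the finest and that every coarser resolution divides it evenly. The function names are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a 2-D array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def adaptive_fuse(depth_maps, logit_maps):
    """Fuse multi-scale depth maps with per-pixel softmax weights.

    depth_maps, logit_maps: lists of 2-D arrays, finest first; coarser maps are
    upsampled to the finest resolution before weighting.
    Returns the fused H x W map and the S x H x W weights.
    """
    H, W = depth_maps[0].shape
    ups_d, ups_l = [], []
    for d, l in zip(depth_maps, logit_maps):
        f = H // d.shape[0]                     # integer upsampling factor for this scale
        ups_d.append(upsample_nearest(d, f))
        ups_l.append(upsample_nearest(l, f))
    D = np.stack(ups_d)                         # S x H x W candidate depths
    L = np.stack(ups_l)                         # S x H x W logits
    L = L - L.max(axis=0, keepdims=True)        # subtract max for a numerically stable softmax
    w = np.exp(L)
    w /= w.sum(axis=0, keepdims=True)           # weights sum to 1 at every pixel
    return (w * D).sum(axis=0), w


if __name__ == "__main__":
    depths = [np.full((4, 4), 2.0), np.full((2, 2), 4.0), np.full((1, 1), 6.0)]
    logits = [np.zeros((4, 4)), np.zeros((2, 2)), np.zeros((1, 1))]
    fused, weights = adaptive_fuse(depths, logits)
    print(fused.shape)  # (4, 4); equal logits give the plain average of the scales
```

Because the weights vary per pixel, a network using this kind of fusion could in principle lean on coarse scales where fine predictions are noisy and on fine scales near object boundaries.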