RomniStereo: Recurrent Omnidirectional Stereo Matching

Hualie Jiang, Rui Xu, Minglang Tan, Wenjie Jiang

TL;DR

RomniStereo addresses the challenge of effective $360^{\circ}$ depth sensing with a four-fisheye rig by introducing a RAFT-inspired recurrent update framework that bypasses costly 3D encoders. The method links spherical sweeping outputs to a 2D GRU through opposite adaptive weighting, grid embedding, and adaptive context generation, enabling end-to-end training and strong depth accuracy. Empirical results show a substantial average MAE improvement of $40.7\%$ over prior SOTA across five datasets, along with faster inference as the model capacity scales. This work advances practical omnidirectional depth sensing by combining geometry-aware feature fusion with efficient recurrent matching, suitable for robust robot navigation and related applications.

Abstract

Omnidirectional stereo matching (OSM) is an essential and reliable means for $360^{\circ}$ depth sensing. However, following earlier works on conventional stereo matching, prior state-of-the-art (SOTA) methods rely on a 3D encoder-decoder block to regularize the cost volume, which complicates the whole system and leads to sub-optimal results. Recently, approaches based on Recurrent All-pairs Field Transforms (RAFT) have employed recurrent updates in 2D and efficiently improved image-matching tasks, i.e., optical flow and stereo matching. To bridge the gap between OSM and RAFT, we primarily propose an opposite adaptive weighting scheme that seamlessly transforms the outputs of OSM's spherical sweeping into the inputs required by the recurrent update, thus creating a recurrent omnidirectional stereo matching (RomniStereo) algorithm. Furthermore, we introduce two techniques, i.e., grid embedding and adaptive context feature generation, which also contribute to RomniStereo's performance. Our best model improves the average MAE metric by $40.7\%$ over the previous SOTA baseline across five datasets. In visualizations, our models demonstrate clear advantages on both synthetic and realistic examples. The code is available at https://github.com/HalleyJiang/RomniStereo.
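
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a relative MAE improvement of this kind could be computed. It is not taken from the RomniStereo code. The function names are hypothetical, and the choice to average MAE over datasets before taking the ratio is an assumption, since the abstract does not state the exact averaging protocol.

```python
from statistics import mean


def mae(pred, target):
    """Mean absolute error between two equal-length sequences of values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return mean(abs(p - t) for p, t in zip(pred, target))


def relative_improvement(baseline_mae, model_mae):
    """Fractional reduction in MAE relative to the baseline (positive is better)."""
    return (baseline_mae - model_mae) / baseline_mae


def average_improvement(baseline_maes, model_maes):
    """Relative improvement of the dataset-averaged MAE.

    Assumption: per-dataset MAEs are averaged first, then compared.
    Averaging the per-dataset improvements instead would generally
    give a different number.
    """
    return relative_improvement(mean(baseline_maes), mean(model_maes))
```

Under this convention, a model whose dataset-averaged MAE is half that of the baseline would report an improvement of 0.5, i.e. 50%.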

Paper Structure (17 sections, 6 equations, 4 figures, 4 tables)

This paper contains 17 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Illustration of the quadruple fisheye camera system and the functionality of our proposed RomniStereo. RomniStereo uses the four fisheye images from the cameras to predict a panoramic depth map at the virtual reference view, from which an omnidirectional reconstruction can be obtained.
  • Figure 2: Our Proposed Recurrent Omnidirectional Stereo Matching Framework.
  • Figure 3: The weighting masks for $\mathcal{S}_f$ of different methods.
  • Figure 4: Qualitative Comparison. Three examples from OmniThings, OmniHouse, Sunny, and real indoor data provided by OmniMVS are shown from top to bottom. Both the smallest and largest versions of OmniMVS-ft and RomniStereo-ft are compared. The leftmost column shows the input images. For synthetic samples, the results for each model include the estimated depth map and the error map. For the real samples, the results contain the predicted depth map and the resulting panorama. The images are best viewed in color and zoomed in.