DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
Antyanta Bangunharcana, Ahmed Magd, Kyung-Soo Kim
TL;DR
Self-supervised monocular depth learning often suffers from epipolar errors caused by inaccurate pose estimates, as well as from dynamic scenes. DualRefine addresses this by jointly refining depth $D$ and pose $T$ in a deep equilibrium (DEQ) framework, using iterative, epipolar-guided sampling of local matching costs and direct feature alignment to drive both quantities toward a fixed point. Each depth update informs the subsequent pose refinement, and the evolving pose in turn reshapes the epipolar geometry, improving the matching costs and geometric consistency. On KITTI, DualRefine achieves competitive depth accuracy and markedly better odometry than prior self-supervised baselines, while remaining memory-efficient by relying on local matching costs and fixed-point optimization rather than full 3D cost volumes.
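To make the fixed-point coupling concrete, here is a toy sketch in Python. It is not DualRefine's update operator: the scalar "depth" and "pose", the linear targets, and the names `fixed_point_solve` and `toy_dual_update` are hypothetical stand-ins. The sketch only mirrors the structure described above: each step refines depth under the current pose, then refines pose using the freshly updated depth, and the step is repeated until the state stops changing. Plain forward iteration is the simplest fixed-point solver; DEQ implementations commonly use faster root-finding methods such as Anderson acceleration or Broyden's method.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (depth, pose) as scalars in this toy example


def fixed_point_solve(
    update: Callable[[State], State],
    z0: State,
    tol: float = 1e-10,
    max_iter: int = 500,
) -> Tuple[State, int]:
    """Naive forward iteration z <- update(z) until the largest change drops below tol."""
    z = z0
    for k in range(1, max_iter + 1):
        z_next = update(z)
        step = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if step < tol:
            return z, k
    return z, max_iter


def toy_dual_update(z: State) -> State:
    """Hypothetical coupled update: refine depth given pose, then pose given the new depth."""
    depth, pose = z
    # Stand-in for "matching costs sampled under the current pose".
    depth_target = 2.0 + 0.1 * pose
    depth = depth + 0.5 * (depth_target - depth)
    # Stand-in for "feature alignment using the refined depth".
    pose_target = 1.0 - 0.2 * depth
    pose = pose + 0.5 * (pose_target - pose)
    return depth, pose


(d_star, p_star), iters = fixed_point_solve(toy_dual_update, (1.0, 0.0))
print(f"depth={d_star:.6f} pose={p_star:.6f} after {iters} iterations")
```

Because the update is a contraction, the iteration settles at the unique state where neither target moves any further (depth = 35/17, pose = 10/17 in this toy), which is the sense in which depth and pose are driven "toward equilibrium".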
Abstract
Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames, injecting geometric information into the network. These pixel-correspondence candidates are computed based on the relative pose estimates between the frames. Accurate pose predictions are essential for precise matching cost computation as they influence the epipolar geometry. Furthermore, improved depth estimates can, in turn, be used to align pose estimates. Inspired by traditional structure-from-motion (SfM) principles, we propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop. Our novel update pipeline uses a deep equilibrium model framework to iteratively refine depth estimates and a hidden state of feature maps by computing local matching costs based on epipolar geometry. Importantly, we use the refined depth estimates and feature maps to compute pose updates at each step. These pose updates gradually alter the epipolar geometry during the refinement process. Experimental results on the KITTI dataset demonstrate competitive depth prediction performance and odometry prediction performance that surpasses published self-supervised baselines.
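How the correspondence candidates depend on the relative pose can be illustrated with a small pinhole-camera sketch, assuming zero-skew intrinsics `K` and a relative pose `(R, t)` that maps target-camera coordinates into the source camera. Each target pixel is back-projected at a few depth hypotheses around its current estimate and reprojected into the source frame; the resulting candidates lie on a single epipolar line, and a local matching cost would then be computed at those locations. The function names, the relative depth offsets, and the camera values are illustrative assumptions, not the paper's sampling scheme.

```python
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Vec3 = Sequence[float]


def matvec(M: Mat3, v: Vec3) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def invert_intrinsics(K: Mat3) -> List[List[float]]:
    """Inverse of a pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def epipolar_candidates(
    u: float, v: float, depth: float, K: Mat3, R: Mat3, t: Vec3,
    offsets: Sequence[float] = (-0.2, -0.1, 0.0, 0.1, 0.2),  # hypothetical relative depth offsets
) -> List[Tuple[float, float]]:
    """Reproject target pixel (u, v) into the source frame for depth hypotheses
    around the current estimate. All candidates share the epipolar line induced
    by (R, t), so changing the pose moves the whole candidate set."""
    ray = matvec(invert_intrinsics(K), [u, v, 1.0])  # back-projected ray at unit depth
    out = []
    for delta in offsets:
        d = depth * (1.0 + delta)
        X = [d * r for r in ray]                        # 3D point in the target camera frame
        Xs = [a + b for a, b in zip(matvec(R, X), t)]   # same point in the source camera frame
        p = matvec(K, Xs)                               # homogeneous source-frame pixel
        out.append((p[0] / p[2], p[1] / p[2]))
    return out


# Illustrative camera: 700 px focal length, principal point (320, 240).
K = [[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

cands = epipolar_candidates(320.0, 240.0, 10.0, K, R, [-0.5, 0.0, 0.0])
cands_updated = epipolar_candidates(320.0, 240.0, 10.0, K, R, [-0.5, -0.1, 0.0])
for (u0, v0), (u1, v1) in zip(cands, cands_updated):
    print(f"({u0:.2f}, {v0:.2f}) -> ({u1:.2f}, {v1:.2f})")
```

With a purely lateral translation, every candidate stays on the row v = 240; adding a small vertical component to `t` shifts the candidates off that row onto a different line. This is the sense in which each pose update changes where the local matching costs are sampled.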
