Multi-Order Matching Network for Alignment-Free Depth Super-Resolution
Zhengxue Wang, Zhiqiang Yan, Yuan Wu, Guangwei Gao, Xiang Li, Jian Yang
TL;DR
Real-world RGB-D capture often suffers from misalignment between RGB guidance and depth, limiting traditional RGB-guided depth super-resolution (DSR). The paper introduces MOMNet, an alignment-free framework that jointly performs zero-order, first-order, and second-order matching ($\text{ZOM}$, $\text{FOM}$, $\text{SOM}$) to retrieve depth-relevant RGB information across a multi-order feature space, followed by a multi-order aggregation using structure detectors and a multi-order regularization to constrain learning in a multi-order space with $\text{Total loss}=\mathcal{L}_{rec}+\mathcal{L}_{grad}+\alpha\,\mathcal{L}_{hes}$. Evaluated on Hypersim, DIML, and DyDToF, MOMNet delivers state-of-the-art robustness and accuracy across scales ($\times 4$, $\times 8$, $\times 16$) and includes a lightweight variant MOMNet-T that greatly reduces parameters while maintaining competitive performance. The approach advances alignment-free cross-modal fusion for depth enhancement, showing improved resilience to misalignment and noise and enabling practical deployment on consumer-grade sensors.
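To make the shape of the total loss concrete, the following is a minimal NumPy sketch of a loss with zero-order, first-order, and second-order terms. It is an illustration under stated assumptions, not the authors' implementation: the summary does not define the individual terms, so reading $\mathcal{L}_{grad}$ and $\mathcal{L}_{hes}$ as gradient and Hessian penalties, the use of finite differences, the L1 distance, the default `alpha=0.5`, and all function names are hypothetical choices.

```python
import numpy as np


def first_order(x):
    """Forward differences along height and width (assumed discretization)."""
    gy = x[1:, :] - x[:-1, :]
    gx = x[:, 1:] - x[:, :-1]
    return gy, gx


def second_order(x):
    """Second differences: d2/dy2, d2/dx2, and the mixed term d2/dxdy."""
    gyy = x[2:, :] - 2.0 * x[1:-1, :] + x[:-2, :]
    gxx = x[:, 2:] - 2.0 * x[:, 1:-1] + x[:, :-2]
    gxy = x[1:, 1:] - x[1:, :-1] - x[:-1, 1:] + x[:-1, :-1]
    return gyy, gxx, gxy


def l1(a, b):
    """Mean absolute difference (assumed distance; the source does not name one)."""
    return float(np.mean(np.abs(a - b)))


def multi_order_loss(pred, target, alpha=0.5):
    """L_rec + L_grad + alpha * L_hes for 2D depth maps.

    alpha=0.5 is a placeholder, not a value taken from the paper.
    """
    l_rec = l1(pred, target)  # zero-order: depth values
    l_grad = sum(l1(p, t) for p, t in zip(first_order(pred), first_order(target)))
    l_hes = sum(l1(p, t) for p, t in zip(second_order(pred), second_order(target)))
    return l_rec + l_grad + alpha * l_hes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))
    pred = target + 0.1 * rng.standard_normal((8, 8))
    print(multi_order_loss(pred, target))
```

In this sketch a constant depth offset is penalized only by the zero-order term, a linear ramp error additionally triggers the first-order term, and only curvature errors reach the second-order term, which is the intuition behind constraining learning at several orders at once.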
Abstract
Recent guided depth super-resolution methods assume strict spatial alignment between depth and RGB in order to achieve high-quality depth reconstruction. However, in real-world scenarios, the acquisition of strictly aligned RGB-D is hindered by inherent hardware limitations (e.g., physically separate RGB-D sensors) and unavoidable calibration drift induced by mechanical vibrations or temperature variations. Consequently, existing approaches often suffer performance degradation when applied to misaligned real-world scenes. In this paper, we propose the Multi-Order Matching Network (MOMNet), a novel alignment-free framework that adaptively retrieves and selects the most relevant information from misaligned RGB. Specifically, our method begins with a multi-order matching mechanism, which jointly performs zero-order, first-order, and second-order matching to comprehensively identify RGB information consistent with depth across multi-order feature spaces. To effectively integrate the retrieved RGB and depth, we further introduce a multi-order aggregation composed of multiple structure detectors. This strategy uses multi-order priors as prompts to facilitate selective feature transfer from RGB to depth. Extensive experiments demonstrate that MOMNet achieves state-of-the-art performance and exhibits outstanding robustness.
