IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution
Athanasios Tragakis, Chaitanya Kaul, Kevin J. Mitchell, Hang Dai, Roderick Murray-Smith, Daniele Faccio
TL;DR
This work tackles guided depth super-resolution by fusing low-resolution depth maps with high-resolution RGB guidance. It introduces the Incremental Guided Attention Fusion (IGAF) module and the Filtered Wide-Focus (FWF) feature extractor, which perform cross-modal, attention-driven fusion across multiple stages while minimizing RGB-induced artifacts. The approach achieves state-of-the-art results on NYU v2 for $\times 4$, $\times 8$, and $\times 16$ upsampling and generalizes strongly in a zero-shot setting to the Middlebury, Lu, and RGB-D-D datasets. Code is publicly available. Overall, IGAF offers a robust, generalizable way to produce high-quality depth maps for robotics, AR/VR, and medical imaging.
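The TL;DR describes cross-modal, attention-driven fusion applied over multiple stages. As a rough illustration of that general idea only, the NumPy sketch below uses plain scaled dot-product cross-attention in which depth features act as queries over RGB features, applied once per stage with a residual connection so each stage refines the previous output. This is not the IGAF or FWF design from the paper: the function names, token shapes, random weights, and residual update are all illustrative assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(depth_feat, rgb_feat, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (hypothetical, NOT IGAF).

    depth_feat: (N, C) tokens from the depth branch, used as queries.
    rgb_feat:   (N, C) tokens from the RGB branch, used as keys and values.
    Returns depth features refined by a residual attention update.
    """
    q = depth_feat @ w_q
    k = rgb_feat @ w_k
    v = rgb_feat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) depth-to-RGB similarity
    attn = softmax(scores, axis=-1)          # each depth token weights RGB tokens
    return depth_feat + attn @ v             # residual: keep depth, add RGB guidance


def incremental_fusion(depth_feat, rgb_feats, weights):
    """Apply one fusion step per stage, feeding each output into the next."""
    fused = depth_feat
    for rgb_feat, (w_q, w_k, w_v) in zip(rgb_feats, weights):
        fused = cross_modal_attention(fused, rgb_feat, w_q, w_k, w_v)
    return fused


# Toy inputs: random features stand in for learned depth/RGB encoders.
rng = np.random.default_rng(0)
n_tokens, channels, n_stages = 16, 8, 3
depth = rng.standard_normal((n_tokens, channels))
rgb_stages = [rng.standard_normal((n_tokens, channels)) for _ in range(n_stages)]
stage_weights = [
    tuple(rng.standard_normal((channels, channels)) * 0.1 for _ in range(3))
    for _ in range(n_stages)
]
out = incremental_fusion(depth, rgb_stages, stage_weights)
print(out.shape)  # (16, 8)
```

The residual form means that if the value projection contributes nothing, a stage passes the depth features through unchanged, so RGB information is only added where attention finds it useful.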
Abstract
Accurate depth estimation is crucial in many fields, including robotics, navigation, and medical imaging. However, conventional depth sensors often produce low-resolution (LR) depth maps, which makes detailed scene perception challenging. To address this, enhancing LR depth maps to high-resolution (HR) ones, guided by HR structured inputs such as RGB or grayscale images, has become essential. We propose a novel sensor fusion methodology for guided depth super-resolution (GDSR), a technique that combines LR depth maps with HR images to estimate detailed HR depth maps. Our key contribution is the Incremental Guided Attention Fusion (IGAF) module, which learns to fuse features from RGB images and LR depth maps to produce accurate HR depth maps. Using IGAF, we build a robust super-resolution model and evaluate it on multiple benchmark datasets. Our model achieves state-of-the-art results compared with all baseline models on the NYU v2 dataset for $\times 4$, $\times 8$, and $\times 16$ upsampling. It also outperforms all baselines in a zero-shot setting on the Middlebury, Lu, and RGB-D-D datasets. Code, environments, and models are available on GitHub.
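The abstract frames GDSR as recovering an HR depth map from an LR input at $\times 4$, $\times 8$, and $\times 16$. A minimal sketch of that setup, assuming block-average downsampling as the degradation and nearest-neighbour upsampling as an unguided baseline, shows why larger factors are harder: the reconstruction error grows with the factor and concentrates at depth discontinuities, which is where HR RGB guidance is meant to help. The synthetic depth map, the function names, and the degradation model are illustrative assumptions, not the benchmark protocol used for NYU v2 in the paper (which may, for example, use bicubic downsampling).

```python
import numpy as np


def downsample(depth_hr: np.ndarray, scale: int) -> np.ndarray:
    """Simulate a low-resolution sensor by averaging scale x scale blocks.

    Assumption: block averaging stands in for whatever degradation
    a given benchmark protocol actually uses.
    """
    h, w = depth_hr.shape
    h_c, w_c = h - h % scale, w - w % scale  # crop to a multiple of scale
    blocks = depth_hr[:h_c, :w_c].reshape(h_c // scale, scale, w_c // scale, scale)
    return blocks.mean(axis=(1, 3))


def upsample_nearest(depth_lr: np.ndarray, scale: int) -> np.ndarray:
    """Unguided baseline: repeat each LR pixel into a scale x scale block."""
    return np.repeat(np.repeat(depth_lr, scale, axis=0), scale, axis=1)


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error between two equally shaped depth maps."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic HR depth: a smooth ramp plus a sharp step edge (values in metres).
h, w = 64, 64
ys, xs = np.mgrid[0:h, 0:w]
depth_hr = 1.0 + 0.01 * xs + np.where(ys > 37, 0.5, 0.0)

for scale in (4, 8, 16):
    depth_lr = downsample(depth_hr, scale)
    depth_sr = upsample_nearest(depth_lr, scale)
    print(scale, depth_lr.shape, round(rmse(depth_sr, depth_hr), 4))
```

A guided method would replace `upsample_nearest` with a model that also sees the HR RGB image, using its edges to place the step boundary inside each coarse block.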
