Spectral-Aware Global Fusion for RGB-Thermal Semantic Segmentation
Ce Zhang, Zifu Wan, Simon Stepputtis, Katia Sycara, Yaqi Xie
TL;DR
This work tackles RGB-T semantic segmentation under challenging conditions by introducing SGFNet, a spectral-aware fusion network. SGFNet explicitly prioritizes higher-frequency, modality-specific details through spectral-aware feature enhancement and channel attention, and it merges RGB and thermal features via a global cross-modal spatial attention mechanism. The approach achieves state-of-the-art performance on the MFNet and PST900 datasets, and ablation studies show the contribution of each spectral component and attention module. The results indicate robust performance across conditions, with practical implications for reliable perception in autonomous systems.
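To make the fusion step more concrete, here is a minimal NumPy sketch of global cross-modal spatial attention. It assumes a standard scaled dot-product formulation in which every RGB position attends to all thermal positions. The projection matrices `w_q`, `w_k`, `w_v`, the residual addition, and the function names are hypothetical choices for illustration, not SGFNet's published module, and the channel-attention branch is omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_spatial_attention(rgb, thermal, w_q, w_k, w_v):
    """Fuse thermal into RGB features with global spatial cross-attention.

    rgb, thermal: (C, H, W) feature maps from the two encoders.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical; learned in practice).
    """
    c, h, w = rgb.shape
    # Flatten spatial positions into N = H*W tokens of dimension C.
    x_rgb = rgb.reshape(c, h * w).T
    x_th = thermal.reshape(c, h * w).T
    q = x_rgb @ w_q  # queries from RGB
    k = x_th @ w_k   # keys from thermal
    v = x_th @ w_v   # values from thermal
    # (N, N) weights: every RGB position attends to every thermal position.
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)
    # Residual fusion: add the attended thermal context to the RGB tokens.
    fused = x_rgb + attn @ v
    return fused.T.reshape(c, h, w)


rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 6, 10))
th_feat = rng.standard_normal((8, 6, 10))
eye = np.eye(8)
out = cross_modal_spatial_attention(rgb_feat, th_feat, eye, eye, eye)
print(out.shape)  # (8, 6, 10)
```

Because the attention matrix has one row and one column per spatial position, its size grows quadratically with H·W; that is the price of letting each RGB location draw on thermal evidence from anywhere in the scene.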
Abstract
Semantic segmentation relying solely on RGB data often struggles in challenging conditions such as low illumination and obscured views, limiting its reliability in critical applications like autonomous driving. Integrating thermal radiation data with RGB images has been shown to improve both performance and robustness in such conditions. However, effectively reconciling the discrepancies between the modalities and fusing RGB and thermal features remains a well-known challenge. In this work, we address this challenge from a novel spectral perspective. We observe that the multi-modal features can be categorized into two spectral components: low-frequency features that provide broad scene context, including color variations and smooth areas, and high-frequency features that capture modality-specific details such as edges and textures. Inspired by this, we propose the Spectral-aware Global Fusion Network (SGFNet) to effectively enhance and fuse the multi-modal features by explicitly modeling the interactions between the high-frequency, modality-specific features. Our experimental results demonstrate that SGFNet outperforms state-of-the-art methods on the MFNet and PST900 datasets.
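The split into low- and high-frequency components can be illustrated with a Fourier-domain decomposition. The sketch below is an assumption for illustration, not SGFNet's actual decomposition: it applies an ideal circular low-pass mask (the cutoff `radius` and the function name `split_frequency` are hypothetical) to the centered 2D spectrum of each channel, and defines the high-frequency part as the residual so that the two components sum back to the input.

```python
import numpy as np


def split_frequency(feat: np.ndarray, radius: float):
    """Split a (..., H, W) feature map into low- and high-frequency parts.

    `radius` is a hypothetical cutoff (in frequency bins) for an ideal
    circular low-pass mask; it is an illustrative choice, not SGFNet's.
    """
    h, w = feat.shape[-2:]
    # Per-channel 2D spectrum, shifted so the zero frequency sits at the center.
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    # Keep only frequencies inside the circle (mask broadcasts over leading dims).
    low_spec = spec * mask
    low = np.fft.ifft2(np.fft.ifftshift(low_spec, axes=(-2, -1)), axes=(-2, -1)).real
    # High-frequency part is the residual, so low + high reconstructs feat.
    high = feat - low
    return low, high


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 16, 16))
low, high = split_frequency(feat, radius=3)
print(np.allclose(low + high, feat))  # True
```

With this split, a constant map falls entirely into `low`, while a pixel-level checkerboard (the sharpest possible alternation) falls entirely into `high`, mirroring the abstract's description of smooth scene context versus edges and textures.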
