Context-Aware Interaction Network for RGB-T Semantic Segmentation
Ying Lv, Zhi Liu, Gongyang Li
TL;DR
The paper tackles RGB-T semantic segmentation by introducing CAINet, a fusion framework that constructs a context-aware interaction space to explicitly exploit cross-modal complementarity across multiple feature levels. It combines Context-Aware Complementary Reasoning (CACR), Global Context Modeling (GCM), and Detail Aggregation (DA) modules with residual learning and multi-level auxiliary supervision to guide learning and refine segmentation maps. Experiments on MFNet and PST900 show state-of-the-art performance with strong cross-modal robustness and an efficient 12.16M-parameter design, and the model's generalization to RGB-D data suggests broader applicability. The approach advances multimodal fusion by combining the benefits of direct and feedback fusion and by using global and boundary information to produce precise, context-rich segmentation for autonomous driving and related tasks.
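The multi-level auxiliary supervision mentioned above can be pictured as a main segmentation loss plus weighted auxiliary terms. The following is a minimal sketch under stated assumptions, not the authors' training objective: the function names, the use of plain cross-entropy for every head, the shared target, and the weights in `aux_weights` are all hypothetical choices made for illustration.

```python
import math


def mean_cross_entropy(pred, target):
    """Average per-pixel cross-entropy.

    pred[i] is a list of class probabilities for pixel i,
    target[i] is the index of the true class for pixel i.
    """
    return sum(-math.log(p[t]) for p, t in zip(pred, target)) / len(target)


def total_loss(main_pred, aux_preds, target, aux_weights):
    """Main loss plus weighted auxiliary losses.

    Assumption: every auxiliary head is scored against the same target
    with cross-entropy; the weights are illustrative, not from the paper.
    """
    loss = mean_cross_entropy(main_pred, target)
    for pred, weight in zip(aux_preds, aux_weights):
        loss += weight * mean_cross_entropy(pred, target)
    return loss


# One pixel, two classes, true class 0.
main_pred = [[0.5, 0.5]]
aux_preds = [[[0.25, 0.75]]]
print(total_loss(main_pred, aux_preds, target=[0], aux_weights=[0.5]))
```

In this toy setup the auxiliary term only changes how strongly intermediate predictions are penalized; the final prediction is still taken from the main head.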
Abstract
RGB-T semantic segmentation is a key technique for scene understanding in autonomous driving. Existing RGB-T semantic segmentation methods, however, do not effectively explore the complementary relationship between modalities when information interacts across multiple levels. To address this issue, we propose the Context-Aware Interaction Network (CAINet) for RGB-T semantic segmentation, which constructs an interaction space that exploits auxiliary tasks and global context for explicitly guided learning. Specifically, we propose a Context-Aware Complementary Reasoning (CACR) module that establishes the complementary relationship between multimodal features with the long-term context in both spatial and channel dimensions. Further, considering the importance of global contextual and detailed information, we propose the Global Context Modeling (GCM) module and the Detail Aggregation (DA) module, and we introduce specific auxiliary supervision to explicitly guide the context interaction and refine the segmentation map. Extensive experiments on two benchmark datasets, MFNet and PST900, demonstrate that the proposed CAINet achieves state-of-the-art performance. The code is available at https://github.com/YingLv1106/CAINet.
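To make the idea of cross-modal interaction along both channel and spatial dimensions more concrete, here is a minimal sketch of a generic gated fusion. It is not the CACR module: the gating scheme, the residual sum, and every function name (`channel_gate`, `spatial_gate`, `gated_fuse`) are assumptions chosen for illustration. Features are stored as a list of channels, each channel a flat list of spatial positions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feat):
    """One weight per channel, from global average pooling over positions."""
    return [sigmoid(sum(ch) / len(ch)) for ch in feat]


def spatial_gate(feat):
    """One weight per position, from the mean over channels."""
    n_channels = len(feat)
    return [sigmoid(sum(col) / n_channels) for col in zip(*feat)]


def reweight(src, guide):
    """Scale src by channel and spatial gates computed from guide."""
    cg, sg = channel_gate(guide), spatial_gate(guide)
    return [[v * cg[c] * sg[i] for i, v in enumerate(ch)]
            for c, ch in enumerate(src)]


def gated_fuse(rgb, thermal):
    """Each modality is reweighted by the other, then added to the
    unmodified inputs as a residual (hypothetical design)."""
    rgb_w = reweight(rgb, thermal)
    th_w = reweight(thermal, rgb)
    return [[r + t + rw + tw for r, t, rw, tw in zip(*rows)]
            for rows in zip(rgb, thermal, rgb_w, th_w)]


# One channel, two positions.
rgb = [[1.0, 2.0]]
thermal = [[0.0, 0.0]]
print(gated_fuse(rgb, thermal))
```

The residual sum keeps the original features intact, so the gates can only add cross-modal emphasis and cannot suppress a modality entirely.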
