BIMII-Net: Brain-Inspired Multi-Iterative Interactive Network for RGB-T Road Scene Semantic Segmentation
Hanshuo Qiu, Jie Jiang, Ruoli Yang, Lixin Zhan, Jizhao Liu
TL;DR
BIMII-Net addresses RGB-T road scene semantic segmentation under challenging illumination by integrating a brain-inspired deep continuous-coupled neural network (DCCNN) with a cross explicit attention-enhanced fusion module (CEAEF) and a complementary interactive multi-layer decoder. The architecture comprises a SegFormer-based encoder with CCNN layers, a dual-branch fusion module, and a three-branch decoder (SFI, DFI, MFE) under a multi-module supervision regime, enabling fine-grained texture capture and global skeleton reasoning. Ablation and comparative experiments on MFNet and PST900 demonstrate strong performance gains, particularly in boundary delineation and small-object segmentation, with robust day/night generalization. The work highlights the viability of brain-inspired computing for multi-modal semantic segmentation and provides a foundation for more efficient, scalable RGB-T models in real-world perception tasks.
Abstract
RGB-T road scene semantic segmentation fuses information from RGB and thermal images to improve visual scene understanding in complex environments with inadequate illumination or occlusion. However, existing RGB-T semantic segmentation models typically rely on simple addition or concatenation strategies, or ignore the differences between information at different levels. To address these issues, we propose a novel RGB-T road scene semantic segmentation network called the Brain-Inspired Multi-Iterative Interactive Network (BIMII-Net). First, to meet the requirements of accurate texture and local information extraction in road scenarios such as autonomous driving, we propose a deep continuous-coupled neural network (DCCNN) architecture based on a brain-inspired model. Second, to enhance the interaction and expression capabilities among multi-modal information, we design a cross explicit attention-enhanced fusion module (CEAEF-Module) in the feature fusion stage of BIMII-Net to effectively integrate features at different levels. Finally, we construct a complementary interactive multi-layer decoder structure, incorporating the shallow-level feature iteration module (SFI-Module), the deep-level feature iteration module (DFI-Module), and the multi-feature enhancement module (MFE-Module), to collaboratively extract texture details and global skeleton information, with multi-module joint supervision further optimizing the segmentation results. Experimental results demonstrate that BIMII-Net achieves state-of-the-art (SOTA) performance in the brain-inspired computing domain and outperforms most existing RGB-T semantic segmentation methods. It also exhibits strong generalization capabilities on multiple RGB-T datasets, demonstrating the effectiveness of brain-inspired computing models in multi-modal image segmentation tasks.
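The contrast the abstract draws between simple addition or concatenation and attention-based cross-modal fusion can be illustrated with a minimal NumPy sketch. The function names and the gating rule below are illustrative assumptions only; `fuse_cross_gated` is a generic cross-modal channel gate and is not the CEAEF-Module, whose details are not given in this section.

```python
import numpy as np


def fuse_add(rgb, thermal):
    """Baseline: element-wise addition of same-shaped feature maps."""
    return rgb + thermal


def fuse_concat(rgb, thermal):
    """Baseline: channel-wise concatenation, (C, H, W) x 2 -> (2C, H, W)."""
    return np.concatenate([rgb, thermal], axis=0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_cross_gated(rgb, thermal):
    """Hypothetical cross-modal gating (an assumption, not the paper's module).

    Each modality is reweighted per channel by a gate computed from the
    other modality's global average, then the two results are summed.
    """
    gate_from_thermal = sigmoid(thermal.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    gate_from_rgb = sigmoid(rgb.mean(axis=(1, 2), keepdims=True))          # (C, 1, 1)
    return rgb * gate_from_thermal + thermal * gate_from_rgb


# Toy feature maps with 8 channels and 16 x 16 spatial resolution.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16, 16))
thermal = rng.standard_normal((8, 16, 16))

added = fuse_add(rgb, thermal)
concatenated = fuse_concat(rgb, thermal)
gated = fuse_cross_gated(rgb, thermal)
```

Addition and concatenation treat every channel of both modalities identically, whereas the gated variant lets the content of one modality modulate the contribution of the other, which is the general idea behind attention-based fusion.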
