Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark
Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng
TL;DR
This study tackles the detection of invisible industrial gases by using RGB texture information to augment the gas cues in thermal images. The authors introduce RT-CAN, a two-stream RGB-Thermal network that combines an RGB-assisted Cross Attention (RCA) module and a Global Textural Attention (GTA) decoder with a cascaded decoder for multi-scale refinement, producing pixel-level gas segmentation. To support further research, they release Gas-DB, a public dataset of 1293 annotated RGB-Thermal image pairs collected across eight real-world scenes. In their experiments, RT-CAN achieves state-of-the-art results among RGB-Thermal methods and outperforms single-stream baselines in accuracy, IoU, and F2 by 4.86%, 5.65%, and 4.88%, respectively, which supports the value of cross-modal attention and texture-aware decoding for gas leak detection. Together, Gas-DB and the proposed architecture offer a practical path toward robust detection of visually imperceptible gases in industrial settings.
Abstract
Many chemical gases used in industrial processes are highly toxic, so effective measures are needed to prevent their leakage during transportation and storage. Thermal infrared-based computer vision techniques provide a straightforward way to identify gas leakage areas. However, developing high-quality algorithms has been challenging because thermal images contain little texture and open-source datasets are lacking. In this paper, we present the RGB-Thermal Cross Attention Network (RT-CAN), which employs an RGB-assisted two-stream network architecture to integrate texture information from RGB images with gas area information from thermal images. Additionally, to facilitate research on invisible gas detection, we introduce Gas-DB, an extensive open-source gas detection database comprising about 1.3K well-annotated RGB-thermal images from eight different collection scenes. Experimental results demonstrate that our method successfully leverages the advantages of both modalities: it achieves state-of-the-art (SOTA) performance among RGB-thermal methods and surpasses single-stream SOTA models in accuracy, Intersection over Union (IoU), and F2 by 4.86%, 5.65%, and 4.88%, respectively. The code and data can be found at https://github.com/logic112358/RT-CAN.
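
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, IoU, and F2 are commonly computed for a binary segmentation mask. It assumes the standard pixel-wise definitions; the abstract does not state the exact evaluation protocol (for example, whether scores are averaged per image or per class), and the function names `confusion_counts` and `segmentation_metrics` are hypothetical, not taken from the RT-CAN repository.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two equally shaped binary masks.

    Masks are nested lists of 0/1 values, where 1 marks a gas pixel.
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, target, beta=2.0):
    """Return pixel accuracy, IoU, and F-beta (F2 by default) for the gas class.

    Assumption: standard textbook definitions, not necessarily the exact
    protocol used in the paper.
    """
    tp, fp, fn, tn = confusion_counts(pred, target)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    # IoU: overlap between prediction and ground truth divided by their union.
    union = tp + fp + fn
    iou = tp / union if union else 0.0

    # F-beta with beta = 2 weights recall more heavily than precision,
    # so missed gas pixels cost more than false alarms.
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    f_beta = (1 + b2) * tp / denom if denom else 0.0

    return {"accuracy": accuracy, "iou": iou, "f_beta": f_beta}


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 0, 0]]
    ground_truth = [[1, 0, 0], [1, 0, 0]]
    print(segmentation_metrics(predicted, ground_truth))
```

Because F2 emphasizes recall, it suits leak detection, where missing a real gas region is typically more costly than raising a false alarm.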
