Generative Model-Based Fusion for Improved Few-Shot Semantic Segmentation of Infrared Images
Junno Yun, Mehmet Akçakaya
TL;DR
This work addresses few-shot semantic segmentation of infrared (IR) images when paired RGB data is unavailable. It introduces diffusion-based data augmentation to generate lightness images and synthetic RGB auxiliary information, coupled with a fusion ensemble that jointly leverages the IR and generated modalities. The approach employs two meta-learners (IR and RGB) with a shared base learner and a fusion mechanism, and achieves state-of-the-art performance on IR datasets such as SODA and SCUTSEG without requiring paired RGB data. The reported gains in both mIoU and foreground–background IoU point to robust IR segmentation in defense and safety settings where collecting paired color and IR images is impractical.
Abstract
Infrared (IR) imaging is commonly used in various scenarios, including autonomous driving, fire safety, and defense applications. Thus, semantic segmentation of such images is of great interest. However, this task faces several challenges, including data scarcity, contrast and input channel counts that differ from those of natural images, and the emergence of classes not represented in databases in certain scenarios, such as defense applications. Few-shot segmentation (FSS) provides a framework to overcome these issues by segmenting query images using a few labeled support samples. However, existing FSS models for IR images require paired visible RGB images, which is a major limitation since acquiring such paired data is difficult or impossible in some applications. In this work, we develop new strategies for FSS of IR images by using generative modeling and fusion techniques. To this end, we propose to synthesize auxiliary data that provides additional channel information to complement the limited contrast in the IR images, and to synthesize IR data for data augmentation. Here, the former helps the FSS model better capture the relationship between the support and query sets, while the latter addresses the issue of data scarcity. Finally, to further improve how the model relates the support and query sets, we propose a novel fusion ensemble module for integrating the two different modalities. Our methods are evaluated on different IR datasets, and improve upon the state-of-the-art (SOTA) FSS models.
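To make the few-shot setting concrete, the following is a minimal sketch of how a query image can be segmented from a few labeled support samples, followed by a simple combination of two modality-specific predictions. It is not the method proposed in this work, which uses learned meta-learners, a base learner, and a fusion ensemble module. It assumes feature maps have already been extracted, uses generic prototype matching (masked average pooling plus cosine similarity), and the function names and the `ir_weight` parameter are hypothetical.

```python
import numpy as np


def masked_average_pool(features, mask):
    """Average the feature vectors under a binary mask.

    features: (C, H, W) feature map of one support image (assumed precomputed).
    mask:     (H, W) binary foreground mask for that support image.
    Returns a (C,) class prototype.
    """
    weights = mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        raise ValueError("support mask has no foreground pixels")
    return (features * weights).sum(axis=(1, 2)) / total


def cosine_similarity_map(features, prototype, eps=1e-8):
    """Cosine similarity between every query pixel and the prototype, shape (H, W)."""
    numerator = np.tensordot(prototype, features, axes=([0], [0]))
    denominator = np.linalg.norm(features, axis=0) * np.linalg.norm(prototype) + eps
    return numerator / denominator


def few_shot_foreground_score(support_feats, support_masks, query_feats):
    """Average the K support prototypes, then score each query pixel against the result."""
    prototypes = [
        masked_average_pool(f, m) for f, m in zip(support_feats, support_masks)
    ]
    prototype = np.mean(prototypes, axis=0)
    return cosine_similarity_map(query_feats, prototype)


def fuse_scores(ir_score, aux_score, ir_weight=0.5):
    """Hypothetical late fusion: a convex combination of two per-pixel score maps."""
    return ir_weight * ir_score + (1.0 - ir_weight) * aux_score
```

In this sketch, one score map would be computed from IR features and another from features of the synthesized auxiliary modality, and thresholding the fused map gives a foreground mask. A fixed weighted average is only the simplest possible stand-in for a learned fusion module.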
