Generative Model-Based Fusion for Improved Few-Shot Semantic Segmentation of Infrared Images

Junno Yun, Mehmet Akçakaya

TL;DR

This work tackles the challenge of semantic segmentation for infrared images under few-shot data constraints, where paired RGB data is unavailable. It introduces diffusion-based data augmentation to generate lightness images and synthetic RGB auxiliary information, coupled with a fusion ensemble that jointly leverages IR and generated modalities. The approach employs two meta-learners (IR and RGB) with a shared base learner and a fusion mechanism, achieving state-of-the-art performance on IR datasets such as SODA and SCUTSEG without requiring paired RGB data. The results demonstrate substantial gains in both mIoU and foreground–background IoU, indicating strong potential for robust IR segmentation in defense and safety contexts where collecting color-IR pairs is impractical.
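
The two metrics named above can be illustrated with a short sketch. This is not the authors' evaluation code: the function names are hypothetical, and the per-image computation is a simplifying assumption (few-shot segmentation benchmarks commonly accumulate intersections and unions over many episodes before dividing). It only shows what each number measures: mIoU averages the foreground IoU over classes, while foreground-background IoU averages the IoU of the foreground mask and the IoU of the background mask, ignoring class identity.

```python
import numpy as np


def binary_iou(pred, target):
    """IoU between two boolean masks; NaN when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return float("nan")
    return float(np.logical_and(pred, target).sum() / union)


def fb_iou(pred_fg, target_fg):
    """Mean of the foreground IoU and the background IoU of one prediction."""
    pred_fg = np.asarray(pred_fg, dtype=bool)
    target_fg = np.asarray(target_fg, dtype=bool)
    fg = binary_iou(pred_fg, target_fg)
    bg = binary_iou(~pred_fg, ~target_fg)
    return 0.5 * (fg + bg)


def mean_iou(per_class_iou):
    """Average of class-wise foreground IoUs, skipping undefined (NaN) entries."""
    return float(np.nanmean(list(per_class_iou.values())))


if __name__ == "__main__":
    # Toy 2x4 masks (1 = foreground); the values are illustrative only.
    pred = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
    target = np.array([[1, 0, 0, 0], [1, 0, 0, 0]])
    print("foreground IoU:", binary_iou(pred, target))
    print("FB-IoU:", fb_iou(pred, target))
    print("mIoU:", mean_iou({"class_a": 0.5, "class_b": 0.25}))
```

Because the background usually covers most of an image, foreground-background IoU tends to be higher than mIoU for the same prediction, which is why the two are reported together.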

Abstract

Infrared (IR) imaging is commonly used in various scenarios, including autonomous driving, fire safety and defense applications. Thus, semantic segmentation of such images is of great interest. However, this task faces several challenges, including data scarcity, differing contrast and input channel number compared to natural images, and emergence of classes not represented in databases in certain scenarios, such as defense applications. Few-shot segmentation (FSS) provides a framework to overcome these issues by segmenting query images using a few labeled support samples. However, existing FSS models for IR images require paired visible RGB images, which is a major limitation since acquiring such paired data is difficult or impossible in some applications. In this work, we develop new strategies for FSS of IR images by using generative modeling and fusion techniques. To this end, we propose to synthesize auxiliary data to provide additional channel information to complement the limited contrast in the IR images, as well as IR data synthesis for data augmentation. Here, the former helps the FSS model to better capture the relationship between the support and query sets, while the latter addresses the issue of data scarcity. Finally, to further improve the former aspect, we propose a novel fusion ensemble module for integrating the two different modalities. Our methods are evaluated on different IR datasets, and improve upon the state-of-the-art (SOTA) FSS models.

Paper Structure

This paper contains 16 sections, 11 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overall architectures of the proposed methods. (a) We utilize the existing FSS model for IR images. (b), (c) To improve this model, we generate new lightness and RGB datasets with a generative diffusion model. (d) To exploit them, we propose an additional meta-learning component and fusion networks.
  • Figure 2: The proposed method comprises two meta-learners (Meta Learner IR and Meta Learner RGB), a shared base learner, and a fusion ensemble module. Each meta-learner evaluates the relationship between the support and query images in its own domain (IR or RGB), yielding $F_{Meta}^{IR}$ and $F_{Meta}^{RGB}$, respectively. The shared base learner produces the predictions $F_{Base}^{IR}$ and $F_{Base}^{RGB}$. The fusion ensemble module integrates these predictions into the final foreground and background probability maps, $F_{fg\_final}$ and $F_{bg\_final}$, from which the final prediction $F_{final}$ is obtained.
  • Figure 3: The adversarial conditional diffusion model (Özbey et al., 2023) that we adapt performs image-to-image (I2I) translation between two domains. The figure shows examples of IR-RGB domain translation (Domain $x$: IR, Domain $y$: RGB). Non-diffusive modules (green and orange) employ two generator-discriminator pairs to provide initial estimates of the generated images, enabling unpaired training with a cycle-consistent loss (Zhu et al., 2017). Diffusive models (blue and yellow) use these initial estimates (black dashed lines) as conditioning for the denoising process.
  • Figure 4: Description of the proposed generated datasets, including the naming conventions and purposes (augmentation versus auxiliary information).
  • Figure 5: Qualitative comparison of the baseline and the proposed method (method3) on SODA and SCUTSEG under the 1-shot setting.
  • ...and 1 more figures
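
For reference, the cycle-consistent loss mentioned in the Figure 3 caption has the following standard form in the unpaired translation work it cites (Zhu et al., 2017). The generator symbols $G_{x \to y}$ and $G_{y \to x}$ are notation assumed here for illustration, with $x$ denoting the IR domain and $y$ the RGB domain as in the caption. The exact weighting and any additional terms used in this paper are not given in the captions above.

```latex
\mathcal{L}_{cyc}
  = \mathbb{E}_{x \sim p(x)}\!\left[ \left\| G_{y \to x}\!\big(G_{x \to y}(x)\big) - x \right\|_1 \right]
  + \mathbb{E}_{y \sim p(y)}\!\left[ \left\| G_{x \to y}\!\big(G_{y \to x}(y)\big) - y \right\|_1 \right]
```

The first term requires that an IR image translated to RGB and back reproduces the original IR image, and the second term imposes the same constraint starting from RGB. This is what allows training without pixel-aligned IR-RGB pairs.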