SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems

Manjunath D, Aniruddh Sikdar, Prajwal Gurunath, Sumanth Udupa, Suresh Sundaram

TL;DR

RGB-to-IR domain adaptation for aerial perception is hampered by IR's lack of color and texture and by co-registration challenges between RGB and IR sensors. The authors propose Semantic-Aware Gray color Augmentation (SAGA), an instance-level grayscale augmentation applied to RGB images and integrated within a Mean Teacher framework to improve cross-modal pseudo-labels. They also introduce IndraEye, a multi-sensor drone dataset with synchronized RGB and IR data. Across FLIR, LLVIP, and IndraEye, SAGA yields consistent mAP gains when paired with state-of-the-art unsupervised domain adaptation (UDA) methods, improving robustness in RGB-to-IR adaptation and reducing false positives in detection on the IR target domain. The IndraEye dataset provides a challenging, diverse benchmark for aerial multimodal perception, enabling evaluation of detection and segmentation across day/night conditions and varying viewpoints, with potential for broader real-world deployment.

Abstract

Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation by reducing the need for co-registered image pairs and minimizing reliance on large annotated IR datasets. However, inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models, leading to increased false positives and poor-quality pseudo-labels. To address this, we propose Semantic-Aware Gray color Augmentation (SAGA), a novel strategy for mitigating color bias and bridging the domain gap by extracting object-level features relevant to IR images. Additionally, to validate the proposed SAGA for drone imagery, we introduce IndraEye, a multi-sensor (RGB-IR) dataset designed for diverse applications. The dataset contains 5,612 images with 145,666 instances, captured from diverse angles, altitudes, backgrounds, and times of day, offering valuable opportunities for multimodal learning, domain adaptation for object detection and segmentation, and exploration of sensor-specific strengths and weaknesses. IndraEye aims to enhance the development of more robust and accurate aerial perception systems, especially in challenging environments. Experimental results show that SAGA significantly improves RGB-to-IR adaptation on autonomous driving datasets and on the IndraEye dataset, achieving consistent performance gains of +0.4% to +7.6% (mAP) when integrated with state-of-the-art domain adaptation techniques. The dataset and code are available at https://github.com/airliisc/IndraEye.

Paper Structure

This paper contains 13 sections, 3 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Snapshots from the IndraEye dataset showing the RGB and IR modalities with complete semantic annotations for detection and segmentation tasks, captured from different slant angles.
  • Figure 2: Illustration of SAGA augmentation. The process involves extracting objects from the image, converting them to grayscale, and reintegrating them into the original image while preserving background color information.
  • Figure 3: Domain-adaptive thermal object detection with RGB as the source domain and IR as the target domain. (a) Vanilla CMT on the IndraEye dataset. (b) CMT with SAGA on the IndraEye dataset.
  • Figure 4: Output predictions highlighting the importance of SAGA augmentation for the CMT algorithm. (a) and (c) show the increase in false positives with vanilla CMT, whereas (b) and (d) show the reduction in false positives when SAGA is used with CMT.
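
The augmentation described in the Figure 2 caption (extract objects, convert them to grayscale, paste them back over the unchanged color background) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gray_out_instances`, the use of boolean per-instance masks, and the ITU-R BT.601 luma weights are all choices made here for the example. The page does not say how the paper represents object regions or which grayscale conversion it uses.

```python
import numpy as np

# Assumption: ITU-R BT.601 luma weights; the paper's exact conversion is not given here.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def gray_out_instances(image: np.ndarray, instance_masks: np.ndarray) -> np.ndarray:
    """Return a copy of `image` whose object pixels are grayscale.

    image:          (H, W, 3) uint8 RGB array.
    instance_masks: (N, H, W) boolean array, one mask per object instance
                    (hypothetical input format chosen for this sketch).
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB image")
    if instance_masks.ndim != 3 or instance_masks.shape[1:] != image.shape[:2]:
        raise ValueError("expected masks of shape (N, H, W) matching the image")

    # Grayscale version of the whole image, replicated across three channels
    # so it can be pasted into an RGB array.
    gray = np.rint(image.astype(np.float32) @ LUMA_WEIGHTS).astype(np.uint8)
    gray_rgb = np.repeat(gray[..., None], 3, axis=2)

    # Union of all instance masks: True wherever any object is present.
    object_pixels = instance_masks.astype(bool).any(axis=0)

    # Overwrite only object pixels; the background keeps its original color.
    augmented = image.copy()
    augmented[object_pixels] = gray_rgb[object_pixels]
    return augmented
```

Calling `gray_out_instances(rgb_image, masks)` on a source-domain RGB image leaves the input untouched and returns the augmented copy. If only bounding boxes are available, each box can be rasterized into a rectangular mask first, at the cost of also graying some background pixels inside the box.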