
HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information

Heitor Rapela Medeiros, Fidel A. Guerrero Pena, Masih Aminbeidokhti, Thomas Dubail, Eric Granger, Marco Pedersoli

TL;DR

HalluciDet tackles the domain gap between infrared (IR) and visible (RGB) imagery for person detection. It uses privileged information encoded in a pre-trained RGB detector to guide the translation of IR images into an RGB-like (hallucinated) representation. A hallucination network is trained end-to-end with a task-focused detection loss while the RGB detector stays frozen, so an RGB detector can process IR inputs effectively without any RGB data at inference. On LLVIP and FLIR ADAS, HalluciDet outperforms standard image-translation baselines and, in most cases, even fine-tuning on IR data, especially with Faster R-CNN, while the detector keeps its performance on RGB inputs. The framework highlights the value of task-driven translation and privileged information for cross-modal detection, with practical benefits in surveillance and automotive settings.

Abstract

A powerful way to adapt a visual recognition model to a new domain is through image translation. However, common image translation approaches focus only on generating data from the same distribution as the target domain. In a cross-modal application such as pedestrian detection from aerial images, where there is a considerable shift in data distribution from infrared (IR) to visible (RGB) images, a translation focused on generation can lead to poor performance because the loss emphasizes details that are irrelevant to the task. In this paper, we propose HalluciDet, an IR-RGB image translation model for object detection. Instead of focusing on reconstructing the original image in the IR modality, it seeks to reduce the detection loss of an RGB detector, and therefore avoids the need to access RGB data. This model produces a new image representation that enhances the objects of interest in the scene and greatly improves detection performance. We empirically compare our approach against state-of-the-art methods for image translation and for fine-tuning on IR, and show that HalluciDet improves detection accuracy in most cases by exploiting the privileged information encoded in a pre-trained RGB detector. Code: https://github.com/heitorrapela/HalluciDet
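The core idea in the abstract can be illustrated with a deliberately tiny NumPy sketch: train the IR-to-RGB translation only through the loss of a frozen RGB detector. Everything in it is an assumption for illustration, not the authors' implementation. The "detector" (`frozen_rgb_detector_logits`) is a hypothetical per-pixel logistic scorer rather than Faster R-CNN, and the "hallucination network" (`hallucinate`) is a hypothetical per-pixel affine map rather than the paper's network. The loss is plain binary cross-entropy rather than the detector's classification and box-regression losses. What the sketch does show is the training signal. The detector's weights stay fixed, no RGB image or reconstruction target is used, and gradients of the task loss flow back only into the translation parameters `A` and `c`.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def frozen_rgb_detector_logits(rgb, w_det, b_det):
    # Hypothetical stand-in for a pre-trained RGB detector: a per-pixel
    # "person" logit computed from the 3 colour channels. Never updated.
    return rgb @ w_det + b_det                      # (H, W, 3) -> (H, W)


def hallucinate(ir, A, c):
    # Hypothetical hallucination network: per-pixel affine map IR -> 3 channels.
    return ir[..., None] * A + c                    # (H, W) -> (H, W, 3)


def train_hallucination(ir, person_mask, w_det, b_det, lr=0.5, steps=1000):
    A = np.zeros(3)                                 # the only trainable parameters
    c = np.zeros(3)
    n = ir.size
    losses = []
    for _ in range(steps):
        z = frozen_rgb_detector_logits(hallucinate(ir, A, c), w_det, b_det)
        # Task loss only (binary cross-entropy with logits); no RGB target
        # and no reconstruction term.
        losses.append(np.mean(np.logaddexp(0.0, z) - person_mask * z))
        dz = (sigmoid(z) - person_mask) / n         # dLoss/dlogit per pixel
        # Chain rule through the frozen detector: dz/dA = ir * w_det, dz/dc = w_det.
        A -= lr * np.sum(dz * ir) * w_det
        c -= lr * np.sum(dz) * w_det
    return A, c, losses


# Toy data: warm "person" pixels on a cooler background (values in [0, 1]).
rng = np.random.default_rng(0)
ir = rng.uniform(0.0, 0.4, size=(32, 32))
mask = np.zeros((32, 32))
mask[8:20, 10:16] = 1.0
ir[mask == 1] = rng.uniform(0.6, 1.0, size=int(mask.sum()))

w_det = np.array([2.0, -1.0, -1.0])                 # frozen: responds to "red"
b_det = 0.0
A, c, losses = train_hallucination(ir, mask, w_det, b_det)
print(f"task loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The translation is free to produce any colours that make the frozen scorer fire on people. That is the sense in which the hallucinated image need not look like a real RGB photo, only like something the detector responds to.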

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Example of detections using baseline and HalluciDet methods on LLVIP data. (a) Original RGB image with ground truth annotations (yellow). (b) IR image with corresponding detections of a fine-tuned model (green). (c) Translated image from IR to RGB produced by FastCUT and corresponding RGB detections (green). (d) Hallucinated image produced by our method and RGB detections (green); HalluciDet does not seek to reconstruct all image details but only to enhance the objects of interest.
  • Figure 2: HalluciDet leverages privileged information for modality hallucination with pre-trained detectors. During training, the hallucination network learns to use the privileged information encoded by the RGB detector to translate the IR image into a new, hallucinated modality representation. During inference, the RGB detector operates on this translated representation, which improves detection on IR images.
  • Figure 3: Illustration of a sequence of 8 images from the LLVIP dataset. The first row shows the RGB modality and the second the IR modality, followed by FastCUT translations and the representations created by HalluciDet for different detectors.
  • Figure 4: AP@50 vs. percentage of training samples. AP@50 on the LLVIP test set for HalluciDet with Faster R-CNN trained on different fractions of the training data.
  • Figure 5: AP@50 vs. percentage of training samples. AP@50 on the FLIR test set for HalluciDet with Faster R-CNN trained on different fractions of the training data. Notably, 70% of the data was sufficient for HalluciDet to perform comparably to the Faster R-CNN fine-tuned on the complete dataset.
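Figures 4 and 5 plot AP@50, the average precision obtained when a detection counts as a true positive only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal single-class sketch of that metric in plain Python. It assumes PASCAL VOC-style greedy matching (each detection is compared with its best-overlapping ground-truth box, and a box can be matched only once) and all-point interpolation of the precision-recall curve. The paper's evaluation code may follow a different protocol, for example COCO-style 101-point interpolation. The names `iou` and `ap50` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(detections, ground_truth, iou_thr=0.5):
    """detections: list of (image_id, score, box);
    ground_truth: dict mapping image_id -> list of boxes."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Rank detections by confidence, highest first.
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)              # low overlap or duplicate detection
    # Precision/recall after each ranked detection.
    recalls, precisions, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        recalls.append(tp / n_gt)
        precisions.append(tp / k)
    # All-point interpolation: precision becomes non-increasing in rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p              # area under the interpolated curve
        prev_r = r
    return ap


gt = {"img0": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [
    ("img0", 0.9, (0, 0, 10, 10)),          # true positive
    ("img0", 0.8, (50, 50, 60, 60)),        # false positive
    ("img0", 0.7, (21, 21, 31, 31)),        # true positive (IoU = 81/119)
]
print(round(ap50(dets, gt), 4))             # → 0.8333
```

Sorting by confidence before matching makes AP independent of any single score threshold. Each prefix of the ranked list gives one precision/recall point, and the interpolation step replaces each precision with the best precision achievable at that recall or higher.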