WiSE-OD: Benchmarking Robustness in Infrared Object Detection

Heitor R. Medeiros, Atif Belal, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli

TL;DR

This work tackles the robustness gap in infrared object detection under cross-modality distribution shifts by introducing LLVIP-C and FLIR-C, corruption-based cross-modality benchmarks derived from RGB-IR datasets, and by validating real-world shifts on M3FD. It proposes WiSE-OD, a simple weight-space ensembling strategy whose two variants blend RGB pre-trained weights with either IR fine-tuned or linear-probing weights, at no extra inference cost. The study shows that WiSE-OD improves robustness under corruption and in real-world OOD scenarios across multiple detectors (Faster R-CNN, FCOS, RetinaNet, YOLOv8) without additional training, raising mean performance under corruption (mPC) and producing activation maps that are more resilient to corruption. The results highlight the practicality of cross-modality ensembling for IR OD and point to future directions such as adaptive lambda selection and broader sensor fusion, which could enable more reliable nighttime sensing systems.

Abstract

Object detection (OD) in infrared (IR) imagery is critical for low-light and nighttime applications. However, the scarcity of large-scale IR datasets forces models to rely on weights pre-trained on RGB images. While fine-tuning on IR improves accuracy, it often compromises robustness under distribution shifts due to the inherent modality gap between RGB and IR. To address this, we introduce LLVIP-C and FLIR-C, two cross-modality out-of-distribution (OOD) benchmarks built by applying corruptions to standard IR datasets. Additionally, to fully leverage the complementary knowledge from RGB and infrared-trained models, we propose WiSE-OD, a weight-space ensembling method with two variants: WiSE-OD$_{ZS}$, which combines RGB zero-shot and IR fine-tuned weights, and WiSE-OD$_{LP}$, which blends zero-shot and linear probing. Evaluated using four RGB-pretrained detectors and two robust baselines on our benchmark and in the real-world out-of-distribution M3FD dataset, our WiSE-OD improves robustness across modalities and to corruption in synthetic and real-world distribution shifts without any additional training or inference costs. Our code is available at: https://github.com/heitorrapela/wiseod.
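The weight-space ensembling described in the abstract can be sketched as a per-parameter linear interpolation between two checkpoints of the same architecture. The sketch below assumes the standard form θ = (1 − λ)·θ_RGB + λ·θ_IR; the abstract does not state the exact equation, so this form, the function name `interpolate_weights`, the parameter names, and the toy values are illustrative assumptions and not the authors' implementation.

```python
from typing import Dict, List

# Parameter name -> flattened parameter values. With PyTorch, the same idea
# would apply to the tensors stored in a model's state_dict.
Weights = Dict[str, List[float]]


def interpolate_weights(theta_rgb: Weights, theta_ir: Weights, lam: float) -> Weights:
    """Return (1 - lam) * theta_rgb + lam * theta_ir, parameter by parameter.

    Assumed interpolation form (not confirmed by the source):
    lam = 0 keeps the RGB zero-shot weights, lam = 1 keeps the IR weights.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if theta_rgb.keys() != theta_ir.keys():
        raise ValueError("both checkpoints must share the same parameter names")

    merged: Weights = {}
    for name, rgb_values in theta_rgb.items():
        ir_values = theta_ir[name]
        if len(rgb_values) != len(ir_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - lam) * w_rgb + lam * w_ir
            for w_rgb, w_ir in zip(rgb_values, ir_values)
        ]
    return merged


# Toy checkpoints with made-up values, for illustration only.
zero_shot = {"backbone.conv1": [1.0, 2.0], "head.cls": [0.0, 4.0]}
fine_tuned = {"backbone.conv1": [3.0, 2.0], "head.cls": [2.0, 0.0]}
blended = interpolate_weights(zero_shot, fine_tuned, lam=0.5)
```

Because the result is a single set of weights for the same architecture, the blended detector runs with the same inference cost as either original model, which is consistent with the abstract's claim of no additional inference cost. Under the same assumption, the linear-probing variant would pass the linear-probing checkpoint in place of `fine_tuned`.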

Paper Structure

This paper contains 20 sections, 5 equations, 16 figures, 12 tables.

Figures (16)

  • Figure 1: Robustness of Infrared Object Detection on LLVIP-C and FLIR-C datasets. In the first row, LLVIP-C has a brightness corruption severity level of $5$; in the second row, FLIR-C shot noise corruption has a severity level of $2$. In (a) ground-truth boxes (yellow); (b) zero-shot COCO; (c) fine-tuning (FT); (d) WiSE-OD with Faster R-CNN.
  • Figure 2: LLVIP-C and FLIR-C examples. The first row shows one example from the LLVIP-C test set under two corruptions, Shot Noise and Impulse Noise, at severity level $5$. The second row shows one example from the FLIR-C test set under Motion Blur and Zoom Blur at severity level $5$.
  • Figure 3: Examples of perturbations at different severity levels for LLVIP. Each column shows the effect of increasing corruption severity (1–5) on infrared images. Rows: fog (top), brightness (middle), and contrast (bottom). Higher severities introduce stronger degradations, simulating challenging real-world conditions.
  • Figure 4: Our proposed method: WiSE-OD and its variants. In the large grey box, we have WiSE-OD$_\text{ZS}$ with the equation inside the pink square, and WiSE-OD$_\text{LP}$ in the yellow large box with the equation inside the blue square.
  • Figure 5: AP$_{50}$ versus corruption severity. (a) Frost on LLVIP-C (IR) and (b) Fog on FLIR-C (IR). Curves compare Faster R-CNN in Zero-shot, WiSE-OD$_{ZS}$, and FT settings; y-axis shows AP$_{50}$ (%). Severity increases left-to-right (0–5 for LLVIP-C, 1–5 for FLIR-C). (c) FLIR-C per-class AP$_{50}$ under fog for person, car, and truck. WiSE-OD$_{ZS}$ maintains a higher level of performance across severities.
  • ...and 11 more figures
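
Per-severity scores such as the AP$_{50}$ curves described for Figure 5 are typically condensed into a single mPC number. A minimal sketch follows, assuming the definition commonly used in corruption-robustness benchmarks: average the score over severity levels for each corruption, then average over corruption types. This section does not spell out the paper's exact definition, so treat the aggregation, the function name, and the placeholder scores as assumptions.

```python
from statistics import mean
from typing import Dict, List


def mean_performance_under_corruption(scores: Dict[str, List[float]]) -> float:
    """Average per-severity scores within each corruption, then across corruptions.

    `scores` maps a corruption name to its scores at each severity level.
    """
    if not scores:
        raise ValueError("scores must contain at least one corruption type")
    return mean(mean(per_severity) for per_severity in scores.values())


# Placeholder AP50 values (%) for severities 1-5. These are NOT results
# from the paper; they only exercise the function.
example_scores = {
    "fog": [80.0, 70.0, 60.0, 50.0, 40.0],
    "frost": [90.0, 80.0, 70.0, 60.0, 50.0],
}
mpc = mean_performance_under_corruption(example_scores)
```

Averaging within each corruption first gives every corruption type equal weight, even if some were evaluated at fewer severity levels than others.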