WiSE-OD: Benchmarking Robustness in Infrared Object Detection
Heitor R. Medeiros, Atif Belal, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli
TL;DR
This work tackles the robustness gap in infrared (IR) object detection (OD) under cross-modality distribution shifts. It introduces LLVIP-C and FLIR-C, corruption-based cross-modality benchmarks derived from RGB-IR datasets, and evaluates real-world shifts on M3FD. It proposes WiSE-OD, a simple weight-space ensembling strategy with two variants, which blends RGB pre-trained weights with either IR fine-tuned or IR linear-probing weights without increasing inference cost. Across multiple detectors (Faster R-CNN, FCOS, RetinaNet, YOLOv8), WiSE-OD substantially improves robustness to both synthetic corruptions and real-world out-of-distribution (OOD) shifts without additional training, yielding notable gains in mean performance under corruption (mPC) and more resilient activation maps. The results highlight the practicality of cross-modality ensembling for IR OD and point to future directions such as adaptive selection of the mixing coefficient λ and broader sensor fusion, toward more reliable nighttime sensing systems.
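In corruption-robustness benchmarks for object detection, mPC is conventionally the detection AP averaged over every corruption type and severity level. The sketch below assumes LLVIP-C and FLIR-C follow that convention; the function name `mean_performance_under_corruption` and its dict-of-lists input format are hypothetical choices for illustration, and the AP values in the example are made up, not results from this work.

```python
from statistics import mean


def mean_performance_under_corruption(ap_table):
    """Compute mPC from per-corruption, per-severity AP values.

    Hypothetical input format: a dict mapping each corruption name to a
    list of AP values, one entry per severity level.
    """
    if not ap_table:
        raise ValueError("need at least one corruption type")
    # Average over severity levels within each corruption type ...
    per_corruption = [mean(aps) for aps in ap_table.values()]
    # ... then average those means over all corruption types.
    return mean(per_corruption)


# Illustrative (made-up) AP values for two corruptions at three severities.
example = {
    "gaussian_noise": [0.50, 0.40, 0.30],
    "motion_blur": [0.60, 0.50, 0.40],
}
print(round(mean_performance_under_corruption(example), 4))
```

Averaging within each corruption first keeps every corruption type equally weighted, even if some were evaluated at fewer severity levels.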
Abstract
Object detection (OD) in infrared (IR) imagery is critical for low-light and nighttime applications. However, the scarcity of large-scale IR datasets forces models to rely on weights pre-trained on RGB images. While fine-tuning on IR improves accuracy, it often compromises robustness under distribution shifts due to the inherent modality gap between RGB and IR. To address this, we introduce LLVIP-C and FLIR-C, two cross-modality out-of-distribution (OOD) benchmarks built by applying corruptions to standard IR datasets. Additionally, to fully leverage the complementary knowledge of RGB- and IR-trained models, we propose WiSE-OD, a weight-space ensembling method with two variants: WiSE-OD$_{ZS}$, which combines RGB zero-shot and IR fine-tuned weights, and WiSE-OD$_{LP}$, which blends zero-shot and linear-probing weights. Evaluated with four RGB-pretrained detectors and two robust baselines on our benchmarks and on the real-world OOD M3FD dataset, WiSE-OD improves robustness across modalities and to corruptions under both synthetic and real-world distribution shifts, without any additional training or inference cost. Our code is available at: https://github.com/heitorrapela/wiseod.
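Weight-space ensembling of this kind is commonly written, in the style of WiSE-FT, as a linear interpolation between two checkpoints of the same architecture, $\theta_\lambda = (1-\lambda)\,\theta_{ZS} + \lambda\,\theta_{FT}$. The Python sketch below assumes WiSE-OD uses this form with a single scalar mixing coefficient λ. It stores each checkpoint as a plain dict of flattened parameter lists (in practice these would be framework tensors from a state dict), and the function name `interpolate_weights` and all parameter values are hypothetical. The same routine covers both variants: pass IR fine-tuned weights for WiSE-OD$_{ZS}$ or linear-probing weights for WiSE-OD$_{LP}$.

```python
def interpolate_weights(theta_zs, theta_other, lam):
    """Return (1 - lam) * theta_zs + lam * theta_other, parameter by parameter.

    theta_zs:    RGB pre-trained (zero-shot) weights, name -> list of floats
    theta_other: IR fine-tuned or linear-probing weights, same layout
    lam:         mixing coefficient in [0, 1]; 0 keeps theta_zs, 1 keeps theta_other
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if theta_zs.keys() != theta_other.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged = {}
    for name, zs_values in theta_zs.items():
        other_values = theta_other[name]
        if len(zs_values) != len(other_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two checkpoints.
        merged[name] = [(1.0 - lam) * a + lam * b for a, b in zip(zs_values, other_values)]
    return merged


# Toy checkpoints with a single flattened parameter (hypothetical values).
theta_zs = {"layer.weight": [0.0, 2.0]}
theta_ft = {"layer.weight": [1.0, 4.0]}
theta_lp = {"layer.weight": [0.5, 2.0]}

wise_od_zs = interpolate_weights(theta_zs, theta_ft, lam=0.5)
wise_od_lp = interpolate_weights(theta_zs, theta_lp, lam=0.5)
print(wise_od_zs, wise_od_lp)
```

Because the result is a single set of weights with the original architecture, inference cost matches that of one detector, unlike output-space ensembles that run several models.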
