The Solution for the GAIIC2024 RGB-TIR Object Detection Challenge
Xiangyu Wu, Jinling Xu, Longfei Huang, Yang Yang
TL;DR
This work tackles RGB-TIR object detection for unmanned aerial vehicles under challenging conditions such as complex backgrounds, lighting variations, and uncalibrated RGB-TIR image pairs. It introduces a lightweight YOLOv9 framework augmented with dual backbones, multi-level auxiliary supervision, and a transformer-based feature-level fusion module that fuses RGB and TIR features adaptively. Modality-specific data augmentation and diverse ensemble strategies improve cross-domain robustness, and external datasets such as DroneVehicle and VisDrone are used as additional data. The approach achieves competitive results (mAP 0.516 on A and 0.543 on B) at 26 FPS, indicating practical viability for real-time, drone-based RGB-TIR detection in varied urban and rural scenes.
Abstract
This report presents a solution to the task of RGB-TIR object detection from the perspective of unmanned aerial vehicles (UAVs). Unlike traditional object detection methods, RGB-TIR object detection uses both RGB and thermal infrared (TIR) images to obtain complementary information during detection. The challenges in the UAV setting include highly complex image backgrounds, frequent changes in lighting, and uncalibrated RGB-TIR image pairs. To address these challenges at the model level, we use a lightweight YOLOv9 model extended with multi-level auxiliary branches that enhance the model's robustness, making it more suitable for practical UAV applications. For image fusion in RGB-TIR detection, we incorporate a fusion module into the backbone network to fuse the two modalities at the feature level, implicitly addressing calibration issues. Our proposed method achieves mAP scores of 0.516 and 0.543 on the A and B benchmarks, respectively, while maintaining the highest inference speed among all models.
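To illustrate what fusing at the feature level means, as opposed to blending the raw RGB and TIR pixels, the following is a minimal sketch only and not the fusion module used in this report. It assumes two feature maps of identical shape (channels x height x width, stored as nested lists) and replaces the learned fusion with a hypothetical, parameter-free rule: each channel is mixed with softmax weights derived from the globally pooled response of each modality. The names `channel_means` and `fuse_features` are illustrative assumptions.

```python
import math


def channel_means(feat):
    """Global average pooling: one mean per channel of a C x H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def fuse_features(rgb_feat, tir_feat):
    """Fuse two C x H x W feature maps with per-channel softmax weights.

    Hypothetical stand-in for a learned fusion module: the weights come from
    pooled activations instead of trained parameters.
    """
    if len(rgb_feat) != len(tir_feat):
        raise ValueError("RGB and TIR feature maps must have the same channel count")

    rgb_means = channel_means(rgb_feat)
    tir_means = channel_means(tir_feat)

    fused, weights = [], []
    for c, (rgb_ch, tir_ch) in enumerate(zip(rgb_feat, tir_feat)):
        # Two-way softmax over the pooled responses (shifted by the max for stability).
        m = max(rgb_means[c], tir_means[c])
        e_rgb = math.exp(rgb_means[c] - m)
        e_tir = math.exp(tir_means[c] - m)
        w_rgb = e_rgb / (e_rgb + e_tir)
        w_tir = 1.0 - w_rgb
        weights.append((w_rgb, w_tir))

        # Weighted sum of the two modalities at every spatial position.
        fused.append([
            [w_rgb * r + w_tir * t for r, t in zip(rgb_row, tir_row)]
            for rgb_row, tir_row in zip(rgb_ch, tir_ch)
        ])
    return fused, weights


if __name__ == "__main__":
    # Channel 0: both modalities agree. Channel 1: only TIR responds (e.g. a night scene).
    rgb = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    tir = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 4.0], [4.0, 4.0]]]
    fused, weights = fuse_features(rgb, tir)
    print(weights)
```

In this toy setting, a channel where only the TIR branch responds is dominated by the TIR features, while a channel where both branches agree is averaged. A learned module would instead predict such weights from the features themselves, typically per spatial location.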
