ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection
Omid Reza Heidari, Yang Wang, Xinxin Zuo
TL;DR
The paper tackles domain shift in security X-ray object detection by adapting ALDI++ to the X-ray setting. It uses burn-in pretraining, soft distillation, and balanced training to improve cross-domain robustness, and evaluates on the multi-domain EDS dataset with several backbones, with ViTDet achieving the highest mAP. Results show consistent cross-domain gains and category-level improvements, establishing ALDI++ as a robust solution for domain-adaptive X-ray detection. The work underlines the effectiveness of transformer-based architectures for cross-domain X-ray object detection and provides insights into balancing supervision across domains. Overall, ALDI++ sets a new benchmark for performance stability and generalization in security X-ray imagery.
Abstract
Domain adaptation in object detection is critical for real-world applications where distribution shifts degrade model performance. Security X-ray imaging presents a unique challenge because variations in scanning devices and environmental conditions lead to significant domain discrepancies. To address this, we apply ALDI++, a domain adaptation framework that integrates self-distillation, feature alignment, and enhanced training strategies, to mitigate domain shift in this setting. We conduct extensive experiments on the EDS dataset, demonstrating that ALDI++ surpasses state-of-the-art (SOTA) domain adaptation methods across multiple adaptation scenarios. In particular, ALDI++ with a Vision Transformer for Detection (ViTDet) backbone achieves the highest mean average precision (mAP), confirming the effectiveness of transformer-based architectures for cross-domain object detection. Additionally, our category-wise analysis highlights consistent improvements in detection accuracy, reinforcing the robustness of the model across diverse object classes. Our findings establish ALDI++ as an efficient solution for domain-adaptive object detection, setting a new benchmark for performance stability and cross-domain generalization in security X-ray imagery.
