ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection

Omid Reza Heidari, Yang Wang, Xinxin Zuo

TL;DR

The paper tackles domain shift in security X-ray object detection by adapting ALDI++ to the X-ray setting. It relies on burn-in pretraining, soft distillation, and balanced training to improve cross-domain robustness, and evaluates on the multi-domain EDS dataset with several backbones; the ViTDet backbone achieves the highest, state-of-the-art mAP. Results show consistent cross-domain gains and strong category-level improvements, establishing ALDI++ as a robust solution for domain-adaptive X-ray detection. The work underlines the effectiveness of transformer-based architectures for cross-domain X-ray object detection and provides insights into balancing supervision across domains. Overall, ALDI++ sets a new benchmark for performance stability and generalization in security X-ray imagery.

Abstract

Domain adaptation in object detection is critical for real-world applications where distribution shifts degrade model performance. Security X-ray imaging presents a unique challenge due to variations in scanning devices and environmental conditions, leading to significant domain discrepancies. To address this, we apply ALDI++, a domain adaptation framework that integrates self-distillation, feature alignment, and enhanced training strategies to mitigate domain shift effectively in this area. We conduct extensive experiments on the EDS dataset, demonstrating that ALDI++ surpasses the state-of-the-art (SOTA) domain adaptation methods across multiple adaptation scenarios. In particular, ALDI++ with a Vision Transformer for Detection (ViTDet) backbone achieves the highest mean average precision (mAP), confirming the effectiveness of transformer-based architectures for cross-domain object detection. Additionally, our category-wise analysis highlights consistent improvements in detection accuracy, reinforcing the robustness of the model across diverse object classes. Our findings establish ALDI++ as an efficient solution for domain-adaptive object detection, setting a new benchmark for performance stability and cross-domain generalization in security X-ray imagery.
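The summary names self-distillation with soft targets as one ingredient of the framework. The following is a minimal sketch of that general idea, assuming a standard teacher-student setup in which the teacher is an exponential moving average (EMA) of the student. It is not the authors' implementation: the function names (`log_softmax`, `soft_distillation_loss`, `ema_update`), the `temperature` and `decay` parameters, and the use of flat lists in place of real network tensors are all hypothetical choices made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def soft_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy between the teacher's soft class distribution and the
    student's prediction. The teacher's full distribution is the target,
    rather than a thresholded hard pseudo-label (illustrative assumption)."""
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))


def ema_update(teacher_weights, student_weights, decay=0.999):
    """Move each teacher weight a small step toward the student weight."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # hypothetical teacher logits on a target-domain box
    student = [1.0, 1.0, 0.0]    # hypothetical student logits on the same box
    print("distillation loss:", soft_distillation_loss(student, teacher))
    print("updated teacher weights:", ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9))
```

In a setup like this, the loss on unlabeled target-domain images is smallest when the student reproduces the teacher's distribution, and the EMA step keeps the teacher changing slowly so that its targets stay stable during training.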

Paper Structure

This paper contains 5 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An overview of the ALDI++ training pipeline.
  • Figure 2: Distribution of the number of instances per object category in the EDS1, EDS2, and EDS3 sub-datasets. Each bar represents the frequency of a specific category within each domain, highlighting dataset balance and inter-domain variations.
  • Figure 3: Qualitative comparison of detection results on domain adaptation tasks. The top two rows (a–j) show successful detection cases: (a–e) correspond to the D2→D3 task and (f–j) correspond to the D3→D1 task. The bottom row (k–o) illustrates a failure case from the D2→D3 task. For each group: (a, f, k) Source-only model; (b, g, l) ALDI++ using VGG-16; (c, h, m) ALDI++ using FPN; (d, i, n) ALDI++ using ViTDet; (e, j, o) Ground Truth.