AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection

Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Shuyang Lin, Da Cai, Dongyue Chen

TL;DR

AO-DETR addresses two core challenges in X-ray prohibited-item detection, overlap-induced feature coupling and edge blur, by adding Category-Specific One-to-One Assignment (CSA) and Look Forward Densely (LFD) on top of DINO. CSA constrains category-specific object queries to predict items of fixed categories, so that each group of queries specializes in extracting its category's features from overlapping foreground and background. LFD improves the localization accuracy of reference boxes in the mid-to-high-level decoder layers, which helps the final layer locate blurry edges. The approach reports state-of-the-art results on PIXray and OPIXray across backbones (including Swin-L) with no extra inference cost compared to the baseline, which makes it practical for automated security screening.

Abstract

Prohibited item detection in X-ray images is an essential and effective method that is widely used in security inspection scenarios. Considering the significant overlapping phenomenon in X-ray prohibited item images, we propose an Anti-Overlapping DETR (AO-DETR) based on one of the state-of-the-art general object detectors, DINO. Specifically, to address the feature coupling issue caused by overlap, we introduce the Category-Specific One-to-One Assignment (CSA) strategy, which constrains category-specific object queries to predict prohibited items of fixed categories. This enhances their ability to extract features specific to prohibited items of a particular category from the overlapping foreground-background features. To address the edge blurring problem caused by overlap, we propose the Look Forward Densely (LFD) scheme, which improves the localization accuracy of reference boxes in mid-to-high-level decoder layers and enhances the final layer's ability to locate blurry edges. Like DINO, AO-DETR comes in two versions with different backbones to meet diverse application requirements. Extensive experiments on the PIXray and OPIXray datasets demonstrate that the proposed method surpasses state-of-the-art object detectors, indicating its potential for applications in prohibited item detection. The source code will be released at https://github.com/Limingyuan001/AO-DETR-test.

Paper Structure

This paper contains 30 sections, 17 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Localized X-ray images containing prohibited items. Overlap in the images causes, to varying extents, foreground and background to overlap and object boundaries to blur.
  • Figure 2: t-SNE dimensionality reduction comparison. (a) The original X-ray image containing a gun, scissors, and a knife. (b)(c) t-SNE visualizations of the distributions of the object queries from the last decoder layer in DINO and AO-DETR.
  • Figure 3: The architecture of AO-DETR. The Backbone, Encoder, Decoder, and CDN modules are the same as in DINO. In the CSA strategy, the category-specific high-quality reference boxes obtained from CSM are matched with their corresponding category-specific object queries before being fed into the decoder for prediction. An additional k-category-specific Hungarian matching mechanism then performs one-to-one matching on the predicted results. This process makes the category semantics of the object queries clearer.
  • Figure 4: (a)(b)(c) Comparison of the structures of Look Forward Once, Look Forward Twice, and Look Forward Densely. (d) APs of Look Forward Once, Look Forward Twice, and Look Forward Densely in each decoder layer. 'LFO', 'LFT', and 'LFD' are the corresponding abbreviations.
  • Figure 5: (a) The AP curves of DINO and AO-DETR on the PIXray dataset. (b) The loss convergence curves of DINO and AO-DETR on the PIXray dataset.
  • ...and 3 more figures
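
The category-specific one-to-one matching mentioned in the abstract and the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each object query is reserved for one fixed category and that a ground-truth item may only be matched to a query reserved for its own category. The function names, the L1 box cost, and the toy data are all hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm so that the example needs only the Python standard library.

```python
from itertools import permutations


def l1_cost(pred_box, gt_box):
    """Hypothetical matching cost: L1 distance between (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_one_category(pred_boxes, gt_boxes):
    """One-to-one matching with minimum total cost inside a single category.

    Brute force over permutations (toy sizes only); it stands in for the
    Hungarian algorithm. Returns a list of (pred_index, gt_index) pairs.
    """
    if not gt_boxes:
        return []
    if len(gt_boxes) > len(pred_boxes):
        raise ValueError("more ground-truth items than queries in this category")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1_cost(pred_boxes[p], gt_boxes[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, g) for g, p in enumerate(best_perm)]


def category_specific_assignment(query_category, pred_boxes, gt_labels, gt_boxes):
    """Match every ground-truth item only to queries reserved for its category.

    query_category[i] is the category that query i is reserved for
    (an assumption of this sketch). Returns {query_index: gt_index}.
    """
    assignment = {}
    for cat in sorted(set(gt_labels)):
        q_idx = [i for i, c in enumerate(query_category) if c == cat]
        g_idx = [j for j, c in enumerate(gt_labels) if c == cat]
        pairs = match_one_category(
            [pred_boxes[i] for i in q_idx], [gt_boxes[j] for j in g_idx]
        )
        for p, g in pairs:
            assignment[q_idx[p]] = g_idx[g]
    return assignment


# Toy example: queries 0-1 are reserved for category 0, queries 2-3 for category 1.
query_category = [0, 0, 1, 1]
pred_boxes = [
    (0.2, 0.2, 0.1, 0.1),
    (0.5, 0.5, 0.2, 0.2),
    (0.5, 0.5, 0.2, 0.2),
    (0.8, 0.8, 0.1, 0.1),
]
# Two heavily overlapping ground-truth items of different categories.
gt_labels = [1, 0]
gt_boxes = [(0.5, 0.5, 0.2, 0.2), (0.52, 0.5, 0.2, 0.2)]

assignment = category_specific_assignment(
    query_category, pred_boxes, gt_labels, gt_boxes
)
print(assignment)  # {1: 1, 2: 0}
```

In the toy data, queries 1 and 2 predict the same box, and the two ground-truth items overlap almost completely. A category-agnostic matcher would have to separate them on box cost alone, whereas the per-category restriction sends the category-0 item to query 1 and the category-1 item to query 2.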