DAGNet: A Dual-View Attention-Guided Network for Efficient X-ray Security Inspection

Shilong Hong, Yanzhou Zhou, Weichao Xu

TL;DR

DAGNet addresses the limitations of single-view X-ray security inspection by leveraging dual-view imagery. It introduces three complementary modules: FDIM for frequency-domain feature enhancement, DVHEM for cross-view hierarchical feature alignment via cross-attention, and CGFM for efficient fusion with convolutional attention. Together they reduce information loss and redundancy across views. Results on the DvXray dataset across multiple backbones show consistent improvements in mean average precision (mAP) over baselines and prior dual-view methods, and ablations indicate that each module contributes a complementary benefit. The code is public, and the gains carry over across backbone architectures, which makes the approach a practical candidate for real-world security screening.

Abstract

With the rapid development of modern transportation systems and the exponential growth of logistics volumes, intelligent X-ray-based security inspection systems play a crucial role in public safety. Although single-view X-ray baggage scanners are widely deployed, they struggle to accurately identify contraband in complex stacking scenarios due to strong viewpoint dependency and inadequate feature representation. To address this, we propose a Dual-View Attention-Guided Network for Efficient X-ray Security Inspection (DAGNet). DAGNet builds on a shared-weight backbone network and adds three key modules that work together: (1) the Frequency Domain Interaction Module (FDIM) dynamically enhances features by adjusting frequency components based on inter-view relationships; (2) the Dual-View Hierarchical Enhancement Module (DVHEM) employs cross-attention to align features between views and capture hierarchical associations; (3) the Convolutional Guided Fusion Module (CGFM) fuses features to suppress redundancy while retaining critical discriminative information. Collectively, these modules substantially improve the performance of dual-view X-ray security inspection. Experimental results demonstrate that DAGNet outperforms existing state-of-the-art approaches across multiple backbone architectures. The code is available at: https://github.com/ShilongHong/DAGNet.
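To make the cross-view alignment idea concrete, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention between two views. It is not the authors' DVHEM implementation: the function name `cross_attention`, the feature shapes, the random projection matrices, and the choice to reuse the same projections in both directions are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Attend from one view (queries) to the other view (keys/values).

    query_feats:   (N_q, C) flattened feature map of the view being enhanced
    context_feats: (N_c, C) flattened feature map of the other view
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here)
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N_q, N_c) similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
C, D = 8, 4
# Assumed toy inputs: two 4x4 feature maps with C channels, flattened to 16 tokens.
view_a = rng.standard_normal((16, C))
view_b = rng.standard_normal((16, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) for _ in range(3))

# Each view queries the other; sharing the projections is an assumption.
a_from_b, attn_ab = cross_attention(view_a, view_b, w_q, w_k, w_v)
b_from_a, attn_ba = cross_attention(view_b, view_a, w_q, w_k, w_v)
```

Each row of `attn_ab` is a distribution over positions in the second view, so `a_from_b` gives every position in the first view a weighted summary of the second view's features. How DAGNet applies this across hierarchy levels is described in the paper itself and is not reproduced here.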

Paper Structure

This paper contains 18 sections, 22 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overall architecture of the proposed framework.
  • Figure 2: Overall architecture of the proposed framework.
  • Figure 3: Overall architecture of the proposed Frequency Domain Interaction Module (FDIM).
  • Figure 4: Overall architecture of the Dual-View Hierarchical Enhancement Module (DVHEM).
  • Figure 5: Overall architecture of the proposed Convolutional Guided Fusion Module (CGFM).