Visual inspection for illicit items in X-ray images using Deep Learning
Ioannis Mademlis, Georgios Batsis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
TL;DR
This paper tackles the challenge of automatically detecting illicit items in high-throughput X-ray scans through a rigorous study of DNN components under a common evaluation protocol. It systematically compares detection heads, backbones, and domain-specific auxiliary modules on the SIXray dataset, following a defined incremental methodology. The results show that Transformer-based detectors, particularly DINO, outperform CNN-based heads, and that the CSP-DarkNet-53 backbone is notably efficient; many domain-specific modules offer little or even negative benefit when paired with modern detectors. These insights provide practical guidance for deploying real-time X-ray screening systems and establish a protocol for future cross-method comparisons in this domain.
Abstract
Automated detection of contraband items in X-ray images can significantly increase public safety by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours practically make it a Big Data problem. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, as is the case with fast, single-stage object detectors. However, no comparative experimental assessment of the various relevant DNN components/methods has been performed under a common evaluation protocol, which means that reliable cross-method comparisons are missing. This paper presents exactly such a comparative assessment, utilizing a relevant public dataset and a well-defined methodology for selecting the specific DNN components/modules that are evaluated. The results indicate the superiority of Transformer detectors, the obsolescence of the auxiliary neural modules developed in the past few years for security applications, and the efficiency of the CSP-DarkNet backbone CNN.
