MMCL: Correcting Content Query Distributions for Improved Anti-Overlapping X-Ray Object Detection

Mingyuan Li; Tong Jia; Hui Lu; Hao Wang; Bowen Ma; Shiyi Guo; Shuyang Lin; Dongyue Chen; Haoran Wang; Baosheng Yu

TL;DR

MMCL targets the core challenge of anti-overlapping X-ray object detection by correcting content query distributions in DETR-like detectors. It partitions decoder queries into $K$ class-specific groups and optimizes a dual-contrastive objective, Inter-class Moderate Exclusion ($L_{IME}$) and Intra-class Min-margin Clustering ($L_{IMC}$), with a tunable margin $m$ and weights $\gamma,\eta$. Across PIXray, OPIXray, and PIDray, and over four DETR variants with two backbones, MMCL delivers consistent improvements (e.g., up to $+3.8$ AP on PIXray and $+6.1$ mAP on OPIXray) and achieves state-of-the-art performance on OPIXray when combined with AO-DETR. The approach is computationally lightweight during training and incurs no inference-time cost, underscoring practical impact for real-time X-ray screening and related domains, while highlighting domain-specific benefits of correcting content-query priors in transformer-based detectors.

Abstract

Unlike natural images with occlusion-based overlap, X-ray images exhibit depth-induced superimposition and semi-transparent appearances, where objects at different depths overlap and their features blend together. These characteristics demand specialized mechanisms to disentangle mixed representations between target objects (e.g., prohibited items) and irrelevant backgrounds. While recent studies have explored adapting detection transformers (DETR) for anti-overlapping object detection, the importance of well-distributed content queries that represent object hypotheses remains underexplored. In this paper, we introduce a multi-class min-margin contrastive learning (MMCL) framework to correct the distribution of content queries, achieving balanced intra-class diversity and inter-class separability. The framework first groups content queries by object category and then applies two complementary loss components: a multi-class exclusion loss to enhance inter-class separability, and a min-margin clustering loss to encourage intra-class diversity. We evaluate the proposed method on three widely used X-ray prohibited-item detection datasets, PIXray, OPIXray, and PIDray, using two backbone networks and four DETR variants. Experimental results demonstrate that MMCL effectively enhances anti-overlapping object detection and achieves state-of-the-art performance on OPIXray when combined with AO-DETR. Code will be made publicly available on GitHub.
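The two-step procedure described in the abstract (group the content queries by category, then apply an inter-class and an intra-class term) can be illustrated with a minimal sketch. This is not the authors' implementation: the exact forms of $L_{IME}$ and $L_{IMC}$ are not given in this section, so the code assumes contiguous, equal-sized class groups, cosine similarity, and simple hinge-style penalties as stand-ins. The names `partition_queries` and `mmcl_sketch`, the reading of $m$ as a similarity threshold, and the default values of `m`, `gamma`, and `eta` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def partition_queries(num_queries, num_classes):
    """Assign each query index to a class group.

    Assumption: queries are split into contiguous, equal-sized blocks,
    one block per class. The paper may use a different assignment.
    """
    if num_queries % num_classes != 0:
        raise ValueError("num_queries must be divisible by num_classes")
    per_class = num_queries // num_classes
    return [i // per_class for i in range(num_queries)]


def mmcl_sketch(queries, labels, m=0.5, gamma=1.0, eta=1.0):
    """Hinge-style stand-in for the dual contrastive objective.

    Inter-class term (assumed form): penalise positive cosine similarity
    between queries of different classes, pushing them apart.
    Intra-class term (assumed form): penalise same-class pairs only when
    their similarity is below m, so pairs already inside the margin are
    left alone and intra-class diversity is not collapsed.
    Returns (weighted total, inter-class term, intra-class term).
    """
    inter, intra = [], []
    n = len(queries)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(queries[i], queries[j])
            if labels[i] != labels[j]:
                inter.append(max(0.0, s))
            else:
                intra.append(max(0.0, m - s))
    l_inter = sum(inter) / len(inter) if inter else 0.0
    l_intra = sum(intra) / len(intra) if intra else 0.0
    return gamma * l_inter + eta * l_intra, l_inter, l_intra


if __name__ == "__main__":
    labels = partition_queries(4, 2)
    queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    print(labels, mmcl_sketch(queries, labels))
```

In this sketch, raising `m` makes the intra-class term pull more same-class pairs together, while lowering it tolerates more spread within a class. Because the penalty acts only on the content queries during training, it adds nothing to the forward pass at inference, which matches the no-inference-cost property stated in the TL;DR.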

Paper Structure (20 sections, 9 equations, 9 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Prohibited Item Detection
Detection Transformers
Contrastive Learning
Method
Overview
Query Partition
Contrastive Loss
Discussion
Experiments
Datasets and Metrics
Implementation Details
Main Results
Ablation Studies
...and 5 more sections

Figures (9)

Figure 1: Illustration of different content query distributions for anti-overlapping object detection. Left: Non-clustered content queries (Deformable-DETR, RT-DETR, DINO) can recognize prohibited items (e.g., knives) only in simple overlapping scenes (Easy). Middle: Content queries with intra-class compactness (AO-DETR) become homogenized and can handle moderately complex overlapping scenes (Hard). Right: Content queries that maintain intra-class diversity and inter-class separability can effectively address complex and heavily overlapping scenes (Hidden).
Figure 2: Illustration of intra-class diversity in content queries with and without MMCL. Left: MMCL enhances the homogeneity coefficient in DINO (non-clustered) while reducing it in AO-DETR (over-compact). Right: The hyperparameter $m$ in the proposed loss function enables flexible control over intra-class margins. The homogeneity coefficient is the average cosine similarity among intra-class queries; a higher value indicates greater homogeneity and lower diversity.
Figure 3: Overview of the proposed MMCL framework for anti-overlapping X-ray object detection. The framework integrates a contrastive loss to refine the distribution of content queries, thereby enhancing object discrimination and reducing overlap confusion, all without modifying the underlying architecture.
Figure 4: Detailed illustration of the decoder’s content query mechanism in DINO. After the candidate boxes $R^0$ are initialized by the classification head, the regression head, and the query selection mechanism, each decoder layer refines the content queries through self-attention and deformable attention, guided by the classification head, the regression head, and the positional encoding mechanism. The iterative update of content queries across layers enhances feature representation and detection accuracy. Among the decoder inputs, only the content queries $\mathbf{Q}^0$ are initialized independently of the input feature $X$, and they directly determine the final predictions. Motivated by this, we propose MMCL to optimize their priors.
Figure 5: Illustration of how the proposed contrastive loss adjusts the distribution of content queries. The loss simultaneously repels inter-class samples and attracts intra-class samples, promoting clearer class separation. Each sample attracts only those intra-class samples lying outside a defined minimum-margin radius, thereby maintaining appropriate intra-class diversity. Points of the same color denote samples from the same class, while points of different colors represent content queries from different classes.
...and 4 more figures
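
The homogeneity coefficient defined in the Figure 2 caption (the average cosine similarity among intra-class queries) can be computed as in the sketch below. It assumes the average is taken over all same-class pairs pooled across classes; the paper may instead average per class first. The function name `homogeneity_coefficient` is hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def homogeneity_coefficient(queries, labels):
    """Mean cosine similarity over all pairs of queries sharing a label.

    Assumption: pairs from every class are pooled before averaging.
    Higher values mean more homogeneous (less diverse) class groups.
    """
    sims = [
        cosine(qi, qj)
        for (qi, li), (qj, lj) in combinations(zip(queries, labels), 2)
        if li == lj
    ]
    if not sims:
        raise ValueError("need at least one same-class pair")
    return sum(sims) / len(sims)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    queries = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(homogeneity_coefficient(queries, labels))
```

Under this definition, a value near 1 corresponds to the over-compact case attributed to AO-DETR in the caption, and a lower value to the non-clustered case attributed to DINO.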
