Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays

Zhichao Sun, Yuliang Gu, Yepeng Liu, Zerui Zhang, Zhou Zhao, Yongchao Xu

TL;DR

This work tackles unsupervised anomaly detection in chest X-rays by pairing a frozen CLIP backbone with position-guided prompt learning (PPAD) to bridge the domain gap between pre-training data and clinical images. It introduces learnable text and image prompts that focus on lung subregions, plus a structure-preserving anomaly synthesis (SAS) that generates realistic synthetic lesions during training. PPAD delivers state-of-the-art performance on the ZhangLab, CheXpert, and VinDr-CXR datasets, outperforming both CLIP-based methods and other prompt-learning approaches, and ablations confirm the contribution of the regional prompts and SAS. Because the backbone stays frozen, only a small fraction of the parameters are learnable, which makes the approach computationally efficient and a practical candidate for scalable, few-shot anomaly detection in medical imaging.

Abstract

Anomaly detection in chest X-rays is a critical task. Most methods model the distribution of normal images and treat significant deviations from that distribution as anomalies. Recently, CLIP-based methods pre-trained on large numbers of medical images have shown impressive performance on zero- and few-shot downstream tasks. In this paper, we explore the potential of CLIP-based methods for anomaly detection in chest X-rays. To address the discrepancy between the CLIP pre-training data and the task-specific data, we propose a position-guided prompt learning method. Specifically, inspired by the fact that experts diagnose chest X-rays by carefully examining distinct lung regions, we propose learnable position-guided text and image prompts that adapt the task data to the frozen pre-trained CLIP-based model. To enhance the model's discriminative capability, we further propose a novel structure-preserving anomaly synthesis method that generates anomalies within chest X-rays during training. Extensive experiments on three datasets demonstrate that our proposed method outperforms some state-of-the-art methods. The code of our implementation is available at https://github.com/sunzc-sunny/PPAD.
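
The prompt layout described in the abstract and in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the embedding width, the number of prompt tokens, the `toy_embed` lookup, the patch-grid shape of the image embedding, and the region mask are all hypothetical stand-ins chosen for illustration. Only the ordering (position embeddings, then the learnable text prompt, then class embeddings) and the idea of overwriting one region of the image embedding with a learnable prompt come from the caption.

```python
import numpy as np

EMBED_DIM = 8   # toy width (assumption); real CLIP encoders are much wider
N_PROMPT = 4    # number of learnable text-prompt tokens (assumption)


def toy_embed(words, dim=EMBED_DIM):
    """Hypothetical stand-in for a frozen token-embedding lookup."""
    rows = []
    for word in words:
        seed = sum(ord(ch) for ch in word)  # deterministic per word
        rows.append(np.random.default_rng(seed).standard_normal(dim))
    return np.stack(rows)


def build_text_input(position, class_name, text_prompt):
    """Concatenate [E_t^pos, P_t, E_t^cls] along the token axis."""
    e_pos = toy_embed(position.split())    # position prompt embeddings
    e_cls = toy_embed(class_name.split())  # class embeddings
    return np.concatenate([e_pos, text_prompt, e_cls], axis=0)


def replace_region(image_embedding, region_mask, image_prompt):
    """Overwrite the masked patches of E_i with the learnable prompt P_i."""
    out = image_embedding.copy()
    out[region_mask] = image_prompt[region_mask]
    return out


# Toy usage: zeros play the role of freshly initialised learnable prompts.
text_prompt = np.zeros((N_PROMPT, EMBED_DIM))
text_input = build_text_input("left lung", "normal", text_prompt)

image_embedding = np.ones((4, 4, EMBED_DIM))   # 4x4 patch grid (assumption)
region_mask = np.zeros((4, 4), dtype=bool)
region_mask[:, :2] = True                      # hypothetical region: one half of the grid
image_prompt = np.zeros((4, 4, EMBED_DIM))
image_input = replace_region(image_embedding, region_mask, image_prompt)
```

In a trainable version, `text_prompt` and `image_prompt` would be the only parameters receiving gradients, while the embedding lookup and both encoders stay frozen.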

Paper Structure (20 sections, 3 equations, 4 figures, 7 tables)

This paper contains 20 sections, 3 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: The pipeline of the proposed PPAD. PPAD adapts the text and image inputs with learnable prompts while the rest of the model stays frozen. Four position prompts are available to choose from. Taking "left lung" as an example, PPAD uses "left lung" as the position prompt embeddings $E^{pos}_t$. The learnable text prompt $P_{t}$ is inserted between the position prompt embeddings $E^{pos}_t$ and the class embeddings $E^{cls}_t$. The image input is either a synthetic anomaly or a normal image. The right lung region of the input image embedding $E_i$ is replaced by the learnable image prompt $P_{i}$.
  • Figure 2: Illustration of the proposed SAS. SAS applies a distance transform within a random mask to create smoothed gamma values. A synthetic anomaly is then generated by applying gamma correction to the normal input image.
  • Figure 3: Class activation map (CAM) visualizations guided by the entire view (second column) and by the individual position prompts (last four columns).
  • Figure 4: Visual comparison of anomaly synthesis methods on the ZhangLab dataset. For each group, the first row shows the synthesized images and the second row shows the masks of the corresponding anomalies. The proposed SAS generates realistic anomalies while preserving the structure of the lung.
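
The SAS steps in the Figure 2 caption (random mask, distance transform, smoothed gamma values, gamma correction) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the elliptical mask shape, the `gamma_peak` parameter, the linear mapping from distance to gamma, and all function names are hypothetical. The distance transform here is a brute-force NumPy version kept small for clarity; `scipy.ndimage.distance_transform_edt` is the usual choice for real image sizes.

```python
import numpy as np


def random_ellipse_mask(shape, rng):
    """Random elliptical mask (assumption: the source only says 'random mask')."""
    h, w = shape
    cy, cx = rng.uniform(0.3, 0.7) * h, rng.uniform(0.3, 0.7) * w
    ry, rx = rng.uniform(0.1, 0.25) * h, rng.uniform(0.1, 0.25) * w
    yy, xx = np.mgrid[:h, :w]
    return ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0


def distance_to_background(mask):
    """Brute-force Euclidean distance transform.

    For each pixel inside the mask, returns the distance to the nearest
    pixel outside it; pixels outside the mask get 0. Cost is O(N * M),
    so this is only suitable for small toy images.
    """
    fg = np.argwhere(mask)
    bg = np.argwhere(~mask)
    dist = np.zeros(mask.shape, dtype=np.float64)
    if len(fg) == 0 or len(bg) == 0:
        return dist
    d2 = ((fg[:, None, :] - bg[None, :, :]) ** 2).sum(axis=-1)
    dist[mask] = np.sqrt(d2.min(axis=1))
    return dist


def synthesize_anomaly(image, mask, gamma_peak=2.0):
    """Apply spatially varying gamma correction inside the mask.

    Gamma is 1 outside the mask (pixels unchanged) and ramps towards
    gamma_peak at the deepest point of the mask, so the synthetic lesion
    fades out at its border instead of having a hard edge.
    """
    dist = distance_to_background(mask)
    peak = dist.max()
    weight = dist / peak if peak > 0 else dist
    gamma = 1.0 + (gamma_peak - 1.0) * weight
    return np.clip(image, 0.0, 1.0) ** gamma


# Toy usage on a small random "image" with intensities in [0.2, 0.8].
rng = np.random.default_rng(0)
normal_image = rng.uniform(0.2, 0.8, size=(32, 32))
anomaly_mask = random_ellipse_mask(normal_image.shape, rng)
synthetic = synthesize_anomaly(normal_image, anomaly_mask, gamma_peak=2.0)
```

Because the transformation only rescales intensities pixel by pixel, edges and textures already present in the image remain in place, which is one way to read "structure-preserving". With `gamma_peak` above 1 the region darkens; a value below 1 would brighten it instead.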