
CoRPA: Adversarial Image Generation for Chest X-rays Using Concept Vector Perturbations and Generative Models

Amy Rafferty, Rishi Ramaesh, Ajitha Rajan

TL;DR

The paper addresses robustness gaps in AI-assisted radiology by introducing CoRPA, a clinically grounded black-box adversarial attack that perturbs clinical concepts within radiology reports and uses a text-to-image diffusion model to synthesize adversarial chest X-rays. By labeling MIMIC-CXR-JPG with 17 clinical concepts and evaluating seven backbone architectures, the authors show CoRPA reveals vulnerabilities not exposed by standard attacks, particularly for outer-class perturbations that introduce features from a second pathology. The findings emphasize the need for domain-aware robustness testing and potential defenses, such as adversarial training with CoRPA-generated data, to ensure safe deployment of medical AI in high-stakes settings. The approach also provides a foundation for extending clinically focused adversarial evaluation to other medical imaging modalities and tasks.
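
Figure 1 contrasts the authors' labelling with CheXpert, which counted mentions that were actually negated. The sketch below shows one minimal way to do negation-aware concept labelling. It assumes a plain keyword matcher. The concept names, trigger phrases and negation cues are hypothetical placeholders, not the paper's 17-concept vocabulary. The authors' approach also accounts for context beyond this single regex check.

```python
import re

# Hypothetical concept vocabulary (concept -> trigger phrases).
# The paper labels 17 clinical concepts; these four are placeholders.
CONCEPT_PHRASES: dict[str, list[str]] = {
    "Effusion": ["effusion"],
    "Fluid": ["fluid"],
    "Consolidation": ["consolidation"],
    "Hilar Adenopathy": ["hilar adenopathy"],
}
CONCEPTS = list(CONCEPT_PHRASES)

# Hypothetical negation cues: a mention preceded by one of these words in
# the same sentence is treated as negative and ignored.
NEGATION = re.compile(r"\b(no|without|negative for|free of)\b")


def split_sentences(report: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def concept_vector(report: str) -> list[int]:
    """Binary vector: 1 where a concept has at least one non-negated mention."""
    vec = [0] * len(CONCEPTS)
    for sentence in split_sentences(report.lower()):
        for i, concept in enumerate(CONCEPTS):
            for phrase in CONCEPT_PHRASES[concept]:
                # Check every occurrence; only the text before it can negate it.
                for match in re.finditer(re.escape(phrase), sentence):
                    if not NEGATION.search(sentence[: match.start()]):
                        vec[i] = 1
    return vec


print(concept_vector("No pleural effusion. Hilar adenopathy is noted."))
# [0, 0, 0, 1]
```

Here the negated "effusion" mention is skipped, while "hilar adenopathy" is counted. A plain keyword count would have set the Effusion entry as well.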

Abstract

Deep learning models for medical image classification tasks are becoming widely implemented in AI-assisted diagnostic tools, aiming to enhance diagnostic accuracy, reduce clinician workloads, and improve patient outcomes. However, their vulnerability to adversarial attacks poses significant risks to patient safety. Current attack methodologies use general techniques such as model querying or pixel value perturbations to generate adversarial examples designed to fool a model. These approaches may not adequately address the unique characteristics of clinical errors stemming from missed or incorrectly identified clinical features. We propose the Concept-based Report Perturbation Attack (CoRPA), a clinically-focused black-box adversarial attack framework tailored to the medical imaging domain. CoRPA leverages clinical concepts to generate adversarial radiological reports and images that closely mirror realistic clinical misdiagnosis scenarios. We demonstrate the utility of CoRPA using the MIMIC-CXR-JPG dataset of chest X-rays and radiological reports. Our evaluation reveals that deep learning models exhibiting strong resilience to conventional adversarial attacks are significantly less robust when subjected to CoRPA's clinically-focused perturbations. This underscores the importance of addressing domain-specific vulnerabilities in medical AI systems. By introducing a specialized adversarial attack framework, this study provides a foundation for developing robust, real-world-ready AI models in healthcare, ensuring their safe and reliable deployment in high-stakes clinical environments.
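
The two perturbation types can be sketched on binary concept vectors. According to the Figure 3 caption, inter-class perturbations alter only the original class's concepts, while outer-class perturbations keep those fixed and add concepts from a randomly selected second class. This is a hypothetical illustration, not the authors' code. The class-to-concept index mapping is invented. The following constraints are also assumptions:

- an inter-class perturbation must change the vector and keep at least one original-class concept;
- an outer-class perturbation must add at least one new concept.

```python
import random

# Hypothetical mapping from each class to the indices of its concepts in a
# 9-dimensional binary concept vector. Assumes every class has at least two
# concepts; otherwise inter_class may loop forever looking for a change.
CLASS_CONCEPTS: dict[str, list[int]] = {
    "Pleural Effusion": [0, 1, 2],
    "Pneumonia": [3, 4, 5],
    "Cancer": [6, 7, 8],
}


def _nonzero_assignment(n: int, rng: random.Random) -> list[int]:
    """Random 0/1 values for n concepts with at least one set to 1 (assumption)."""
    while True:
        values = [rng.randint(0, 1) for _ in range(n)]
        if any(values):
            return values


def inter_class(vec: list[int], cls: str, rng: random.Random) -> list[int]:
    """Re-sample only the original class's concepts until the vector changes."""
    idx = CLASS_CONCEPTS[cls]
    while True:
        new = list(vec)
        for i, v in zip(idx, _nonzero_assignment(len(idx), rng)):
            new[i] = v
        if new != vec:
            return new


def outer_class(vec: list[int], cls: str, rng: random.Random) -> tuple[list[int], str]:
    """Keep the original class's concepts; add a random pattern for a second class."""
    other = rng.choice([c for c in CLASS_CONCEPTS if c != cls])
    idx = CLASS_CONCEPTS[other]
    new = list(vec)
    for i, v in zip(idx, _nonzero_assignment(len(idx), rng)):
        new[i] = v
    return new, other


rng = random.Random(0)
original = [1, 1, 0, 0, 0, 0, 0, 0, 0]  # two Pleural Effusion concepts present
# Figure 2 describes two inter-class and two outer-class perturbations per vector.
perturbed = [inter_class(original, "Pleural Effusion", rng) for _ in range(2)]
perturbed += [outer_class(original, "Pleural Effusion", rng)[0] for _ in range(2)]
```

Keeping the original-class block untouched in the outer-class case mirrors the caption's description. The resulting report would then carry features of two pathologies.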

Paper Structure

This paper contains 17 sections, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Example of our labelling approach outperforming CheXpert. The phrases in red were used by CheXpert, which incorrectly labelled this report as Pneumonia. Our approach accounts for negated mentions and context and, based on the mention of Hilar adenopathy (green), labels the report as Cancer.
  • Figure 2: Visualisation of the CoRPA pipeline. A chest X-ray and its corresponding report are used to generate a concept vector. Green-highlighted phrases are used by the algorithm (see the clinical concepts table). The image-report pair is labelled as Pleural Effusion because the Effusion and Fluid concepts are present. The concept vector is perturbed four times: two inter-class and two outer-class perturbations. For clarity, one perturbation of each type is shown. Adversarial reports are generated through sentence manipulation based on these perturbed vectors. Sentences related to removed concepts (Fluid in the inter-class example) are removed from the report, and a new sentence for each added concept (Meniscus Sign in the inter-class example, Infection in the outer-class example) is inserted (yellow). The reports are then input into a text-to-image Stable Diffusion model to produce adversarial images.
  • Figure 3: Visualisation of an inter-class and an outer-class perturbation for a cancerous concept vector. For inter-class perturbations, only the concepts relating to the original class (green) are perturbed. For outer-class perturbations, the concepts relating to the original class (green) remain the same, and the concepts relating to a randomly selected new class (red) are randomly perturbed.
  • Figure 4: Visualisation of the re-generation of adversarial radiological reports from perturbed concept vectors through sentence manipulation, for an inter-class perturbation example. The sentences contributing to the original concept Fluid, which is not present in the perturbed target vector (green), are removed from the text. Note that this only applies to sentences within the 'cleaned' list, as defined in the labelling section. A random sentence corresponding to the new concept Meniscus Sign in the target concept vector is then chosen from the concept-to-sentence mapping (red) and inserted into the report. The sentence corresponding to the Healthy class is ignored because pathology-indicative concepts are present.
  • Figure 5: Attack Success Rate (ASR) curves for FGSM, PGD, and SimBA across all tested models. The legends display the Area Under the Curve (AUC) values, where higher AUCs correspond to steeper curves, indicating lower robustness of the models to the attack.
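
The report re-generation step from Figure 4 can be sketched as a small set-difference routine. It drops sentences tied to removed concepts and inserts a sampled sentence for each added concept. The data structures are hypothetical:

- the report is pre-split into sentences;
- `contributes` maps a sentence index to the concepts it was labelled with;
- `cleaned` stands in for the paper's 'cleaned' sentence list.

Appending new sentences at the end of the report is also an assumption, because the captions do not say where they are inserted.

```python
import random


def regenerate_report(
    sentences: list[str],
    contributes: dict[int, set[str]],
    cleaned: set[int],
    original: set[str],
    target: set[str],
    concept_to_sentences: dict[str, list[str]],
    rng: random.Random,
) -> str:
    """Rewrite a report so its concepts move from `original` to `target`."""
    removed = original - target
    added = target - original
    # Drop only cleaned-list sentences that contributed a removed concept.
    kept = [
        s
        for i, s in enumerate(sentences)
        if not (i in cleaned and contributes.get(i, set()) & removed)
    ]
    # Add one randomly chosen sentence per added concept (appended: assumption).
    for concept in sorted(added):
        kept.append(rng.choice(concept_to_sentences[concept]))
    return " ".join(kept)


report = regenerate_report(
    sentences=["Moderate right pleural effusion.", "Fluid layers posteriorly.", "Heart size is normal."],
    contributes={0: {"Effusion"}, 1: {"Fluid"}},
    cleaned={0, 1},
    original={"Effusion", "Fluid"},
    target={"Effusion", "Meniscus Sign"},
    concept_to_sentences={"Meniscus Sign": ["A meniscus sign is seen at the right costophrenic angle."]},
    rng=random.Random(0),
)
print(report)
# Moderate right pleural effusion. Heart size is normal. A meniscus sign is seen at the right costophrenic angle.
```

The example reproduces the inter-class case from the captions: Fluid is removed and Meniscus Sign is added. The resulting text is what would then be passed to the text-to-image model.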
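
Figure 5 summarises each ASR curve by its area under the curve. Below is a minimal sketch of that computation. It rests on two assumptions:

- ASR is the fraction of initially correctly classified images that the attack flips to a wrong prediction;
- the area is computed with the trapezoidal rule over the tested perturbation budgets (epsilons).

The paper's exact ASR definition, budget grid and any normalisation may differ.

```python
def attack_success_rate(clean_preds: list[int], adv_preds: list[int], labels: list[int]) -> float:
    """Fraction of initially correct samples misclassified after the attack (assumed definition)."""
    attacked = [(a, y) for c, a, y in zip(clean_preds, adv_preds, labels) if c == y]
    if not attacked:
        return 0.0
    return sum(a != y for a, y in attacked) / len(attacked)


def asr_auc(epsilons: list[float], asrs: list[float]) -> float:
    """Trapezoidal area under the ASR-versus-epsilon curve."""
    points = sorted(zip(epsilons, asrs))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy check on a straight-line curve (synthetic values, not results from the paper).
print(asr_auc([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))
# 0.5
```

The area grows with how quickly ASR rises as the budget increases. A higher value therefore means the model is fooled at smaller perturbations, which matches the caption's reading of a higher AUC as lower robustness.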