Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

Yaling Shen, Zhixiong Zhuang, Kun Yuan, Maria-Irina Nicolae, Nassir Navab, Nicolas Padoy, Mario Fritz

TL;DR

This work presents ADA-Steal, the first model-stealing attack against medical MLLMs for radiology report generation, and it requires no medical data. ADA-Steal combines three components: attacker-model training, medical report enrichment via an oracle LLM, and adversarial domain alignment, which bridges the data distribution gap between natural images and the medical domain. Experiments on IU X-Ray and MIMIC-CXR show that ADA-Steal substantially improves over baselines and approaches victim-model performance on linguistic and clinical-efficacy metrics, and ablations confirm the value of both report enrichment and domain alignment. The findings reveal a practical intellectual-property risk for medical MLLMs and highlight the need for defenses against cross-domain stealing attacks on high-dimensional text-generation models in healthcare AI.

Abstract

Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision making and results analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing in the medical domain has focused on classification, and existing attacks are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs. ADA-STEAL relies on natural images, which are public and widely available, as opposed to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the data distribution gap between natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.
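
To make "data augmentation with adversarial noise" under a noise budget $\epsilon$ more concrete, the following is a minimal sketch of one generic signed-gradient update with projection onto an L-infinity ball. It is an illustration under stated assumptions, not the authors' implementation: the function name `perturb_within_budget`, the `step_size` parameter, the choice of an L-infinity norm, and the placeholder gradient are all hypothetical, and the objective whose gradient would drive the update is not specified in this summary.

```python
import numpy as np


def perturb_within_budget(image, grad, delta, epsilon, step_size):
    """Apply one signed-gradient update to the noise `delta`.

    Hypothetical helper for illustration only. `image` holds pixel
    values in [0, 1]; `grad` is the gradient of some alignment
    objective with respect to the image (a placeholder here).
    """
    # Move the noise along the sign of the gradient.
    delta = delta + step_size * np.sign(grad)
    # Project the noise back into the L-infinity ball of radius epsilon.
    delta = np.clip(delta, -epsilon, epsilon)
    # Keep the perturbed image in the valid pixel range.
    perturbed = np.clip(image + delta, 0.0, 1.0)
    # Return the image and the noise that was actually applied.
    return perturbed, perturbed - image


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(3, 32, 32))  # stand-in natural image
delta = np.zeros_like(image)
epsilon = 8.0 / 255.0  # assumed budget, chosen only for this example

for _ in range(10):
    # Placeholder gradient; a real attack would backpropagate through
    # the attacker model instead of sampling random values.
    grad = rng.normal(size=image.shape)
    perturbed, delta = perturb_within_budget(
        image, grad, delta, epsilon, step_size=2.0 / 255.0
    )
```

Whatever objective is plugged in, the projection step is what enforces the budget: the perturbed image never differs from the original natural image by more than $\epsilon$ in any pixel.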

Paper Structure

This paper contains 38 sections, 7 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: ADA-Steal trains an attacker MLLM to replicate a victim MLLM for radiology report generation from natural images by first enriching the reports and then aligning the attacker distribution to the medical domain.
  • Figure 2: Overview of our proposed approach, which has three iterative phases: (I) attacker model training, (II) medical report enrichment (gray dashed box), and (III) domain alignment (green dashed box).
  • Figure 3: Stealing performance as a function of the adversarial noise budget $\epsilon$.
  • Figure 4: Ablation performance of ADA-Steal without Oracle.
  • Figure 6: Qualitative evaluation by GPT-4 comparing the quality of test reports pairwise.
  • ...and 2 more figures