Adversarially-Aware Architecture Design for Robust Medical AI Systems

Alyssa Gerhart, Balaji Iyangar

TL;DR

The paper examines adversarial vulnerabilities in medical AI, focusing on dermatology, by characterizing threat types, conducting a data-poisoning–oriented empirical study on the ISIC dataset using Nightshade, and evaluating defenses such as adversarial training, defensive distillation, and hybrids. It demonstrates that defenses can reduce attack success rates, but often at the cost of degraded performance on clean data, with the hybrid approach offering the best overall balance. The work highlights significant clinical and ethical risks from adversarial manipulation and argues for integrated technical, ethical, and policy-based safeguards. Finally, it calls for security-centric evaluation, standardized robustness benchmarks, and future work spanning multiple modalities and privacy-preserving paradigms to advance trustworthy, equitable medical AI.

Abstract

Adversarial attacks pose a severe risk to AI systems used in healthcare: they can mislead models into dangerous misclassifications that delay treatment or cause misdiagnoses. These attacks are often imperceptible to humans and threaten patient safety, particularly in underserved populations. Our study explores these vulnerabilities through empirical experimentation on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses such as adversarial training and distillation. Our results show that while defenses reduce attack success rates, this gain must be balanced against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.

Paper Structure

This paper contains 19 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Illustration of a poisoning attack: (a) clean image, (b) poisoned variant, and (c) anchor causing the model to misclassify.
  • Figure 2: Precision evaluates model exactness in positive predictions. Higher values (blue) show the clean model correctly identifies true cases with minimal false positives, while poisoned performance (red) demonstrates vulnerability to adversarial attacks.
  • Figure 4: Recall evaluates model completeness in finding positive cases. The clean model (blue) shows strong detection rates, particularly for class 5, while poisoned recall (red) reveals critical failures to detect true cases under attack.
  • Figure 6: The F1-score is the harmonic mean of precision and recall, balancing both metrics: $F_1 = \frac{2 \times P \times R}{P + R}$.
  • Figure 7: Comparative performance of defense strategies showing tradeoffs between accuracy and robustness.
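The captions for Figures 2, 4, and 6 compare per-class precision, recall, and F1 between a clean and a poisoned model. As a minimal sketch of how such per-class scores can be computed, the snippet below uses the F1 formula from the Figure 6 caption. The function name `per_class_metrics` and the toy label lists are hypothetical and chosen only for illustration; they are not the paper's data, code, or results.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but the true class was something else
            fn[t] += 1          # true class t was missed

    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted = tp[c] + fp[c]   # how often the model predicted class c
        actual = tp[c] + fn[c]      # how often class c truly occurred
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        # F1 = 2PR / (P + R), defined as 0 when both P and R are 0
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Hypothetical toy labels (NOT taken from the paper): three classes, six samples.
y_true = [0, 0, 1, 1, 1, 2]
y_clean = [0, 0, 1, 1, 2, 2]      # predictions from an assumed clean model
y_poisoned = [0, 1, 1, 2, 2, 2]   # predictions from an assumed poisoned model

clean = per_class_metrics(y_true, y_clean)
poisoned = per_class_metrics(y_true, y_poisoned)

for c in clean:
    print(f"class {c}: clean F1={clean[c][2]:.3f}  poisoned F1={poisoned[c][2]:.3f}")
```

Reporting the three scores per class, rather than a single overall accuracy, shows which classes lose precision or recall under attack.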