Adversarially-Aware Architecture Design for Robust Medical AI Systems
Alyssa Gerhart, Balaji Iyangar
TL;DR
The paper examines adversarial vulnerabilities in medical AI, focusing on dermatology. It characterizes threat types, conducts a data-poisoning–oriented empirical study on the ISIC dataset using Nightshade, and evaluates defenses such as adversarial training, defensive distillation, and hybrid approaches. Defenses can reduce attack success rates, but often at the cost of degraded performance on clean data. The hybrid approach offers the best overall balance. The work highlights significant clinical and ethical risks from adversarial manipulation and argues for integrated technical, ethical, and policy-based safeguards. Finally, it calls for security-centric evaluation, standardized robustness benchmarks, and future work spanning multiple modalities and privacy-preserving paradigms to advance trustworthy, equitable medical AI.
Abstract
Adversarial attacks pose a severe risk to AI systems used in healthcare: they can mislead models into dangerous misclassifications that delay treatment or cause misdiagnosis. These attacks, often imperceptible to humans, threaten patient safety, particularly in underserved populations. We explore these vulnerabilities through experiments on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses such as adversarial training and defensive distillation. Our results show that while defenses reduce attack success rates, this gain must be weighed against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.
