Adversarial Robustness on Image Classification with $k$-means
Rollin Omari, Junae Kim, Paul Montague
TL;DR
The paper addresses adversarial vulnerabilities in unsupervised clustering by adapting adversarial training to $k$-means. It exploits transferability: adversarial examples are generated against a supervised surrogate model and then applied to the unsupervised target. It also introduces a continuous-learning training loop that alternates clean and perturbed samples while incrementally increasing attack strength and reinitializing centroids. Key findings include a trade-off between clean and adversarial accuracy, the importance of centroid initialization, and the effectiveness of a balanced mix of clean and adversarial data ($\eta$ around 1/2 to 2/3). Transferability across models and datasets suggests that the method applies more broadly. The work provides a practical baseline for unsupervised adversarial training and suggests pathways for extending robustness to other unsupervised techniques and to real-world domains such as healthcare and security.
Abstract
In this paper we explore the challenges and strategies for enhancing the robustness of $k$-means clustering algorithms against adversarial manipulations. We evaluate the vulnerability of clustering algorithms to adversarial attacks, emphasising the associated security risks. Our study investigates the impact of incremental attack strength on training, introduces the concept of transferability between supervised and unsupervised models, and highlights the sensitivity of unsupervised models to sample distributions. We additionally introduce and evaluate an adversarial training method that improves testing performance in adversarial scenarios, and we highlight the importance of various parameters in the proposed training method, such as continuous learning, centroid initialisation, and adversarial step-count.
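The training procedure summarised above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the surrogate is a nearest-class-mean linear classifier, the attack is a single signed-gradient step bounded by `eps`, `eta` is read as the fraction of adversarial samples in each round, centroids are carried over between rounds as one possible reading of the centroid handling, and the `eps_schedule` values are arbitrary. All function names are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sign(v):
    return (v > 0) - (v < 0)

def fit_surrogate(X, y):
    """Supervised surrogate (assumption): nearest-class-mean linear classifier."""
    mu0 = mean([x for x, t in zip(X, y) if t == 0])
    mu1 = mean([x for x, t in zip(X, y) if t == 1])
    w = [a - b for a, b in zip(mu1, mu0)]
    mid = [(a + b) / 2 for a, b in zip(mu1, mu0)]
    return w, -dot(w, mid)  # score(x) = w.x + b, class 1 if score > 0

def perturb(x, label, w, eps):
    """One signed-gradient step against the surrogate, bounded by eps in L-inf."""
    direction = -1 if label == 1 else 1  # push the sample toward the other class
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]

def kmeans(X, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for x in X:
            j = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            groups[j].append(x)
        # Keep the old centroid if a cluster received no points.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def adversarial_kmeans(X, y, k=2, eta=0.5, eps_schedule=(0.0, 0.25, 0.5), seed=0):
    """Refit k-means on clean/adversarial mixes while attack strength grows."""
    rng = random.Random(seed)
    w, _ = fit_surrogate(X, y)           # attacks are crafted on the surrogate only
    centroids = rng.sample(X, k)         # initial centroids drawn from the data
    for eps in eps_schedule:             # incrementally stronger attacks
        idx = list(range(len(X)))
        rng.shuffle(idx)
        n_adv = int(eta * len(X))        # assumed meaning: adversarial fraction
        batch = [perturb(X[i], y[i], w, eps) if pos < n_adv else X[i]
                 for pos, i in enumerate(idx)]
        centroids = kmeans(batch, centroids)  # warm start from the previous round
    return centroids
```

Calling `adversarial_kmeans(X, y)` with `X` as a list of feature vectors and `y` as binary labels returns `k` centroids. The labels are used only to fit the surrogate and to pick the attack direction; the $k$-means updates themselves never see them, which mirrors the transfer setting described in the abstract.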
