Adversarial Robustness on Image Classification with $k$-means

Rollin Omari; Junae Kim; Paul Montague

TL;DR

The paper tackles adversarial vulnerabilities in unsupervised clustering by adapting adversarial training to $k$-means. It uses transferability from a supervised surrogate model to generate adversarial examples for the unsupervised target, and introduces a continuous-learning training loop that mixes clean and perturbed samples while incrementally increasing attack strength and initialising each step's centroids from the previous step. Key findings include a trade-off between clean and adversarial accuracy, the importance of centroid initialisation, and the effectiveness of a balanced clean/adversarial data mix ($\eta$ around 1/2 to 2/3). Transferability across models and datasets suggests the method applies more broadly. The work provides a practical baseline for unsupervised adversarial training and suggests pathways for extending robustness to other unsupervised techniques and to real-world domains such as healthcare and security.
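The transfer step can be illustrated with a minimal sketch, under stated assumptions: the supervised surrogate here is a linear softmax classifier with random weights `W` and `b`, standing in for whatever surrogate the paper actually trains, and `ifgsm` and `assign` are hypothetical helper names. The per-step size `eps / steps` is also an assumption. Adversarial examples are crafted from the surrogate's gradient and then assigned to clusters by the nearest $k$-means centroid.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ifgsm(x, y, W, b, eps, steps=10):
    """I-FGSM against a linear softmax surrogate (W and b are hypothetical)."""
    alpha = eps / steps                   # per-step size (an assumption)
    onehot = np.eye(W.shape[1])[y]
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(x_adv @ W + b)
        grad = (p - onehot) @ W.T         # gradient of cross-entropy w.r.t. the input
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay inside the L-infinity ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv

def assign(x, centroids):
    """k-means inference: index of the nearest centroid for each sample."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy data in place of MNIST: 8 "images" with 16 pixels and 3 classes.
rng = np.random.default_rng(0)
x = rng.random((8, 16))
y = rng.integers(0, 3, size=8)
W = rng.normal(size=(16, 3))
b = np.zeros(3)
centroids = rng.random((3, 16))

x_adv = ifgsm(x, y, W, b, eps=0.1)
clean_clusters = assign(x, centroids)    # assignments on clean inputs
adv_clusters = assign(x_adv, centroids)  # assignments on transferred inputs
```

Comparing `clean_clusters` with `adv_clusters` shows how many samples the transferred perturbation moves to a different cluster, even though the clustering model's own parameters were never used to craft it.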

Abstract

In this paper we explore the challenges and strategies for enhancing the robustness of $k$-means clustering algorithms against adversarial manipulations. We evaluate the vulnerability of clustering algorithms to adversarial attacks, emphasising the associated security risks. Our study investigates the impact of incremental attack strength on training, introduces the concept of transferability between supervised and unsupervised models, and highlights the sensitivity of unsupervised models to sample distributions. We additionally introduce and evaluate an adversarial training method that improves testing performance in adversarial scenarios, and we highlight the importance of various parameters in the proposed training method, such as continuous learning, centroid initialisation, and adversarial step-count.
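The three training parameters named above can be read as a loop like the following. This is a sketch under assumptions, not the paper's algorithm: `adversarial_kmeans`, the linear $\epsilon$ schedule, the initial clean fit, and the `attack(x_subset, idx, eps)` callable signature are all hypothetical, and `toy_attack` is a crude placeholder for a real attack such as I-FGSM. Here `beta` is taken to be the number of incremental steps and `eta` the fraction of training samples perturbed at each step.

```python
import numpy as np

def kmeans(x, init, iters=20):
    """Lloyd's algorithm started from the given centroids."""
    c = init.copy()
    for _ in range(iters):
        d = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        lab = d.argmin(axis=1)
        for j in range(len(c)):
            if np.any(lab == j):          # leave empty clusters where they are
                c[j] = x[lab == j].mean(axis=0)
    return c

def adversarial_kmeans(x, attack, k, eps_max, beta, eta, rng):
    """Hypothetical loop: raise eps over beta steps, warm-starting centroids."""
    c = x[rng.choice(len(x), size=k, replace=False)]  # random initial centroids
    c = kmeans(x, c)                                  # clean fit first (assumption)
    n_adv = int(round(eta * len(x)))
    for step in range(1, beta + 1):
        eps = eps_max * step / beta                   # linear schedule (assumption)
        idx = rng.choice(len(x), size=n_adv, replace=False)
        x_mix = x.copy()
        x_mix[idx] = attack(x[idx], idx, eps)         # perturb a fraction eta
        c = kmeans(x_mix, c)                          # continue from previous centroids
    return c

# Toy two-class data in place of MNIST.
rng = np.random.default_rng(0)
x = np.clip(np.concatenate([rng.normal(0.25, 0.05, (50, 4)),
                            rng.normal(0.75, 0.05, (50, 4))]), 0.0, 1.0)
y = np.repeat([0, 1], 50)
mu = np.stack([x[y == j].mean(axis=0) for j in (0, 1)])

def toy_attack(xs, idx, eps):
    # Placeholder attack: one signed step away from each sample's class mean.
    return np.clip(xs + eps * np.sign(xs - mu[y[idx]]), 0.0, 1.0)

centroids = adversarial_kmeans(x, toy_attack, k=2, eps_max=0.1,
                               beta=5, eta=0.5, rng=rng)
```

Setting `beta=1` removes the incremental schedule, and replacing `kmeans(x_mix, c)` with a fit from fresh random centroids removes the continuous-learning component, which is one way to mimic the sensitivity settings described in the figure captions.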

Paper Structure (13 sections, 2 equations, 3 figures, 1 algorithm)

Introduction
Background
Adversarial Examples
Adversarial Training
Methodology
Exploiting Transferability
Improving Robustness
Methods and Resources
Datasets
Adversarial Attacks
Implementation Details
Results and Discussion
Conclusion

Figures (3)

Figure 1: Collages of $8 \times 8$ randomly picked images of handwritten digits and fashion items, from the MNIST and Fashion-MNIST training datasets respectively. Each dataset has 70,000 samples in total, with 60,000 images for training and 10,000 for testing. Collages (a) and (b) contain clean examples, while collages (c) and (d) contain adversarial examples of (a) and (b). For both (c) and (d), the adversarial examples are generated with I-FGSM.
Figure 2: Clean (ClnAcc) and adversarial (AdvAcc) clustering accuracies on MNIST and Fashion-MNIST. In all plots, I-FGSM is both the attack and the adversarial training algorithm; the $x$-axis shows the attack strength ($\epsilon$) used in training and the $y$-axis shows the clustering accuracy ($\%$). Solid lines are averages over 30 experiments, and shaded areas are error bars for a confidence level of $99\%$. The first column shows results for MNIST and the second for Fashion-MNIST. Panels (a) and (b) show the full implementation of the proposed adversarial training algorithm; panels (c) to (f) show parameter sensitivity results. In (c) and (d), $k$-means is trained as in (a) and (b) but without initialising centroids from previous steps, i.e., no continuous learning. In (e) and (f), the training sets are fully perturbed rather than half perturbed, i.e., $\eta=1$.
Figure 3: Parameter sensitivity results for various adversarial-to-clean proportions $\eta$, without incremental training, i.e., with adversarial step-count $\beta=1$. The $x$-axis shows the proportion $\eta$, which controls the ratio between clean and adversarial data; the $y$-axis shows the clustering accuracy ($\%$). Each shaded bar shows the average testing accuracy on MNIST and Fashion-MNIST over 30 experiments. Error bars are for a confidence level of $99\%$. I-FGSM is the attacking and defending algorithm. For all proportions, both training and testing attack strengths use $\epsilon = 1$.
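The captions report clustering accuracies averaged over 30 experiments with $99\%$ error bars, but they do not say how clusters are matched to classes or how the intervals are computed. The sketch below shows one plausible way to obtain such numbers, under two assumptions: each cluster is mapped to its majority label, and the interval uses a normal approximation with $z \approx 2.576$. `clustering_accuracy` and `mean_with_ci` are hypothetical helper names.

```python
import numpy as np

def clustering_accuracy(clusters, labels, k):
    """Map each cluster to its majority label, then score as percent correct."""
    mapping = np.zeros(k, dtype=int)
    for j in range(k):
        members = labels[clusters == j]
        if members.size:                  # empty clusters keep the default label 0
            mapping[j] = np.bincount(members).argmax()
    return 100.0 * np.mean(mapping[clusters] == labels)

def mean_with_ci(values, z=2.576):
    """Mean and half-width of a normal-approximation 99% interval."""
    v = np.asarray(values, dtype=float)
    half = z * v.std(ddof=1) / np.sqrt(len(v))
    return v.mean(), half

# Toy example: six samples, three clusters, integer class labels.
clusters = np.array([0, 0, 1, 1, 1, 2])
labels = np.array([3, 3, 5, 5, 3, 7])
acc = clustering_accuracy(clusters, labels, k=3)

# Accuracies from repeated runs (made-up values for illustration only).
mean_acc, half_width = mean_with_ci([90.0, 92.0, 94.0])
```

Running the same scoring once on clean test inputs and once on perturbed test inputs would give the two quantities the captions call ClnAcc and AdvAcc.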
