Image Scaling Attack Simulation: A Measure of Stealth and Detectability
Devon A. Kelly, Sarah A. Flanery, Christiana Chamon
TL;DR
This paper examines the real-world detectability of image scaling preprocessing attacks that compromise ML data pipelines. It uses a survey-based methodology with a ResNet50-based setup and a cats-vs-dogs dataset to gauge participants' awareness before and after they are informed of the attack. Key findings show very low initial detection rates and substantial post-discovery ambiguity, with many participants unable to reliably distinguish attacked from benign images. The work underscores the practical risk of image scaling attacks in workplace and academic settings and highlights challenges for human-in-the-loop defenses and data sanitization strategies.
Abstract
Maintaining cybersecurity practices takes effort, and one weakness is a lack of awareness of potential attacks, not only in how machine learning models are used but also in how they are developed. Previous studies have determined that preprocessing attacks, such as image scaling attacks, are difficult to detect for both humans (through visual response) and computers (through entropic algorithms). However, these studies do not address the real-world performance and detectability of these attacks. The purpose of this work is to analyze how awareness of image scaling attacks relates to demographic background and experience. We conduct a survey in which we gather the subjects' demographics, assess their experience in cybersecurity, record their responses to a poorly performing convolutional neural network model that, unknown to them, has been hindered by an image scaling attack on its dataset, and document their reactions after it is revealed that the images used within the broken models have been attacked. We find that the overall detection rate is low enough for the attack to be viable in a workplace or academic setting, and that even after discovery, subjects cannot conclusively distinguish benign images from attacked images.
