Watertox: The Art of Simplicity in Universal Attacks. A Cross-Model Framework for Robust Adversarial Generation

Zhenghao Gao, Shengjie Xu, Meixi Chen, Fangyao Zhao

TL;DR

Watertox tackles the challenge of cross-model transferability in adversarial attacks by combining a simple two-stage FGSM with an ensemble of diverse architectures and voting-based aggregation. The method introduces a total loss $J_{\text{total}}$ over multiple surrogate models, a region-aware second stage, and a principled ensemble design to produce robust, transferable perturbations, with theoretical guarantees on quality and transferability. Experimental results on ImageNet demonstrate strong base-model disruption (e.g., ConvNeXt-large accuracy drops from 70.6% to 16.0%) and strong zero-shot transfer (up to 98.8% accuracy reduction on unseen architectures), outperforming NI-FGSM while preserving perceptual quality. The work has practical implications for visual security and CAPTCHA generation, and suggests avenues for extending to other visual tasks and for a deeper theoretical understanding of architectural complementarity.

Abstract

Contemporary adversarial attack methods face significant limitations in cross-model transferability and practical applicability. We present Watertox, an elegant adversarial attack framework achieving remarkable effectiveness through architectural diversity and precision-controlled perturbations. Our two-stage Fast Gradient Sign Method combines uniform baseline perturbations ($\epsilon_1 = 0.1$) with targeted enhancements ($\epsilon_2 = 0.4$). The framework leverages an ensemble of complementary architectures, from VGG to ConvNeXt, synthesizing diverse perspectives through an innovative voting mechanism. Against state-of-the-art architectures, Watertox reduces model accuracy from 70.6% to 16.0%, with zero-shot attacks achieving up to 98.8% accuracy reduction against unseen architectures. These results establish Watertox as a significant advancement in adversarial methodologies, with promising applications in visual security systems and CAPTCHA generation.
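The two-stage procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The surrogates here are toy linear softmax classifiers (chosen so the input gradient has a closed form), the voting rule is assumed to be a per-pixel majority vote over gradient signs, the second-stage region is assumed to be the top fraction of pixels by mean gradient magnitude, and pixel values are assumed to lie in $[0, 1]$. The names `voted_sign`, `two_stage_fgsm`, and `top_frac` are hypothetical; only $\epsilon_1 = 0.1$ and $\epsilon_2 = 0.4$ come from the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for a linear softmax model."""
    p = softmax(W @ x + b)
    p[y] -= 1.0                  # p - onehot(y)
    return W.T @ p

def voted_sign(models, x, y):
    """Assumed voting rule: per-pixel majority vote over gradient signs."""
    votes = sum(np.sign(input_gradient(W, b, x, y)) for W, b in models)
    return np.sign(votes)        # a tied vote yields 0, i.e. no change

def two_stage_fgsm(models, x, y, eps1=0.1, eps2=0.4, top_frac=0.1):
    # Stage 1: uniform baseline step on every pixel.
    x1 = np.clip(x + eps1 * voted_sign(models, x, y), 0.0, 1.0)

    # Stage 2: larger step restricted to an assumed "targeted" region,
    # here the pixels with the largest mean absolute gradient at x1.
    grads = np.stack([input_gradient(W, b, x1, y) for W, b in models])
    saliency = np.abs(grads).mean(axis=0)
    k = max(1, int(top_frac * x.size))
    mask = np.zeros_like(x)
    mask[np.argsort(saliency)[-k:]] = 1.0
    x2 = np.clip(x1 + eps2 * mask * voted_sign(models, x1, y), 0.0, 1.0)
    return x1, x2, mask

# Toy usage: three random linear surrogates on a flattened 64-pixel "image".
rng = np.random.default_rng(0)
dim, classes = 64, 5
models = [(rng.normal(size=(classes, dim)), rng.normal(size=classes))
          for _ in range(3)]
x = rng.uniform(0.2, 0.8, size=dim)
y = 2
x1, x2, mask = two_stage_fgsm(models, x, y)
```

With real networks, `input_gradient` would be replaced by automatic differentiation through each surrogate; the two-stage structure and the clipping stay the same.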

Paper Structure

This paper contains 31 sections, 15 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Architectural overview of Watertox demonstrating the synergy of three primary components: (1) A strategically diverse model ensemble incorporating VGG, DenseNet, AlexNet, and ConvNeXt architectures; (2) A precision-controlled two-stage FGSM process combining baseline ($\epsilon_1 = 0.1$) and targeted ($\epsilon_2 = 0.4$) perturbations; and (3) A zero-shot adaptation mechanism ensuring cross-model transferability. This pipeline transforms input images into robust adversarial examples while maintaining perceptual quality.
  • Figure 2: A demonstration of Watertox's effectiveness. The straightforward transformation of a great white shark image (left) leads to diverse and unexpected model interpretations (right), while the image remains clearly recognizable to humans.
  • Figure 3: Comparative analysis of model responses to original and Watertox-processed images. The matrix reveals how our straightforward method induces diverse model interpretations across different architectures.
  • Figure 4: Comparative analysis of attack success rates (ASR) and visual effects. (a) Three-dimensional visualization of ASR across different target architectures and attack methods. (b) Visual comparison of adversarial perturbations on a sample image using different approaches.
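
The attack success rate (ASR) reported in Figure 4 can be computed in more than one way, and the paper's exact definition is not given in this section. The sketch below uses one common convention as an assumption: among samples the model classifies correctly before the attack, ASR is the fraction misclassified after it. The function name `attack_success_rate` and the sample arrays are hypothetical.

```python
import numpy as np

def attack_success_rate(labels, clean_preds, adv_preds):
    """Fraction of originally correct samples that the attack flips to a wrong label."""
    labels, clean_preds, adv_preds = map(np.asarray, (labels, clean_preds, adv_preds))
    correct = clean_preds == labels          # only count samples that were right to begin with
    if not correct.any():
        return float("nan")                  # undefined when nothing was correct
    return float(np.mean(adv_preds[correct] != labels[correct]))

labels      = [0, 1, 2, 3, 4]
clean_preds = [0, 1, 2, 0, 4]   # four of five correct before the attack
adv_preds   = [1, 1, 0, 0, 3]   # three of those four are wrong after the attack
asr = attack_success_rate(labels, clean_preds, adv_preds)
print(asr)  # 0.75
```

Under this convention, samples that were already misclassified on clean inputs do not inflate the score.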