Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play

Ching-Chun Chang, Fan-Yun Chen, Shih-Hong Gu, Kai Gao, Hanrui Wang, Isao Echizen

TL;DR

This work tackles adversarial illusions that threaten machine perception by introducing a universal defence based on an imitation game played by a multimodal generative agent guided by chain-of-thought prompts. The method reconstructs benign semantics from an adversarial input, producing $\tilde{x} = G(x', \pi)$ such that $f(\tilde{x}) = y$, rather than preserving perceptual similarity to the original. It formalises the problem with $x' = x + \delta + \iota$ and demonstrates, on a visual recognition task with a ViT backbone and a ChatGPT–DALL·E imitator, that the imitation-based defence improves robustness against both deductive and inductive attacks while maintaining high accuracy under no-attack conditions. While promising, the study notes limitations in emulating completely unknown objects and acknowledges evolving AI governance considerations.
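The formulation above can be illustrated with a toy sketch. It is not the paper's pipeline: the prototype vectors, the brittle classifier `classify` (standing in for $f$), and the `describe`/`generate` pair (standing in for the imitator $G$) are all hypothetical, the chain-of-thought prompt $\pi$ is omitted, and only a $\delta$-style perturbation is simulated (the $\iota$ term is set to zero). The point it tries to show is that the defence regenerates a sample from its inferred semantics instead of trying to undo the perturbation.

```python
# Toy sketch of x_tilde = G(x', pi) with the goal f(x_tilde) = y.
# All names and values are hypothetical stand-ins, not the paper's
# ViT classifier or ChatGPT-DALL-E imitator.

PROTOTYPES = {
    "cat": [1.0, 1.0, 1.0, 1.0],
    "dog": [-1.0, -1.0, -1.0, -1.0],
}

def classify(x):
    """Victim model f: a brittle classifier that looks only at the first feature."""
    return "cat" if x[0] > 0 else "dog"

def describe(x):
    """Imitator, step 1: name the concept whose prototype is closest to the input."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, PROTOTYPES[label]))
    return min(PROTOTYPES, key=sq_dist)

def generate(label):
    """Imitator, step 2: synthesise a fresh sample from the description alone."""
    return list(PROTOTYPES[label])

def disillusion(x_adv):
    """G(x'): rebuild the sample from its semantics rather than denoising it."""
    return generate(describe(x_adv))

x = PROTOTYPES["cat"]                      # benign sample with label y = "cat"
delta = [-2.5, 0.0, 0.0, 0.0]              # perturbation aimed at the brittle feature
x_adv = [a + d for a, d in zip(x, delta)]  # x' = x + delta (iota omitted)
x_tilde = disillusion(x_adv)

print(classify(x_adv))    # label assigned to the adversarial input
print(classify(x_tilde))  # label assigned after the imitation step
```

In this toy setting the perturbation flips the victim's decision on `x_adv`, while the regenerated `x_tilde` is classified by its original semantics; nothing here says how well a real generative agent would cope with stronger attacks.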

Abstract

As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model's general decision logic, and inductive illusion, where the victim model's general decision logic is shaped by specific stimuli. The former exploits the model's decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework, addressing vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluate the methodology under a variety of attack scenarios.

Paper Structure

This paper contains 12 sections, 21 equations, 5 figures.

Figures (5)

  • Figure 1: Overview of an imitation game played by multimodal generative AI for shattering illusions induced by deductive and inductive illusory stimuli.
  • Figure 2: Visual comparison between original images (top row) and imitative images (bottom row) across various object classes.
  • Figure 3: Visual comparison of multiple defence methods (rows) against multiple attack methods (columns).
  • Figure 4: Accuracy of the benign classifier under various non-targeted attack methods, evaluated with various defence methods.
  • Figure 5: Accuracy of the malicious classifier under various targeted attack methods, evaluated with various defence methods.