CogMorph: Cognitive Morphing Attacks for Text-to-Image Models

Zonglei Jing; Zonghao Ying; Le Wang; Siyuan Liang; Aishan Liu; Xianglong Liu; Dacheng Tao

TL;DR

CogMorph introduces a cognitive morphing framework that manipulates prompts to inject toxic contextual elements into text-to-image outputs while preserving the core subjects. It combines Cognitive Toxicity Augmentation (via Retrieval-Augmented Generation) with Contextual Hierarchical Morphing (hierarchical prompt parsing and feature fusion), supported by a taxonomy of 10 major categories and 48 sub-categories and a 5-dimension toxicity risk matrix. The approach is validated on multiple open-source T2I models and DALL·E-3, where it outperforms baselines in toxicity escalation (TESR, ATI) and jailbreak success; human studies corroborate the increased harm, and the work also covers a robust, adaptable image-checking defense (A-VLIC). The work highlights a critical ethical risk in generative AI, provides a dataset of 1,176 prompts and 283 keywords, and suggests layered defenses and future directions for safer deployment of T2I systems.

Abstract

The development of text-to-image (T2I) generative models, which enable the creation of high-quality synthetic images from textual prompts, has opened new frontiers in creative design and content generation. However, this paper reveals a significant and previously unrecognized ethical risk inherent in this technology and introduces a novel method, termed the Cognitive Morphing Attack (CogMorph), which manipulates T2I models to generate images that retain the original core subjects but embed toxic or harmful contextual elements. This nuanced manipulation exploits the cognitive principle that human perception of concepts is shaped by the entire visual scene and its context, producing images that amplify emotional harm far beyond attacks that merely preserve the original semantics. To address this, we first construct an imagery toxicity taxonomy spanning 10 major and 48 sub-categories, aligned with human cognitive-perceptual dimensions, and further build a toxicity risk matrix, resulting in 1,176 high-quality T2I toxic prompts. Based on this, CogMorph first introduces Cognitive Toxicity Augmentation, which develops a cognitive toxicity knowledge base with rich external toxic representations for humans (e.g., fine-grained visual features) that can be used to guide the optimization of adversarial prompts. In addition, we present Contextual Hierarchical Morphing, which hierarchically extracts critical parts of the original prompt (e.g., scenes, subjects, and body parts) and then iteratively retrieves and fuses toxic features to inject harmful contexts. Extensive experiments on multiple open-source T2I models and black-box commercial APIs (e.g., DALL·E-3) demonstrate the efficacy of CogMorph, which outperforms other baselines by large margins (+20.62% on average).

Paper Structure (21 sections, 13 equations, 8 figures, 3 tables)

Introduction
Preliminaries and Backgrounds
Motivation and Objective
Motivation
Problem Definition
Threat Model
Dataset
Taxonomy
Risk Matrix
Dataset Generation
Approach
Cognitive Toxicity Augmentation
Contextual Hierarchical Morphing
Experiments and Evaluation
Experimental Setup
...and 6 more sections

Figures (8)

Figure 1: Illustration of our CogMorph attack. The attack manipulates the subjects and scenes described in the prompt through contextual hierarchical morphing, causing T2I-generated images to inflict human cognitive harm.
Figure 2: Image-oriented toxicity taxonomy.
Figure 3: Harmful categories and cognition dimensions toxicity risk assessment matrix.
Figure 4: Overview of the CogMorph framework. Our approach begins with Cognitive Toxicity Augmentation, creating a knowledge base of external toxic representations to guide the optimization of adversarial prompts. Next, we introduce Contextual Hierarchical Morphing, which extracts key elements from the prompt and iteratively integrates toxic features to embed harmful contexts.
Figure 5: Performance of jailbreak attacks enhanced by CogMorph (*) on I2P dataset.
...and 3 more figures
