Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, William Isaac
TL;DR
This study builds a two-part taxonomy of GenAI misuse by combining a literature review with a qualitative analysis of approximately 200 real-world incidents reported between January 2023 and March 2024, spanning text, image, audio, and video modalities. It distinguishes exploitation of GenAI capabilities from compromise of GenAI systems, identifying ten capability-exploitation tactics and eight or nine system-compromise tactics; Impersonation, Appropriated Likeness, Sockpuppeting, and NCII (non-consensual intimate imagery) emerge as the dominant tactics. The empirical findings show that most misuse relies on readily accessible capabilities to manipulate opinion, monetize content, or commit fraud rather than on sophisticated attacks against models, and that both technical and social mitigations are needed. The authors discuss governance and safety implications, including prebunking and detection approaches, for curbing the rapid growth of AI-enabled deception and manipulation. The work informs policymakers, safety teams, and researchers about current threat patterns and can guide future evaluation and intervention strategies as GenAI capabilities advance.
Abstract
Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.
