SoK: Honeypots & LLMs, More Than the Sum of Their Parts?

Robert A. Bridges, Thomas R. Mitchell, Mauricio Muñoz, Ted Henriksson

TL;DR

This SoK examines the intersection of honeypots and large language models, addressing the problem of achieving convincing deception without prohibitive risk. It synthesizes three strands—a taxonomy of honeypot detection vectors, emergent canonical architectures for LLM-powered honeypots, and the evolution of honeypot log analysis toward automated threat intelligence—into a cohesive framework and roadmap. Key findings include the persistence of fingerprinting weaknesses, a growing yet immature canonical architecture with modular components, and an evaluation paradigm hampered by a data desert and a lack of target humanoid adversaries, alongside nascent work on autonomous, real-time threat intelligence. The paper argues for open, modular tooling, autonomous feedback loops, and an adversarial research ecosystem to enable self-improving deception systems capable of countering intelligent automated attackers, with potential impact on SOC workflows and threat intelligence pipelines.

Abstract

The advent of Large Language Models (LLMs) promised to resolve the long-standing paradox in honeypot design, achieving high-fidelity deception with low operational risk. Since late 2022, a flurry of research has demonstrated steady progress from ideation to prototype implementation. While promising, evaluations show only incremental progress in real-world deployments, and the field still lacks a cohesive understanding of the emerging architectural patterns, core challenges, and evaluation paradigms. To fill this gap, this Systematization of Knowledge (SoK) paper provides the first comprehensive overview and analysis of this new domain. We survey and systematize the field by focusing on three critical, intersecting research areas: first, we provide a taxonomy of honeypot detection vectors, structuring the core problems that LLM-based realism must solve; second, we synthesize the emerging literature on LLM-powered honeypots, identifying a canonical architecture and key evaluation trends; and third, we chart the evolutionary path of honeypot log analysis, from simple data reduction to automated intelligence generation. We synthesize these findings into a forward-looking research roadmap, arguing that the true potential of this technology lies in creating autonomous, self-improving deception systems to counter the emerging threat of intelligent, automated attackers.

Paper Structure

This paper contains 44 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: The Canonical Architecture of an LLM-Powered Honeypot. This diagram synthesizes the architectural patterns that have emerged from recent literature. Components are color-coded, with blue nodes denoting core, needed components, and purple nodes advanced/optional components introduced in the literature. An Attacker connects to the Attacker-Facing Server, which mimics a network service (e.g., SSH). Each command is passed to a Filter/Router, a critical component that can prevent prompt injection, Denial-of-Service (DoS), and Denial-of-Wallet (DoW) attacks [guan2024honeyllm; Wang2024; johnson2024modular]. The filter can pass the command to a Deterministic Responder for simple or cached responses or, for novel interactions, forward it toward an LLM [ragsdale2023designing; Wang2024]. The Prompt Creator component constructs a rich, context-aware prompt for the LLM. This can incorporate data from a System State Manager, which tracks changes to the virtual environment, and a Session History Curator, which intelligently prunes the command history to manage context length [Wang2024]. The LLM component itself may represent multiple models or fine-tuned LoRA heads, each specialized for a different protocol. All interactions are logged in the Honeypot Data Store (not depicted) for analysis.
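
The data flow described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the implementation of any surveyed system: every name here (`Session`, `route`, `build_prompt`, `handle`, the limits, the injection markers, the canned responses) is hypothetical, the `llm` argument is any callable that maps a prompt string to a response string, and the filter uses naive substring matching where a real deployment would need something far more robust.

```python
from dataclasses import dataclass, field
from typing import Callable

# Every constant and name below is an assumption made for illustration.
MAX_COMMAND_LENGTH = 256   # crude guard against DoS / Denial-of-Wallet
MAX_HISTORY_TURNS = 3      # context budget used by the history curator
INJECTION_MARKERS = ("ignore previous instructions", "you are an ai")
CANNED = {"whoami": "root", "hostname": "web01"}  # cached responses
REJECT_RESPONSE = "-bash: command not found"      # placeholder reply


@dataclass
class Session:
    cwd: str = "/root"  # minimal "system state"
    history: list[tuple[str, str]] = field(default_factory=list)
    log: list[dict] = field(default_factory=list)  # stand-in data store


def route(cmd: str) -> str:
    """Filter/Router: return 'reject', 'deterministic', or 'llm'."""
    lowered = cmd.lower()
    if len(cmd) > MAX_COMMAND_LENGTH:
        return "reject"
    if any(marker in lowered for marker in INJECTION_MARKERS):
        return "reject"
    if cmd in CANNED or cmd == "pwd" or cmd.startswith("cd "):
        return "deterministic"
    return "llm"


def deterministic(session: Session, cmd: str) -> str:
    """Deterministic Responder; also updates the tracked state."""
    if cmd == "pwd":
        return session.cwd
    if cmd.startswith("cd "):
        session.cwd = cmd[3:].strip()  # naive: treats the path as absolute
        return ""
    return CANNED[cmd]


def build_prompt(session: Session, cmd: str) -> str:
    """Prompt Creator: combine state with a pruned session history."""
    recent = session.history[-MAX_HISTORY_TURNS:]  # history curation
    turns = "\n".join(f"$ {c}\n{r}" for c, r in recent)
    return (
        "You are emulating a Linux SSH shell. Reply with terminal output only.\n"
        f"Current directory: {session.cwd}\n"
        f"Recent session:\n{turns}\n"
        f"$ {cmd}"
    )


def handle(session: Session, command: str, llm: Callable[[str], str]) -> str:
    """Process one attacker command end to end and log the interaction."""
    cmd = command.strip()
    decision = route(cmd)
    if decision == "reject":
        response = REJECT_RESPONSE
    elif decision == "deterministic":
        response = deterministic(session, cmd)
    else:
        response = llm(build_prompt(session, cmd))
    if decision != "reject":
        # Rejected input is kept out of the history so it never reaches a prompt.
        session.history.append((cmd, response))
    session.log.append({"command": cmd, "route": decision, "response": response})
    return response
```

In this sketch only commands that pass the filter and miss the deterministic table reach the model, so cached commands cost no model call and rejected input never enters a prompt. Routing decisions are logged for every command, including rejected ones, which keeps the data store complete for later analysis.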