HoneyGPT: Breaking the Trilemma in Terminal Honeypots with Large Language Model
Ziyang Wang, Jianzhou You, Haining Wang, Tianwei Yuan, Shichao Lv, Yang Wang, Limin Sun
TL;DR
HoneyGPT reframes terminal honeypots as text-based, LLM-driven interactions to overcome the traditional trilemma between flexibility, interaction depth, and deception. By coupling a Terminal Protocol Proxy with a Prompt Manager that employs Chain-of-Thought-based question enhancement and Memory Pruning, it sustains long, coherent attacker dialogues within context-length constraints. Empirical results show HoneyGPT (especially with GPT-4) outperforms Cowrie and real systems on deception, interaction, and configuration flexibility, and field tests reveal deeper attacker engagement and new attack vectors. This approach enables cost-effective, intelligent, and adaptive honeynets with integrated security analytics, advancing practical cyber defense and attacker understanding. The work highlights practical deployment considerations and outlines avenues for longer context models and refined memory management to further enhance realism and coverage of evolving threats.
Abstract
Honeypots, a strategic cyber-deception mechanism designed to emulate authentic interactions and bait unauthorized entities, often struggle to balance flexibility, interaction depth, and deception. They typically fail to adapt to evolving attacker tactics, which limits both engagement and information gathering. Fortunately, the emergent capabilities of large language models and innovative prompt-based engineering offer a transformative shift in honeypot technologies. This paper introduces HoneyGPT, a pioneering shell honeypot architecture based on ChatGPT, characterized by its cost-effectiveness and proactive engagement. In particular, we propose a structured prompt engineering framework that incorporates chain-of-thought tactics to improve long-term memory and robust security analytics, enhancing deception and engagement. Our evaluation of HoneyGPT comprises a baseline comparison based on a collected dataset and a three-month field evaluation. The baseline comparison demonstrates HoneyGPT's ability to balance flexibility, interaction depth, and deceptive capability. The field evaluation further validates HoneyGPT's superior performance in engaging attackers more deeply and capturing a wider array of novel attack vectors.
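To make the idea of keeping a long attacker session within a model's context limit more concrete, the following is a minimal sketch of one way history pruning could work. It is not the authors' implementation: the `Turn` record, the `impact` score (a hypothetical measure of how much a command changed the emulated system state), the whitespace-based token estimate, and the eviction rule (drop the lowest-impact turn first, oldest first on ties) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One attacker command and the honeypot's reply (hypothetical record)."""
    command: str
    response: str
    impact: float  # assumed score: how much the command altered system state


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def turn_cost(turn: Turn) -> int:
    # Approximate context cost of keeping this turn in the prompt.
    return estimate_tokens(turn.command) + estimate_tokens(turn.response)


def prune_history(history: List[Turn], budget: int) -> List[Turn]:
    """Drop low-impact turns until the estimated cost fits within `budget`.

    Surviving turns keep their original order so the dialogue stays coherent.
    """
    kept = list(history)
    total = sum(turn_cost(t) for t in kept)
    while kept and total > budget:
        # Evict the lowest-impact turn; on ties, evict the oldest one.
        idx = min(range(len(kept)), key=lambda i: (kept[i].impact, i))
        total -= turn_cost(kept[idx])
        del kept[idx]
    return kept


if __name__ == "__main__":
    session = [
        Turn("uname -a", "Linux host 5.4.0 x86_64", 0.0),
        Turn("cd /tmp", "", 0.5),
        Turn("wget http://example.invalid/x.sh", "saved x.sh", 0.9),
        Turn("ls", "x.sh", 0.0),
    ]
    for turn in prune_history(session, budget=8):
        print(turn.command)
```

In this sketch, state-changing commands such as a download or a directory change outlive read-only commands, so later replies can stay consistent with what the attacker has already done. A real deployment would need the model's own tokenizer and a principled way to assign impact scores.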
