LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems

Hakan T. Otal, M. Abdullah Canbaz

TL;DR

By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, this work developed a honeypot capable of sophisticated engagement with attackers.

Abstract

The rapid evolution of cyber threats necessitates innovative solutions for detecting and analyzing malicious activity. Honeypots, which are decoy systems designed to lure and interact with attackers, have emerged as a critical component in cybersecurity. In this paper, we present a novel approach to creating realistic and interactive honeypot systems using Large Language Models (LLMs). By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, we developed a honeypot capable of sophisticated engagement with attackers. Our methodology involved several key steps: data collection and processing, prompt engineering, model selection, and supervised fine-tuning to optimize the model's performance. Evaluation through similarity metrics and live deployment demonstrated that our approach effectively generates accurate and informative responses. The results highlight the potential of LLMs to revolutionize honeypot technology, providing cybersecurity professionals with a powerful tool to detect and analyze malicious activity, thereby enhancing overall security infrastructure.

Paper Structure (12 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: Data Collection & Model Training Pipeline
  • Figure 2: Interactive LLM-Honeypot Server Framework
  • Figure 3: Example of Honeypot SSH Connection
  • Figure 4: Training losses over 36 steps in Supervised Fine-Tuning
  • Figure 5: Histogram of Cosine Similarity Scores over 140 Samples
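The abstract and Figure 5 refer to cosine similarity between the honeypot's responses and reference outputs, but this page does not say how the text is turned into vectors. The sketch below is one possible reading under a stated assumption: it uses simple whitespace-token counts, whereas the paper may use embeddings or another representation. The function name and the sample strings are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(expected: str, generated: str) -> float:
    """Cosine similarity between two texts using whitespace-token counts.

    Assumption: bag-of-tokens vectors; the paper's actual vectorization
    is not stated on this page.
    """
    a = Counter(expected.split())
    b = Counter(generated.split())

    # Dot product over the tokens that both texts share.
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())

    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))

    # An empty text has no direction, so treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical reference output and model response for one command.
    reference = "total 0 drwxr-xr-x 2 root root"
    response = "total 4 drwxr-xr-x 2 root root"
    print(round(cosine_similarity(reference, response), 3))
```

Scoring each evaluation sample this way and binning the scores would give a histogram of the kind Figure 5 describes. Token counts ignore word order, so this sketch only approximates how closely a response matches the reference.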