LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems

Hakan T. Otal; M. Abdullah Canbaz

TL;DR

By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, this work developed a honeypot capable of sophisticated engagement with attackers.

Abstract

The rapid evolution of cyber threats necessitates innovative solutions for detecting and analyzing malicious activity. Honeypots, which are decoy systems designed to lure and interact with attackers, have emerged as a critical component in cybersecurity. In this paper, we present a novel approach to creating realistic and interactive honeypot systems using Large Language Models (LLMs). By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, we developed a honeypot capable of sophisticated engagement with attackers. Our methodology involved several key steps: data collection and processing, prompt engineering, model selection, and supervised fine-tuning to optimize the model's performance. Evaluation through similarity metrics and live deployment demonstrated that our approach effectively generates accurate and informative responses. The results highlight the potential of LLMs to revolutionize honeypot technology, providing cybersecurity professionals with a powerful tool to detect and analyze malicious activity, thereby enhancing overall security infrastructure.
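The similarity-based evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not state how responses are vectorized, so the whitespace-token count vectors used here are an assumption, and the function name and example strings are hypothetical rather than taken from the paper's dataset. The sketch only shows how a cosine similarity score between a reference terminal output and a model-generated output might be computed.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-token count vectors of two strings.

    Assumption: simple token counts stand in for whatever representation
    the paper actually uses; this is an illustration, not the authors' method.
    """
    vec_a = Counter(text_a.split())
    vec_b = Counter(text_b.split())
    # Dot product over tokens of the first text; missing tokens count as 0.
    dot = sum(count * vec_b[token] for token, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with an empty output as 0
    return dot / (norm_a * norm_b)


# Hypothetical outputs for the same command (not from the paper's data).
reference_output = "bin boot dev etc home"
model_output = "bin boot dev etc tmp"
score = cosine_similarity(reference_output, model_output)
```

A score near 1 would indicate that the generated response closely matches the reference output, while a score near 0 would indicate little token overlap. Collecting such scores over many samples gives the kind of distribution a histogram can summarize.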

Paper Structure (12 sections, 5 figures, 1 table)

This paper contains 12 sections, 5 figures, 1 table.

Introduction
Methodology
Data Collection and Processing
Prompt Engineering
Model Selection
Supervised Fine-Tuning (SFT)
Experimental Results
Interactive LLM-Honeypot Framework
Custom SSH Server Wrapper
Training Loss Analysis
Similarity Analysis with Cowrie Outputs
Conclusion

Figures (5)

Figure 1: Data Collection & Model Training Pipeline
Figure 2: Interactive LLM-Honeypot Server Framework
Figure 3: Example of Honeypot SSH Connection
Figure 4: Training losses over 36 steps in Supervised Fine-Tuning
Figure 5: Histogram of Cosine Similarity Scores over 140 Samples
