VelLMes: A high-interaction AI-based deception framework

Muris Sladić; Veronica Valeros; Carlos Catania; Sebastian Garcia

VelLMes: A high-interaction AI-based deception framework

Muris Sladić, Veronica Valeros, Carlos Catania, Sebastian Garcia

TL;DR

VelLMes presents an AI-driven deception framework that uses per-service LLM prompts to simulate SSH Linux shells, MySQL, POP3, and HTTP as interactive honeypots. It validates generative realism with unit tests, assesses deception with 89 human attackers, and demonstrates real-world viability by deploying 10 internet-facing shelLM instances. Results show substantial attacker confusion (around 30%) and high command-reply accuracy (>~90%) in live attacks, indicating LLM honeypots can plausibly masquerade as real services and engage attackers. The work contributes an open-source framework, extensive evaluation methodology, and a path toward richer, multi-service cyber-deception with future improvements in realism and protocol coverage.

Abstract

There are very few SotA deception systems based on Large Language Models. The existing ones are limited only to simulating one type of service, mainly SSH shells. These systems - but also the deception technologies not based on LLMs - lack an extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber-deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services such as SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, thus VelLMes offers a variety of choices for deception design based on the users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key for its performance. We evaluate the generative capabilities and the deception capabilities. Generative capabilities were evaluated using unit tests for LLMs. The results of the unit tests show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs having a 100% passing rate. In the case of the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. The results showed that about 30% of the attackers thought that they were interacting with a real system when they were assigned an LLM-based honeypot. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed us that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding correctly to most of the issued commands.

VelLMes: A high-interaction AI-based deception framework

TL;DR

Abstract

VelLMes: A high-interaction AI-based deception framework

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (9)