
LLMPot: Dynamically Configured LLM-based Honeypot for Industrial Protocol and Physical Process Emulation

Christoforos Vasilatos, Dunia J. Mahboobeh, Hithem Lamri, Manaar Alam, Michail Maniatakos

TL;DR

LLMPot is an open-source framework that uses pre-trained LLMs to dynamically emulate ICS network protocols and their associated physical processes, enabling realistic honeypot deployment with reduced manual effort. The approach combines ByT5 as a byte-level base model, a protocol-aware dataset-generation pipeline (PLC probing and PCAP parsing), and iterative fine-tuning guided by the new metrics $BCA$, $RVA$, and $RVA\text{-}\epsilon$ to clone PLC behavior for Modbus and S7comm. Key contributions include a boundaries-based dataset generation method that avoids exhaustively sampling all protocol states, protocol generalization across PLC brands, and demonstrations of both protocol and physical-process emulation, including a desalination plant testbed. LLMPot achieves competitive realism, evidenced by high RVA, credible BCA performance, robust honeypot interactions, and real-time responsiveness, which points to the practical value of LLM-based honeypots for ICS threat detection and analysis.
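To make the byte-level framing concrete, the following minimal sketch builds one Modbus/TCP request and a matching response, then hex-encodes them as a text pair of the kind a byte-level model such as ByT5 could be fine-tuned on. The frame layout follows the public Modbus/TCP specification. The function names, the hex-string encoding, and the register values are illustrative assumptions, not LLMPot's actual dataset format.

```python
import struct


def build_read_holding_request(transaction_id: int, unit_id: int,
                               start_addr: int, quantity: int) -> bytes:
    """Build a Modbus/TCP 'Read Holding Registers' (function 0x03) request."""
    # PDU: function code (1 byte), start address (2 bytes), quantity (2 bytes).
    pdu = struct.pack(">BHH", 0x03, start_addr, quantity)
    # MBAP header: transaction id, protocol id (always 0), length, unit id.
    # The length field counts the unit id byte plus the PDU bytes.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def build_read_holding_response(transaction_id: int, unit_id: int,
                                registers: list[int]) -> bytes:
    """Build the response a device would send for the request above."""
    # PDU: function code, byte count, then one big-endian 16-bit word per register.
    pdu = struct.pack(">BB", 0x03, 2 * len(registers))
    pdu += struct.pack(f">{len(registers)}H", *registers)
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def to_training_pair(request: bytes, response: bytes) -> tuple[str, str]:
    """Hex-encode a request/response pair (hypothetical text encoding)."""
    return request.hex(), response.hex()


# Hypothetical example: read 2 registers starting at address 0 from unit 1.
request = build_read_holding_request(1, 1, 0, 2)
response = build_read_holding_response(1, 1, [10, 20])
source_text, target_text = to_training_pair(request, response)
print(source_text)  # 000100000006010300000002
print(target_text)  # 000100000007010304000a0014
```

In a real capture the register values come from the probed PLC rather than from a hard-coded list, so the response side of each pair reflects the device's actual state.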

Abstract

Industrial Control Systems (ICS) are extensively used in critical infrastructures ensuring efficient, reliable, and continuous operations. However, their increasing connectivity and addition of advanced features make them vulnerable to cyber threats, potentially leading to severe disruptions in essential services. In this context, honeypots play a vital role by acting as decoy targets within ICS networks, or on the Internet, helping to detect, log, analyze, and develop mitigations for ICS-specific cyber threats. Deploying ICS honeypots, however, is challenging due to the necessity of accurately replicating industrial protocols and device characteristics, a crucial requirement for effectively mimicking the unique operational behavior of different industrial systems. Moreover, this challenge is compounded by the significant manual effort required in also mimicking the control logic the PLC would execute, in order to capture attacker traffic aiming to disrupt critical infrastructure operations. In this paper, we propose LLMPot, a novel approach for designing honeypots in ICS networks harnessing the potency of Large Language Models (LLMs). LLMPot aims to automate and optimize the creation of realistic honeypots with vendor-agnostic configurations, and for any control logic, aiming to eliminate the manual effort and specialized knowledge traditionally required in this domain. We conducted extensive experiments focusing on a wide array of parameters, demonstrating that our LLM-based approach can effectively create honeypot devices implementing different industrial protocols and diverse control logic.

Paper Structure (36 sections, 2 equations, 17 figures, 6 tables, 2 algorithms)


Figures (17)

  • Figure 1: LLMPot high-level diagram. 1. A client that automatically probes the PLC and captures responses. 2. Captured traffic forms a training dataset. 3. Fine-tuning of the LLM using the dataset. 4. Generated LLM-based honeypot with supportive components.
  • Figure 2: LLMPot's offline stage framework for dynamic configuration and PLC cloning. The process is fully automated, except for the box marked in green ("protocol client"), which has to be implemented manually for every new protocol.
  • Figure 3: BCA and RVA per epoch of the byt5-small model when using different input/output configurations (from $2^2 = 4$ up to $2^{16} = 65536$). Similar metric values across configurations indicate that LLMPot is independent of the PLC configuration.
  • Figure 4: PLCs used in our experiments, WAGO: Modbus (left) and SIEMENS: S7comm (right), being copied by LLMPot.
  • Figure 5: BCA and RVA per epoch of the byt5-small model when fine-tuned with different dataset sizes and protocols. Fine-tuning stopped early when the validation loss did not improve for 10 epochs (the patience value), which is why the ending epoch differs for each dataset size.
  • ...and 12 more figures
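
The patience-based stopping rule mentioned in the Figure 5 caption can be sketched as follows. This is a generic early-stopping check, not LLMPot's training code; the function name `stopping_epoch` and the example validation losses are hypothetical.

```python
def stopping_epoch(val_losses: list[float], patience: int = 10) -> int:
    """Return the 1-based epoch at which fine-tuning would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    training runs through every epoch in `val_losses`.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: reset the patience counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Hypothetical loss curve: improves for 3 epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 20
print(stopping_epoch(losses, patience=10))  # 13
```

Because the counter resets on every improvement, runs whose loss keeps improving for longer end at later epochs, which is consistent with the different ending epochs per dataset size noted in the caption.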