Enhancing Cybersecurity in Critical Infrastructure with LLM-Assisted Explainable IoT Systems

Ashutosh Ghimire, Ghazal Ghajari, Karma Gurung, Love K. Sah, Fathi Amsaad

TL;DR

This work tackles the challenge of securing IoT-enabled critical infrastructure by addressing data heterogeneity and the black-box nature of anomaly detectors. It introduces a hybrid framework that couples Autoencoder-based anomaly detection with LLM-assisted preprocessing and GPT-4–generated explanations, comparing PCA-based and LLM-driven pipelines. On the KDDCup99 10% corrected dataset, the LLM-assisted approach raises the macro F1 score from 0.49 to 0.98 and provides natural language explanations that contextualize anomalies. The results demonstrate improved accuracy, faster convergence, and enhanced interpretability, offering a practical path toward trustworthy, AI-driven IoT cybersecurity for critical infrastructure.
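The macro F1 score quoted above averages the per-class F1 scores with equal weight, so a detector that ignores a minority class is penalized even when its overall accuracy looks acceptable. The sketch below is a minimal pure-Python illustration of that averaging; the function name `macro_f1` and the toy labels are assumptions made for illustration and are not taken from the paper's code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not the authors' code)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall; define it as 0 when both are 0.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (made up): a detector that predicts "attack" for every record.
y_true = ["attack"] * 8 + ["normal"] * 2
y_pred = ["attack"] * 10
score = macro_f1(y_true, y_pred)  # accuracy is 0.8, yet the "normal" class contributes F1 = 0
```

In this toy case the "attack" class has F1 = 8/9 and the "normal" class has F1 = 0, so the macro average is 4/9, well below the 0.8 accuracy.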

Abstract

Ensuring the security of critical infrastructure has become increasingly vital with the proliferation of Internet of Things (IoT) systems. However, the heterogeneous nature of IoT data and the lack of human-comprehensible insights from anomaly detection models remain significant challenges. This paper presents a hybrid framework that combines numerical anomaly detection using Autoencoders with Large Language Models (LLMs) for enhanced preprocessing and interpretability. Two preprocessing approaches are implemented: a traditional method utilizing Principal Component Analysis (PCA) to reduce dimensionality and an LLM-assisted method where GPT-4 dynamically recommends feature selection, transformation, and encoding strategies. Experimental results on the KDDCup99 10% corrected dataset demonstrate that the LLM-assisted preprocessing pipeline significantly improves anomaly detection performance. The macro-average F1 score increased from 0.49 in the traditional PCA-based approach to 0.98 with LLM-driven insights. Additionally, the LLM generates natural language explanations for detected anomalies, providing contextual insights into their causes and implications. This framework highlights the synergy between numerical AI models and LLMs, delivering an accurate, interpretable, and efficient solution for IoT cybersecurity in critical infrastructure.
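Autoencoder-based anomaly detection generally works by scoring each record with its reconstruction error and flagging records whose error exceeds a threshold. The abstract does not state how the threshold is chosen, so the sketch below assumes a simple nearest-rank quantile over errors measured on normal traffic; the function names and the quantile rule are hypothetical, and the reconstructions are assumed to come from an already trained model.

```python
import math
from statistics import mean


def reconstruction_error(x, x_hat):
    """Mean squared error between a feature vector and its reconstruction."""
    return mean((a - b) ** 2 for a, b in zip(x, x_hat))


def quantile_threshold(normal_errors, q=0.95):
    """Nearest-rank quantile of errors observed on normal traffic (assumed rule)."""
    ordered = sorted(normal_errors)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def flag_anomalies(errors, threshold):
    """Mark a record as anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Toy values (made up): errors the model produced on normal records.
threshold = quantile_threshold([0.4, 0.1, 0.3, 0.2], q=0.5)
flags = flag_anomalies([0.1, 0.5], threshold)
```

A higher quantile trades fewer false alarms for more missed anomalies, so in practice the value of `q` would be tuned on validation data.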

Paper Structure

This paper contains 19 sections, 2 equations, 6 figures.

Figures (6)

  • Figure 1: Proposed framework for LLM-assisted anomaly detection in IoT systems.
  • Figure 2: Reconstruction error distribution of the LLM-integrated Autoencoder.
  • Figure 3: Reconstruction error distribution of the traditional Autoencoder.
  • Figure 4: Comparison of macro-average performance metrics between the traditional Autoencoder and the LLM-integrated Autoencoder.
  • Figure 5: Example 1: GPT-generated explanation of a detected anomaly.
  • ...and 1 more figure