LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks

Seif Ikbarieh, Maanak Gupta, Elmahedi Mahalal

TL;DR

The paper tackles the lack of quantitative benchmarks for AI-driven IoT security analyses by proposing a hybrid framework that combines ML-based attack detection with LLM-driven attack behavior analysis and mitigation suggestions. It benchmarks multiple ML/DL classifiers on Edge-IIoTset and CICIoT2023 to choose the best detector (Random Forest) and uses Retrieval-Augmented Generation with prompt engineering to ground LLM analyses in attack and device context. An ensemble of judge LLMs plus human experts provides objective scoring across four evaluation dimensions, enabling quantitative comparison of LLMs like ChatGPT-o3 and DeepSeek-R1. The results show that RF offers superior detection performance and ChatGPT-o3 provides more accurate, practical analyses and mitigations across 13 attack types, highlighting the potential for scalable, grounded, AI-assisted IoT security.

Abstract

The Internet of Things has expanded rapidly, transforming communication and operations across industries but also enlarging the attack surface and increasing security breaches. Artificial Intelligence plays a key role in securing IoT, enabling attack detection, attack behavior analysis, and mitigation suggestion. Despite these advances, evaluations remain purely qualitative, and the lack of a standardized, objective benchmark for quantitatively measuring AI-based attack analysis and mitigation hinders consistent assessment of model effectiveness. In this work, we propose a hybrid framework combining Machine Learning (ML) for multi-class attack detection with Large Language Models (LLMs) for attack behavior analysis and mitigation suggestion. After benchmarking several ML and Deep Learning (DL) classifiers on the Edge-IIoTset and CICIoT2023 datasets, we applied structured role-play prompt engineering with Retrieval-Augmented Generation (RAG) to guide ChatGPT-o3 and DeepSeek-R1 in producing detailed, context-aware responses. We introduce novel evaluation metrics for quantitative assessment, which guide both our own evaluation and an ensemble of judge LLMs, namely ChatGPT-4o, DeepSeek-V3, Mixtral 8x7B Instruct, Gemini 2.5 Flash, Meta Llama 4, TII Falcon H1 34B Instruct, xAI Grok 3, and Claude 4 Sonnet, that independently evaluate the responses. Results show that Random Forest is the best detection model, and ChatGPT-o3 outperformed DeepSeek-R1 in attack analysis and mitigation.
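
To make the judge-ensemble idea concrete, here is a minimal sketch of how independent per-judge scores might be combined into one number per evaluated response. It is an illustration under stated assumptions, not the authors' scoring procedure: the dimension names (`dim_1` to `dim_4`), the judge names, the score values, and the plain-average rule are all hypothetical, since this summary says only that there are four evaluation dimensions and several independent judges.

```python
from statistics import mean

# Hypothetical placeholders: the summary mentions four evaluation
# dimensions but does not name them.
DIMENSIONS = ("dim_1", "dim_2", "dim_3", "dim_4")


def aggregate_judge_scores(scores_by_judge):
    """Average each dimension across judges, then average the dimensions.

    scores_by_judge maps a judge name to a dict of dimension -> score.
    The unweighted mean is an assumption made for this sketch.
    """
    per_dimension = {
        dim: mean(judge_scores[dim] for judge_scores in scores_by_judge.values())
        for dim in DIMENSIONS
    }
    overall = mean(per_dimension.values())
    return per_dimension, overall


# Made-up scores for illustration only; not results from the paper.
example = {
    "judge_a": {"dim_1": 8, "dim_2": 7, "dim_3": 9, "dim_4": 8},
    "judge_b": {"dim_1": 6, "dim_2": 7, "dim_3": 7, "dim_4": 8},
}
per_dimension, overall = aggregate_judge_scores(example)
```

Running the same aggregation over each model's responses would give per-dimension and overall scores that can be compared directly, which is the kind of quantitative comparison the abstract describes.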

Paper Structure

This paper contains 15 sections, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Overview of LLM-based Hybrid Framework
  • Figure 2: Edge-IIoTset Example Attack Scenario Prompt
  • Figure 3: Edge-IIoTset Example ChatGPT-o3 Response
  • Figure 4: Edge-IIoTset Example DeepSeek-R1 Response
  • Figure 5: An Example Evaluation Prompt
  • ...and 4 more figures