Towards Explainable Network Intrusion Detection using Large Language Models

Paul R. B. Houssel, Priyanka Singh, Siamak Layeghy, Marius Portmann

TL;DR

The paper investigates whether large language models (LLMs) can serve as Network Intrusion Detection Systems (NIDS), with a focus on explainability. It compares GPT-4 and LLama3, in zero-shot and fine-tuned settings, against traditional architectures and transformer-based detectors on the NetFlow datasets NF-UNSW-NB15-v2 and NF-CSE-CIC-IDS2018-v2. The findings show that LLMs struggle to detect malicious NetFlows precisely, but hold promise as explainable assistants when integrated with Retrieval-Augmented Generation (RAG) and function calling for threat response. Given their substantial inference-time costs and domain-specific limitations, the study advocates using LLMs as complementary components alongside lightweight, transformer-based NIDS, and emphasizes the need to ground explanations and reduce hallucinations in future work.

Abstract

Large Language Models (LLMs) have revolutionised natural language processing tasks, particularly as chat agents. However, their applicability to threat detection problems remains unclear. This paper examines the feasibility of employing LLMs as a Network Intrusion Detection System (NIDS), despite their high computational requirements, primarily for the sake of explainability. Furthermore, considerable resources have been invested in developing LLMs, and they may offer utility for NIDS. Current state-of-the-art NIDS rely on artificial benchmarking datasets, resulting in skewed performance when applied to real-world networking environments. Therefore, we compare the GPT-4 and LLama3 models against traditional architectures and transformer-based models to assess their ability to detect malicious NetFlows, relying not on artificially skewed datasets but solely on the vast knowledge acquired during pre-training. Our results reveal that, although LLMs struggle with precise attack detection, they hold significant potential for a path towards explainable NIDS. Our preliminary exploration shows that LLMs are unfit for the detection of malicious NetFlows. Most promisingly, however, they exhibit significant potential as complementary agents in NIDS, particularly in providing explanations and aiding in threat response when integrated with Retrieval Augmented Generation (RAG) and function calling capabilities.

Paper Structure (18 sections, 1 figure, 5 tables)