Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection

Chengyu Song, Linru Ma, Jianming Zheng, Jinzhi Liao, Hongyu Kuang, Lin Yang

TL;DR

Audit-LLM introduces a multi-agent framework for log-based insider threat detection that decomposes ITD tasks with a Chain-of-Thought approach, builds reusable tools, and executes sub-tasks via two Executors. A pair-wise Evidence-Based Multi-Agent Debate (EMAD) mitigates faithfulness hallucinations by iteratively refining conclusions through agent dialogue. Empirical results on CERT r4.2, CERT r5.2, and PicoDomain show Audit-LLM surpasses baselines in accuracy and reduces false positives, with interpretable, human-readable audit explanations. The work demonstrates practical potential for robust, transparent ITD in real-world systems and points to future enhancements like MITRE ATT&CK integration and retrieval-augmented generation for mitigation guidance.

Abstract

Log-based insider threat detection (ITD) detects malicious user activities by auditing log entries. Recently, large language models (LLMs) with strong common-sense knowledge have emerged in the domain of ITD. Nevertheless, diverse activity types and overlong log files make it hard for LLMs to directly pick out malicious activities among myriad normal ones. Furthermore, the faithfulness hallucination issue of LLMs makes them harder to apply to ITD, as the generated conclusion may not align with user commands and activity context. In response to these challenges, we introduce Audit-LLM, a multi-agent log-based insider threat detection framework comprising three collaborative agents: (i) the Decomposer agent, which breaks down the complex ITD task into manageable sub-tasks using Chain-of-Thought (CoT) reasoning; (ii) the Tool Builder agent, which creates reusable tools for sub-tasks to overcome context length limitations in LLMs; and (iii) the Executor agent, which generates the final detection conclusion by invoking the constructed tools. To enhance conclusion accuracy, we propose a pair-wise Evidence-based Multi-agent Debate (EMAD) mechanism, in which two independent Executors iteratively refine their conclusions by exchanging reasoning until they reach a consensus. Comprehensive experiments conducted on three publicly available ITD datasets (CERT r4.2, CERT r5.2, and PicoDomain) demonstrate the superiority of our method over existing baselines and show that the proposed EMAD significantly improves the faithfulness of explanations generated by LLMs.
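
The pair-wise debate described in the abstract can be pictured as a small loop: two Executors start from independent verdicts, read each other's reasoning, and revise until their labels agree or a round budget runs out. The following Python sketch is only an illustration under stated assumptions. The `Verdict` type, the `debate` function, the `max_rounds` budget, and the `evidence_weighted` revision rule are hypothetical stand-ins for the LLM-driven Executors, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    label: str            # "malicious" or "benign"
    evidence: List[str]   # tool outputs cited in support (illustrative strings)


# Hypothetical executor interface: (own verdict, peer verdict) -> revised verdict.
Executor = Callable[[Verdict, Verdict], Verdict]


def debate(exec_a: Executor, exec_b: Executor,
           init_a: Verdict, init_b: Verdict,
           max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Return (consensus label or None, number of debate rounds used)."""
    va, vb = init_a, init_b
    for rounds_used in range(max_rounds + 1):
        if va.label == vb.label:
            return va.label, rounds_used
        if rounds_used < max_rounds:
            # Both executors revise at the same time, each seeing the
            # other's previous verdict and evidence.
            va, vb = exec_a(va, vb), exec_b(vb, va)
    return None, max_rounds


def evidence_weighted(own: Verdict, peer: Verdict) -> Verdict:
    # Assumed toy rule: defer to the peer only if it cites more evidence.
    if len(peer.evidence) > len(own.evidence):
        return Verdict(peer.label, list(peer.evidence))
    return own


a0 = Verdict("malicious", ["after-hours logon", "copy to removable media"])
b0 = Verdict("benign", ["file access matches role"])
label, rounds = debate(evidence_weighted, evidence_weighted, a0, b0)
print(label, rounds)
```

In a real system each executor call would be an LLM prompt containing the peer's reasoning; the toy rule here merely makes the control flow observable. Returning `None` when the budget is exhausted leaves the tie-breaking policy to the caller, which is a design choice of this sketch rather than something the source specifies.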

Paper Structure (29 sections, 5 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: An example of the three agents included in Audit-LLM, along with their interaction and workflow.
  • Figure 2: The framework of Audit-LLM comprises three agents: (i) the Decomposer, tasked with breaking down complex tasks into more manageable sub-tasks via CoT reasoning; (ii) the Tool Builder, responsible for creating a suite of task-specific, callable tools; and (iii) two Executors, dedicated to independently accomplishing the sub-tasks and reaching a consensus conclusion through the pair-wise Evidence-based Multi-agent Debate mechanism.
  • Figure 3: An example of building different agents via prompts. The specific details are omitted for the sake of brevity.
  • Figure 4: Performance of Audit-LLM with different base LLMs.
  • Figure 5: The average latency, token usage, and economic costs of two online LLM APIs across four scenarios.
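
The three-agent workflow summarized in the Figure 2 caption can be sketched as a plain pipeline: a decomposition step names the sub-tasks, a tool-building step provides one callable per sub-task, and an execution step invokes those tools so that only short summaries, not raw logs, reach the model. The Python below is a minimal sketch under assumptions. The function names, the log fields `type` and `hour`, the 08:00 to 18:00 working-hours window, and the two sub-tasks are all hypothetical, and the LLM calls are replaced by fixed stubs.

```python
from typing import Callable, Dict, List

LogEntry = Dict[str, str]
Tool = Callable[[List[LogEntry]], str]


def decompose(task: str) -> List[str]:
    # Stand-in for the Decomposer: a fixed sub-task list instead of
    # Chain-of-Thought output from an LLM.
    return ["count_after_hours_logons", "count_removable_media_events"]


def build_tools() -> Dict[str, Tool]:
    # Stand-in for the Tool Builder: each tool condenses the full log
    # into a one-line summary that fits easily in a prompt.
    def count_after_hours_logons(logs: List[LogEntry]) -> str:
        n = sum(1 for e in logs
                if e["type"] == "logon" and not 8 <= int(e["hour"]) < 18)
        return f"after-hours logons: {n}"

    def count_removable_media_events(logs: List[LogEntry]) -> str:
        n = sum(1 for e in logs if e["type"] == "device")
        return f"removable-media events: {n}"

    return {
        "count_after_hours_logons": count_after_hours_logons,
        "count_removable_media_events": count_removable_media_events,
    }


def execute(sub_tasks: List[str], tools: Dict[str, Tool],
            logs: List[LogEntry]) -> List[str]:
    # Stand-in for an Executor: run one tool per sub-task and collect
    # the summaries that would be passed to the LLM as evidence.
    return [tools[name](logs) for name in sub_tasks]


logs = [
    {"type": "logon", "hour": "23"},
    {"type": "logon", "hour": "9"},
    {"type": "device", "hour": "23"},
]
summaries = execute(decompose("audit one user's activity"), build_tools(), logs)
print(summaries)
```

Because the tools are ordinary functions keyed by sub-task name, they can be reused across users and days without rebuilding them, which is the property the caption attributes to the Tool Builder.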