DMFI: A Dual-Modality Log Analysis Framework for Insider Threat Detection with LoRA-Tuned Language Models

Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng, Zhiying Li, Jian Weng

Abstract

Insider threat detection (ITD) poses a persistent and high-impact challenge in cybersecurity due to the subtle, long-term, and context-dependent nature of malicious insider behaviors. Traditional models often struggle to capture semantic intent and complex behavior dynamics, while existing LLM-based solutions face limitations in prompt adaptability and modality coverage. To bridge this gap, we propose DMFI, a dual-modality framework that integrates semantic inference with behavior-aware fine-tuning. DMFI converts raw logs into two structured views: (1) a semantic view that processes content-rich artifacts (e.g., emails, HTTP records) using instruction-formatted prompts; and (2) a behavioral abstraction, constructed via a 4W-guided (When-Where-What-Which) transformation to encode contextual action sequences. Two LoRA-enhanced LLMs are fine-tuned independently, and their outputs are fused via a lightweight MLP-based decision module. We further introduce DMFI-B, a discriminative adaptation strategy that separates normal and abnormal behavior representations, improving robustness under severe class imbalance. Experiments on CERT r4.2 and r5.2 datasets demonstrate that DMFI outperforms state-of-the-art methods in detection accuracy. Our approach combines the semantic reasoning power of LLMs with structured behavior modeling, offering a scalable and effective solution for real-world insider threat detection.

Paper Structure

This paper contains 39 sections, 16 equations, 4 figures, 5 tables, 2 algorithms.

Figures (4)

  • Figure 1: Overview of DMFI. It consists of three main components: (1) Preprocessing, which extracts semantic and behavioral views from user logs; (2) Fine-tuning with LoRA, where dual instruction-tuned LLMs are trained on semantic and behavioral prompts; and (3) Inference and Decision Integration, where multi-feature scores are fused via a lightweight MLP to derive a final anomaly score $\alpha_{\text{joint}}$ for insider threat detection.
  • Figure 2: Illustrative example of behavior sequence compression. The left table organizes raw user actions using the 4W schema (When, Where, What, Which). On the right, we compare the original verbose sequence with a compressed version generated by our 4W-guided abstraction strategy. While the original form enumerates each atomic behavior separately, the compressed version merges related actions into concise natural language. This reduces token length (from 49 to 31 in this case) while retaining key behavioral semantics.
  • Figure 3: Effect of LLM backbones on detection performance over CERT r4.2 and r5.2 datasets.
  • Figure 4: Comparison of fine-tuning strategies (DMFI-A vs. DMFI-B) across performance and efficiency dimensions.
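
The behavior sequence compression described for Figure 2 can be illustrated with a small sketch. The Python below is not the authors' implementation: the `Event` fields, the sentence templates, and the merging rule (group consecutive events that share When and Where, then merge repeated What actions) are assumptions chosen for illustration. The whitespace word count it uses is only a rough stand-in for LLM token length.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Event:
    """One atomic user action in a hypothetical 4W layout."""
    when: str   # coarse time bucket, e.g. "working hours"
    where: str  # host identifier, e.g. "PC-1"
    what: str   # action phrase, e.g. "visited"
    which: str  # target of the action, e.g. "site-a.example"


def verbose(events):
    """Uncompressed form: one full sentence per atomic event."""
    return " ".join(
        f"During {e.when} on {e.where}, the user {e.what} {e.which}."
        for e in events
    )


def compress(events):
    """Merge consecutive events sharing When and Where into one sentence."""
    sentences = []
    for (when, where), group in groupby(events, key=lambda e: (e.when, e.where)):
        actions = []
        # Within one context, fold repeated actions into a single clause.
        for what, same_action in groupby(group, key=lambda e: e.what):
            targets = " and ".join(e.which for e in same_action)
            actions.append(f"{what} {targets}")
        sentences.append(
            f"During {when} on {where}, the user {', then '.join(actions)}."
        )
    return " ".join(sentences)


# Hypothetical sample sequence (not taken from the CERT datasets).
events = [
    Event("working hours", "PC-1", "logged on to", "the workstation"),
    Event("working hours", "PC-1", "visited", "site-a.example"),
    Event("working hours", "PC-1", "visited", "site-b.example"),
    Event("after hours", "PC-1", "copied", "report.doc to a USB drive"),
]

print(verbose(events))
print(compress(events))
print(len(verbose(events).split()), "->", len(compress(events).split()), "words")
```

Grouping only consecutive events keeps the chronological order of the sequence intact, which matters if the downstream model is expected to reason about the order of actions. A real pipeline would measure length with the tokenizer of the chosen LLM backbone rather than with whitespace splitting.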