Memory-Augmented Log Analysis with Phi-4-mini: Enhancing Threat Detection in Structured Security Logs

Anbi Guo, Mahfuza Farooque

TL;DR

The paper tackles log anomaly detection for multistage APTs in structured security logs, where LLMs suffer from context limits and domain mismatch. It introduces DM-RAG, a dual-memory retrieval-augmented generation framework that combines a rolling short-term memory with a FAISS-indexed long-term memory and uses an instruction-tuned Phi-4-mini with Bayesian fusion. On UNSW-NB15, DM-RAG achieves 98.70% recall and 69.59% F1, surpassing LoRA-fine-tuned and MITRE-style RAG baselines in recall while maintaining competitive precision. The method is lightweight, interpretable, and suitable for real-time threat monitoring without external corpora, with potential applicability to other structured temporal data.

Abstract

Structured security logs are critical for detecting advanced persistent threats (APTs). Large language models (LLMs) struggle in this domain due to limited context and domain mismatch. We propose DM-RAG, a dual-memory retrieval-augmented generation framework for structured log analysis. It integrates a short-term memory buffer for recent summaries and a long-term FAISS-indexed memory for historical patterns. An instruction-tuned Phi-4-mini processes the combined context and outputs structured predictions. Bayesian fusion of short-term confidence scores gates which results persist into long-term memory. On the UNSW-NB15 dataset, DM-RAG achieves 53.64% accuracy and 98.70% recall, surpassing fine-tuned and RAG baselines in recall. The architecture is lightweight, interpretable, and scalable, enabling real-time threat monitoring without extra corpora or heavy tuning.
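
The dual-memory idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class and function names (`MemoryEntry`, `DualMemory`, `fuse_confidences`), the window size, the promotion threshold, and the fusion rule itself (a naive-Bayes product of odds under a uniform prior) are all hypothetical, and a brute-force cosine search stands in for the FAISS index so the example runs with the standard library only.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    summary: str        # short text summary of one analyzed log
    confidence: float   # model confidence in [0, 1]
    embedding: list     # vector used for long-term retrieval


def fuse_confidences(scores, eps=1e-6):
    """Fuse confidences as a product of odds under a uniform prior.

    This rule is an assumption; the source only states that Bayesian
    fusion is used, not which formula.
    """
    log_odds = 0.0
    for p in scores:
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        log_odds += math.log(p / (1.0 - p))
    log_odds = max(-50.0, min(50.0, log_odds))   # keep exp() in range
    return 1.0 / (1.0 + math.exp(-log_odds))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class DualMemory:
    def __init__(self, stm_size=5, promote_threshold=0.9):
        self.stm = deque(maxlen=stm_size)   # rolling short-term buffer
        self.ltm = []                       # stand-in for a FAISS index
        self.promote_threshold = promote_threshold

    def add(self, entry):
        self.stm.append(entry)

    def compress(self):
        """Fuse STM confidences; promote the window to LTM if the fused score is high."""
        if not self.stm:
            return 0.0
        fused = fuse_confidences([e.confidence for e in self.stm])
        if fused >= self.promote_threshold:
            self.ltm.extend(self.stm)
        self.stm.clear()
        return fused

    def retrieve(self, query, k=3):
        """Return the k long-term entries most similar to the query vector."""
        ranked = sorted(self.ltm, key=lambda e: cosine(query, e.embedding), reverse=True)
        return ranked[:k]
```

In a real deployment the `ltm` list and `retrieve` method would presumably be backed by a FAISS index, and the embeddings would come from an encoder rather than being supplied by hand.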

Paper Structure

This paper contains 30 sections, 7 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Incoming network logs are analyzed by an instruction-tuned LLM with dual memory. The short-term memory (STM) stores recent summaries and scores, while the long-term memory (LTM) retrieves relevant high-confidence examples via FAISS. These jointly inform the prompt to Phi-4-mini for threat reasoning and classification. STM confidence scores are periodically compressed with Bayesian fusion, and high-confidence results are promoted to LTM for future retrieval.
  • Figure 2: Prompt template sent to the language model, composed of four parts: current log, STM summaries, LTM retrievals, and task requirements.
  • Figure 3: Instruction block provided to the language model to define the anomaly detection and classification task.
  • Figure 4: Instruction enforcing strict JSON output format from the LLM.
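
The four-part prompt (Figure 2) and the strict JSON output requirement (Figure 4) can be sketched as follows. The figures themselves are not reproduced here, so the section headings, the wording of `TASK_REQUIREMENTS`, and the output schema in `REQUIRED_KEYS` are hypothetical placeholders rather than the authors' template.

```python
import json

# Hypothetical task text; the paper's actual instruction block is in Figure 3.
TASK_REQUIREMENTS = (
    "Decide whether the current log is normal or anomalous and, if anomalous, "
    "name the attack category. Respond with a single JSON object only."
)

# Hypothetical output schema; the paper's actual fields are not given here.
REQUIRED_KEYS = {"label", "category", "confidence"}


def build_prompt(current_log, stm_summaries, ltm_examples):
    """Assemble the four parts: current log, STM, LTM, task requirements."""
    def block(title, items):
        body = "\n".join(f"- {item}" for item in items) if items else "- (none)"
        return f"### {title}\n{body}"

    return "\n\n".join([
        block("Current log", [current_log]),
        block("Short-term memory summaries", stm_summaries),
        block("Long-term memory retrievals", ltm_examples),
        f"### Task requirements\n{TASK_REQUIREMENTS}",
    ])


def parse_prediction(raw):
    """Return the parsed dict, or None if the output violates the strict format."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None   # any text around the JSON object counts as a violation
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return None
    if not 0.0 <= conf <= 1.0:
        return None
    return data
```

Rejecting malformed output outright, instead of trying to repair it, keeps unparseable predictions out of memory; whether the original system retries or discards such outputs is not stated in the source.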