An Explainable Memory Forensics Approach for Malware Analysis

Silvia Lucia Sanna, Davide Maiorca, Giorgio Giacinto

TL;DR

This work tackles the interpretability gap in memory-forensics-based malware analysis by introducing an explainable AI-assisted pipeline that leverages general-purpose large language models (LLMs) to interpret volatile memory outputs and automatically extract Indicators of Compromise (IoCs). It applies the approach to Windows and Android, comparing full RAM extraction with target-process memory dumps to reveal complementary forensic insights and enhance detection beyond traditional file-based tools such as VirusTotal. The methodology combines kernel-assisted Android memory acquisition, Volatility-based analysis, and LLM-driven explanation and IoC extraction, offering human-readable reports and justifications for classifications. A human-in-the-loop workflow during Android setup improves reproducibility and reduces operational complexity, with kernel artifacts and build metadata released to support independent validation. Overall, the study demonstrates that AI-driven memory forensics can provide richer, more actionable insights for modern malware investigations while remaining complementary to existing defense tools.

Abstract

Memory forensics is an effective methodology for analyzing living-off-the-land malware, including threats that employ evasion, obfuscation, anti-analysis, and steganographic techniques. By capturing volatile system state, memory analysis enables the recovery of transient artifacts such as decrypted payloads, executed commands, credentials, and cryptographic keys that are often inaccessible through static or traditional dynamic analysis. While several automated models have been proposed for malware detection from memory, their outputs typically lack interpretability, and memory analysis still relies heavily on expert-driven inspection of complex tool outputs, such as those produced by Volatility. In this paper, we propose an explainable, AI-assisted memory forensics approach that leverages general-purpose large language models (LLMs) to interpret memory analysis outputs in a human-readable form and to automatically extract meaningful Indicators of Compromise (IoCs), in some circumstances detecting more IoCs than current state-of-the-art tools. We apply the proposed methodology to both Windows and Android malware, comparing full RAM acquisition with target-process memory dumping and highlighting their complementary forensic value. Furthermore, we demonstrate how LLMs can support both expert and non-expert analysts by explaining analysis results, correlating artifacts, and justifying malware classifications. Finally, we show that a human-in-the-loop workflow, assisted by LLMs during kernel-assisted setup and analysis, improves reproducibility and reduces operational complexity, thereby reinforcing the practical applicability of AI-driven memory forensics for modern malware investigations.

Paper Structure (12 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Execution pipeline. The target program (APK or PE) is run in the appropriate sandbox; the RAM of the target process and the full RAM are extracted; the dumps are analysed and IoCs are extracted with LLMs and NLP algorithms. The output is a human-readable report that explains the content of the dump, lists the IoCs found, and explains the classification of the executed program. The extracted results are compared with popular state-of-the-art tools.
  • Figure 2: Evaluation results on Windows
  • Figure 3: Comparison results on Windows
  • Figure 4: Comparison of the Android evaluation accuracy of the three models (from left to right: custom NLP, chatGPT-4o-mini, gemini-2.0-flash-lite) across the four different analyses (from left to right: volatility2, volatility3, strings extracted from the complete dump with LiME, and strings extracted from the target-process dump with Frida)
  • Figure 5: Average percentage of misclassified samples for the different Android detectors, i.e. Drebin, Entroplyzer, VirusTotal (for VT, the average percentage of AV engines that fail to detect the sample), and the LLM-RAM approach with the different LLMs and RAM analysis methodologies (volatility2, volatility3, strings from the complete dump, or strings from the target-process dump)
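
The IoC-extraction stage of the pipeline in Figure 1 can be illustrated with a minimal sketch. The paper performs this step with LLMs and a custom NLP model; the regular expressions below are only a simplified stand-in, and the names `IOC_PATTERNS` and `extract_iocs` are hypothetical, chosen for illustration. The input is assumed to be lines of printable strings already recovered from a memory dump.

```python
import re

# Hypothetical, simplified patterns for illustration only; they are not the
# rules or prompts used in the paper.
IOC_PATTERNS = {
    # Dotted-quad IPv4 address with each octet limited to 0-255.
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    # HTTP/HTTPS URL up to the next whitespace, quote, or angle bracket.
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    # 64 hexadecimal characters, the length of a SHA-256 digest.
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(lines):
    """Return a dict mapping each IoC type to a sorted list of unique matches."""
    found = {name: set() for name in IOC_PATTERNS}
    for line in lines:
        for name, pattern in IOC_PATTERNS.items():
            found[name].update(pattern.findall(line))
    return {name: sorted(values) for name, values in found.items()}


if __name__ == "__main__":
    # Made-up sample strings, not taken from any real dump.
    sample = [
        "GET http://10.0.0.5/payload.bin HTTP/1.1",
        "connect 192.168.1.20:4444",
    ]
    for ioc_type, values in extract_iocs(sample).items():
        print(ioc_type, values)
```

A pattern-based pass like this only finds indicators with a fixed syntactic shape. It cannot explain or correlate artifacts, which is the role the paper assigns to the LLM.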