MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs)

Bikash Saha; Nanda Rani; Sandeep Kumar Shukla

TL;DR

MaLAware presents an LLM-assisted framework that automatically translates Cuckoo Sandbox malware reports into concise, human-readable narratives of malicious behavior, bridging the gap between detection and actionable insight. The approach filters raw sandbox data, uses open-source LLMs to correlate and explain behaviors, and applies post-processing to produce structured summaries. In an evaluation of five models on 133 ground-truth summaries using 11 metrics, Qwen2.5-7B-Instruct generally achieves the best lexical, semantic, and fluency scores, while Mistral-7B-Instruct-v0.3 excels in readability and diversity. The work demonstrates the feasibility and value of LLM-based malware narrative generation for rapid incident response and broader stakeholder comprehension, and outlines concrete directions for improved efficiency, fine-tuning, and expanded datasets.
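
The first pipeline step above, filtering raw sandbox data before it reaches the LLM, can be sketched as follows. This is a minimal illustration under stated assumptions, not MaLAware's implementation: it assumes Cuckoo's `report.json` layout (top-level `signatures`, `behavior.summary`, and `network` sections), and the kept fields, the `max_items` cap, and the helper names `filter_report` and `build_prompt` are all hypothetical.

```python
import json
from typing import Any

# Hypothetical selection of behaviour-summary keys; the source does not
# specify which Cuckoo fields MaLAware keeps.
SUMMARY_KEYS = ("file_created", "file_deleted", "regkey_written",
                "mutex", "command_line", "dll_loaded")


def filter_report(report: dict[str, Any], max_items: int = 10) -> dict[str, Any]:
    """Reduce a parsed Cuckoo report.json to a compact, behaviour-focused dict."""
    filtered: dict[str, Any] = {}

    # Signature names/descriptions are a dense hint about malicious activity;
    # bulky fields such as "marks" are dropped.
    filtered["signatures"] = [
        {"name": s.get("name"), "description": s.get("description"),
         "severity": s.get("severity")}
        for s in report.get("signatures", [])[:max_items]
    ]

    # Behaviour summary: keep only non-empty lists, each truncated so the
    # prompt stays within the model's context window.
    summary = report.get("behavior", {}).get("summary", {})
    filtered["behavior"] = {
        key: summary[key][:max_items] for key in SUMMARY_KEYS if summary.get(key)
    }

    # Network indicators. Assumption: "hosts" is a list of IP strings and
    # "domains" is a list of dicts with a "domain" key (shapes vary by version).
    network = report.get("network", {})
    filtered["network"] = {
        "hosts": network.get("hosts", [])[:max_items],
        "domains": [d.get("domain") for d in network.get("domains", [])[:max_items]],
    }
    return filtered


def build_prompt(filtered: dict[str, Any]) -> str:
    """Wrap the filtered report in an instruction for an instruction-tuned LLM."""
    return (
        "You are a malware analyst. Using the sandbox observations below, "
        "write a concise narrative of the sample's malicious behaviour.\n\n"
        + json.dumps(filtered, indent=2)
    )
```

The resulting prompt string would then be passed to one of the supported open-source models. The fixed truncation cap is only a crude stand-in for whatever context-budget strategy the tool actually uses.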

Abstract

Current malware (malicious software) analysis tools focus on detection and family classification but fail to provide clear, actionable narrative insights into the malicious activity of the malware. There is therefore a need for a tool that translates raw malware data into human-readable descriptions. Such a tool accelerates incident response, reduces malware analysts' cognitive load, and enables individuals with limited technical expertise to understand malicious software behaviour. With this objective, we present MaLAware, which automatically summarizes the full spectrum of malicious activity of malware executables. MaLAware processes Cuckoo Sandbox-generated reports using large language models (LLMs) to correlate malicious activities and generate concise summaries explaining malware behaviour. We evaluate the tool's performance on five open-source LLMs, using a dataset of human-written malware behaviour descriptions as ground truth. Performance is measured with 11 metrics, which strengthens confidence in MaLAware's effectiveness. The current version of MaLAware supports Qwen2.5-7B, Llama2-7B, Llama3.1-8B, Mistral-7B, and Falcon-7B, along with quantization for resource-constrained environments. MaLAware lays a foundation for future research in malware behaviour explanation, and its extensive evaluation demonstrates LLMs' ability to narrate malware behaviour in an actionable and comprehensive manner.
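
As an illustration of the lexical-overlap family of metrics used to compare generated summaries with the human-written ground truth, the sketch below computes a simplified ROUGE-1 F1. The abstract does not name its 11 metrics, so ROUGE-1 here is an assumption chosen as a representative example; the regex tokenizer omits the stemming and normalization options of standard ROUGE packages, and `tokenize` and `rouge1_f1` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference summary."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    # Multiset intersection clips each token's count to its minimum in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, "the malware writes a run key" and "the sample writes a registry run key" share five unigrams across six and seven tokens, giving precision 5/6, recall 5/7, and F1 = 10/13 ≈ 0.77. A corpus-level figure would typically average this score over all reference summaries.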

