Identifying Adversary Tactics and Techniques in Malware Binaries with an LLM Agent
Zhou Xuan, Xiangzhe Xu, Mingwei Zheng, Louis Zheng-Hua Tan, Jinyao Guo, Tiantai Zhang, Le Yu, Chengpeng Wang, Xiangyu Zhang
TL;DR
This work introduces TTPDetect, the first LLM-based agent designed to identify MITRE ATT&CK TTPs in stripped malware binaries at both the function and binary levels. It combines a hybrid retrieval pipeline, which efficiently narrows the space of candidate function–TTP pairs, with a function-level analysis agent that uses a Context Explorer to retrieve context on demand and TTP-specific reasoning guidelines for inference-time alignment. A new dataset with function-level TTP annotations across diverse malware families supports rigorous evaluation. TTPDetect achieves 93.25% precision and 93.81% recall at the function level and performs strongly on real-world binaries, including discovering previously unreported TTPs. The approach advances structured modeling of adversarial behavior directly from binary code, offering scalable, interpretable insights for threat intelligence and defense planning.
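The text does not detail how the hybrid retrieval stages fit together. A minimal sketch under our own assumptions follows. The first stage scores every function–TTP pair by cosine similarity between embeddings and keeps the top-k TTPs per function. The second stage asks an LLM to confirm or reject each surviving pair. The embeddings, the `llm_judge` callback (stubbed here instead of calling a model), the function names such as `sub_401000`, and `top_k` are all hypothetical. This is not TTPDetect's actual pipeline.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_candidates(
    func_emb: Dict[str, Vector],
    ttp_emb: Dict[str, Vector],
    top_k: int = 2,
) -> List[Tuple[str, str, float]]:
    """Assumed stage 1: keep the top_k most similar TTPs for each function."""
    pairs: List[Tuple[str, str, float]] = []
    for fname, fvec in func_emb.items():
        scored = sorted(
            ((ttp, cosine(fvec, tvec)) for ttp, tvec in ttp_emb.items()),
            key=lambda p: p[1],
            reverse=True,
        )
        pairs.extend((fname, ttp, score) for ttp, score in scored[:top_k])
    return pairs


def hybrid_retrieve(
    func_emb: Dict[str, Vector],
    ttp_emb: Dict[str, Vector],
    llm_judge: Callable[[str, str], bool],  # hypothetical stand-in for an LLM call
    top_k: int = 2,
) -> List[Tuple[str, str]]:
    """Assumed stage 2: the judge prunes the dense candidates to confirmed pairs."""
    return [
        (f, t)
        for f, t, _ in dense_candidates(func_emb, ttp_emb, top_k)
        if llm_judge(f, t)
    ]


if __name__ == "__main__":
    # Toy 3-d "embeddings"; real ones would come from a learned encoder.
    funcs = {"sub_401000": [0.9, 0.1, 0.0], "sub_402000": [0.0, 0.2, 0.9]}
    ttps = {"T1055": [1.0, 0.0, 0.0], "T1547": [0.0, 0.0, 1.0], "T1082": [0.1, 1.0, 0.1]}
    # Stub judge that rejects the weak T1082 candidates.
    print(hybrid_retrieve(funcs, ttps, lambda f, t: t != "T1082"))
    # [('sub_401000', 'T1055'), ('sub_402000', 'T1547')]
```

The point of the split is cost control. The cheap similarity pass bounds how many pairs reach the expensive LLM judgment. In practice the vectors would presumably encode decompiled code and ATT&CK technique descriptions rather than hand-written numbers.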
Abstract
Understanding TTPs (Tactics, Techniques, and Procedures) in malware binaries is essential for security analysis and threat intelligence, yet remains challenging in practice. Real-world malware binaries are typically stripped of symbols, contain large numbers of functions, and distribute malicious behavior across multiple code regions, all of which make TTP attribution difficult. Recent large language models (LLMs) offer strong code understanding capabilities, but applying them directly to this task faces three challenges: identifying analysis entry points, reasoning under partial observability, and aligning with TTP-specific decision logic. We present TTPDetect, the first LLM agent for recognizing TTPs in stripped malware binaries. TTPDetect combines dense retrieval with LLM-based neural retrieval to narrow the space of analysis entry points. It further employs a function-level analysis agent consisting of a Context Explorer, which performs on-demand, incremental context retrieval, and a TTP-Specific Reasoning Guideline, which achieves inference-time alignment. We build a new dataset that labels decompiled functions with TTPs across diverse malware families and platforms. TTPDetect achieves 93.25% precision and 93.81% recall on function-level TTP recognition, outperforming baselines by 10.38% and 18.78%, respectively. When evaluated on real-world malware samples, TTPDetect recognizes TTPs with a precision of 87.37%. For malware with expert-written reports, TTPDetect recovers 85.7% of the documented TTPs and further discovers, on average, 10.5 previously unreported TTPs per sample.
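The abstract describes the Context Explorer only as performing on-demand, incremental context retrieval. One plausible reading, sketched here under our own assumptions, is a loop with three steps. The analyzer sees the entry-point function. If it cannot yet decide, it names the callees it wants. The explorer fetches only those callees that are reachable in the call graph. The `Analyzer` signature, the dictionary-based call graph, the `budget` parameter, and the toy function names are hypothetical and are not TTPDetect's interface.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical analyzer signature: given the current context (function name ->
# decompiled code), return a verdict (True/False) or None if undecided, plus the
# names of functions it would like to see next.
Analyzer = Callable[[Dict[str, str]], Tuple[Optional[bool], List[str]]]


def explore(
    target: str,
    decompiled: Dict[str, str],
    call_graph: Dict[str, List[str]],
    analyze: Analyzer,
    budget: int = 5,
) -> Tuple[Optional[bool], Dict[str, str]]:
    """Grow the analysis context incrementally until a verdict or the budget runs out.

    `budget` caps the number of analyzer calls. Returns (verdict, context);
    verdict is None if the analyzer never committed.
    """
    context = {target: decompiled[target]}  # start from the entry-point function only
    for _ in range(budget):
        verdict, wanted = analyze(context)
        if verdict is not None:
            return verdict, context
        # Only fetch requested functions that are direct callees of code already in context.
        reachable = {callee for fn in context for callee in call_graph.get(fn, [])}
        new = [fn for fn in wanted if fn in reachable and fn in decompiled and fn not in context]
        if not new:
            break  # nothing fetchable was requested; stop instead of looping
        for fn in new:
            context[fn] = decompiled[fn]
    return None, context


if __name__ == "__main__":
    # Toy data: sub_401200 contains a classic process-injection API sequence.
    decompiled = {
        "sub_401000": "sub_401200(pid); sub_401300();",
        "sub_401200": "VirtualAllocEx(...); WriteProcessMemory(...); CreateRemoteThread(...);",
        "sub_401300": "GetSystemInfo(&info);",
    }
    call_graph = {"sub_401000": ["sub_401200", "sub_401300"]}

    def toy_analyzer(ctx: Dict[str, str]) -> Tuple[Optional[bool], List[str]]:
        # Stand-in for an LLM call: decide once the injection sequence is visible.
        if any("CreateRemoteThread" in body for body in ctx.values()):
            return True, []
        return None, ["sub_401200"]

    verdict, ctx = explore("sub_401000", decompiled, call_graph, toy_analyzer)
    print(verdict, sorted(ctx))  # True ['sub_401000', 'sub_401200']
```

In the toy run, `sub_401300` is never loaded because the analyzer never asks for it. This illustrates why on-demand retrieval can keep the context small in binaries with many functions.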
