Identifying Adversary Tactics and Techniques in Malware Binaries with an LLM Agent

Zhou Xuan, Xiangzhe Xu, Mingwei Zheng, Louis Zheng-Hua Tan, Jinyao Guo, Tiantai Zhang, Le Yu, Chengpeng Wang, Xiangyu Zhang

TL;DR

This work introduces TTPDetect, the first LLM-based agent designed to identify MITRE ATT&CK TTPs in stripped malware binaries at both the function and binary levels. It combines a hybrid retrieval pipeline, which narrows the set of candidate function–TTP pairs, with a function-level analyzing agent that uses a Context Explorer to fetch context on demand and TTP-specific reasoning guidelines for inference-time alignment. A new dataset with function-level TTP annotations across diverse malware families supports the evaluation: TTPDetect reaches 93.25% precision and 93.81% recall at the function level, achieves 87.37% precision on real-world binaries, and discovers previously unreported TTPs. The approach models structured adversarial behavior directly from binary code, offering scalable, interpretable insights for threat intelligence and defense planning.

Abstract

Understanding TTPs (Tactics, Techniques, and Procedures) in malware binaries is essential for security analysis and threat intelligence, yet remains challenging in practice. Real-world malware binaries are typically stripped of symbols, contain large numbers of functions, and distribute malicious behavior across multiple code regions, making TTP attribution difficult. Recent large language models (LLMs) offer strong code understanding capabilities, but applying them directly to this task raises three challenges: identifying analysis entry points, reasoning under partial observability, and aligning with TTP-specific decision logic. We present TTPDetect, the first LLM agent for recognizing TTPs in stripped malware binaries. TTPDetect combines dense retrieval with LLM-based neural retrieval to narrow the space of analysis entry points. TTPDetect further employs a function-level analyzing agent consisting of a Context Explorer, which performs on-demand, incremental context retrieval, and a TTP-Specific Reasoning Guideline, which achieves inference-time alignment. We build a new dataset that labels decompiled functions with TTPs across diverse malware families and platforms. TTPDetect achieves 93.25% precision and 93.81% recall on function-level TTP recognition, outperforming baselines by 10.38% and 18.78%, respectively. When evaluated on real-world malware samples, TTPDetect recognizes TTPs with a precision of 87.37%. For malware with expert-written reports, TTPDetect recovers 85.7% of the documented TTPs and further discovers, on average, 10.5 previously unreported TTPs per malware sample.
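
The function-level figures above are precision and recall over predicted (function, TTP) pairs. The sketch below shows one plausible way to compute them, assuming both ground truth and predictions are sets of exact (function, TTP) pairs and micro-averaging; the paper's actual matching and averaging rules are not stated here. The function name `precision_recall`, the function name `sub_401000`, and every TTP ID other than T1562 are hypothetical placeholders, chosen only to mirror the shape of Figure 1 (one correct TTP plus four false positives), not its actual output.

```python
from typing import Set, Tuple

# A (function, TTP) pair, e.g. ("sub_401000", "T1562"). Names are hypothetical.
Pair = Tuple[str, str]


def precision_recall(predicted: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (function, TTP) pairs.

    Assumption: a prediction counts as correct only if the exact pair
    appears in the ground-truth labels.
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Placeholder labels: one correct TTP and four false positives for one function.
truth = {("sub_401000", "T1562")}
predicted = {
    ("sub_401000", "T1562"),
    ("sub_401000", "T1070"),
    ("sub_401000", "T1112"),
    ("sub_401000", "T1489"),
    ("sub_401000", "T1059"),
}

p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.20 recall=1.00
```

A single correct answer buried among four false positives keeps recall perfect but drops precision to 0.2, which is why over-inclusive free-form attribution scores poorly on precision. Applying the same recall formula to binary-level TTP sets against a report's documented TTPs would give a "recovered" percentage; whether the 85.7% figure is computed exactly this way is an assumption.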

Paper Structure

This paper contains 25 sections, 2 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: An example of LLM free-form TTP attribution (Claude 3.7 Sonnet). The input is a renamed decompiled function with a summary. Q denotes the input prompt and A denotes the LLM’s generated response. This function exhibits T1562 (Impair Defenses). The LLM correctly recognizes T1562, but it also produces four false positives.
  • Figure 2: Comparison of the Basic Prompt and the Analyzing Agent for TTP presence prediction
  • Figure 3: Hybrid retrieval. Given the TTP definition of T1041, the dense retriever retrieves multiple candidate functions and ranks udp_flood_attack as a top candidate. In neural retrieval, the LLM is prompted with the decompiled function to generate an over-inclusive set of potentially relevant TTPs. T1041 is not in that set, so it is pruned.
  • Figure 4: TTPDetect Overview.
  • Figure 5: LLM prompts used for renaming.
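
The hybrid retrieval step in Figure 3 can be sketched as follows, under our reading of the caption: the dense retriever proposes (function, TTP) pairs by ranking functions against each TTP definition, and a pair survives only if the LLM's over-inclusive TTP list for that function also contains the TTP. This is a toy sketch, not the paper's implementation: `embed` is a bag-of-words stand-in for a real dense embedding model, `llm_proposals` is a hard-coded stand-in for the LLM call, and the helper names, the cutoff `k`, the function bodies, and the shortened TTP definitions are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (function name, TTP ID)


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real dense embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dense_retrieve(ttp_defs: Dict[str, str], functions: Dict[str, str], k: int) -> Set[Pair]:
    """For each TTP definition, keep the k functions most similar to it."""
    func_vecs = {name: embed(code) for name, code in functions.items()}
    pairs: Set[Pair] = set()
    for ttp, definition in ttp_defs.items():
        query = embed(definition)
        ranked = sorted(func_vecs, key=lambda f: cosine(query, func_vecs[f]), reverse=True)
        pairs.update((f, ttp) for f in ranked[:k])
    return pairs


def hybrid_retrieve(dense_pairs: Set[Pair], llm_proposals: Dict[str, Set[str]]) -> Set[Pair]:
    """Keep a dense-retrieved pair only if the LLM also proposed that TTP for that function."""
    return {(f, t) for f, t in dense_pairs if t in llm_proposals.get(f, set())}


# Hypothetical decompiled-function summaries and shortened TTP definitions.
functions = {
    "udp_flood_attack": "loop send udp packets to target host socket",
    "upload_file": "read local file and send data over c2 socket",
}
ttp_defs = {
    "T1041": "exfiltrate data over existing c2 channel",
    "T1498": "network denial of service flood target host",
}
# Hypothetical over-inclusive LLM output per function (neural retrieval stand-in).
llm_proposals = {
    "udp_flood_attack": {"T1498", "T1095"},
    "upload_file": {"T1041", "T1005"},
}

dense = dense_retrieve(ttp_defs, functions, k=2)
print(sorted(hybrid_retrieve(dense, llm_proposals)))
# [('udp_flood_attack', 'T1498'), ('upload_file', 'T1041')]
```

As in the caption, the dense stage keeps (udp_flood_attack, T1041) as a candidate, and the neural stage prunes it because the LLM never proposes T1041 for that function. The design intent suggested by Figure 3 is that cheap dense retrieval keeps recall high while the LLM filter cuts the number of pairs the analyzing agent must inspect.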