Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models
Ying Zhang, Xiaoyan Zhou, Hui Wen, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li
TL;DR
This work tackles the security risks of interpreted OSS malware in software supply chains by introducing GENTTP, a zero-shot LLM-based framework that automatically generates a TTP, consisting of a deceptive tactic and an execution tactic, for each OSS package. It builds two datasets (a large-scale in-the-wild collection of 5,890 packages and a ground-truth set from 1,366 analysis reports) and a ChatTTP chatbot that uses Retrieval-Augmented Generation to enable practical malware analysis at scale. Empirical results show that GENTTP achieves high accuracy on the ground-truth set (CR ≈ 0.90, SA ≈ 0.99) and performs competitively against heuristic tools, with GPT-4 generally providing the strongest results. The paper finds that many OSS malware packages share stable TTPs and that TTPs closely reflect attacker intent, offering a scalable, behavior-centric lens for defending OSS ecosystems and guiding future research across more registries and languages.
Abstract
The open-source software (OSS) ecosystem faces security threats from software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in SSC attacks, as criminals have an arsenal of attack vectors to deceive users into installing malware and to execute malicious activities. In this paper, we introduce the tactics, techniques, and procedures (TTPs) defined by MITRE ATT&CK into interpreted malware analysis to characterize the different phases of an attack lifecycle. Specifically, we propose GENTTP, a zero-shot approach to extracting the TTP of an interpreted malware package. GENTTP leverages large language models (LLMs) to automatically generate a TTP, where the input is a malicious package and the output is a deceptive tactic and an execution tactic of attack vectors. To validate the effectiveness of GENTTP, we collect two datasets for evaluation: a dataset with ground-truth labels and a large dataset collected in the wild. Experimental results show that GENTTP can generate TTPs with high accuracy and efficiency. To demonstrate GENTTP's benefits, we build an LLM-based chatbot from the TTPs of 3,700+ PyPI malware packages. We further conduct a large-scale quantitative analysis of malware TTPs. Our main findings include: (1) many malicious OSS packages share a relatively stable TTP, even with the increasing emergence of malware and attack campaigns, (2) a TTP reflects the characteristics of a malware-based attack, and (3) an attacker's intent behind the malware is linked to a TTP.
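As an informal illustration of the input/output shape described above, the following minimal sketch models a TTP as a pair of a deceptive tactic and an execution tactic and counts how many packages share each pair. The `TTP` class, the `ttp_frequencies` helper, the package names, and the tactic strings are all hypothetical placeholders chosen for illustration; they are not GENTTP's actual data model or output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TTP:
    """Hypothetical record: one deceptive tactic plus one execution tactic."""
    deceptive: str  # how users are lured into installing the package
    execution: str  # how the malicious behavior is triggered


def ttp_frequencies(package_ttps: dict[str, TTP]) -> Counter:
    """Count how many packages map to each distinct TTP."""
    return Counter(package_ttps.values())


# Placeholder data; real TTPs would come from the LLM-based generation step.
packages = {
    "pkg-a": TTP("typosquatting", "install-time script"),
    "pkg-b": TTP("typosquatting", "install-time script"),
    "pkg-c": TTP("dependency confusion", "import-time payload"),
}

freq = ttp_frequencies(packages)
most_common_ttp, count = freq.most_common(1)[0]
```

Because the record is frozen and therefore hashable, packages with identical tactics collapse onto the same key, which is one simple way to observe that several packages share a TTP.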
