Automatically Generating Rules of Malicious Software Packages via Large Language Model

XiangRui Zhang; HaoYu Chen; Yongzhong He; Wenjia Niu; Qiang Li

TL;DR

The paper addresses evolving open-source software (OSS) supply chain threats by automating security rule creation with RuleLLM, an LLM-driven pipeline that splits rule generation into three subtasks (crafting, refining, and aligning) and produces deployable YARA and Semgrep rules. The LLM's input is built from package metadata (egg-info/setup metadata) and code snippets (CodeBERT-based code embeddings with K-Means clustering), and an agent-driven alignment loop with custom rule compilers minimizes hallucinations and ensures syntactic correctness. Evaluated on 1,633 malicious OSS packages plus 500 legitimate ones, RuleLLM produced 763 rules with 85.2% precision and 91.8% recall, outperforming state-of-the-art (SOTA) tools and score-based approaches. Analysis of the generated rules yields a taxonomy of 11 categories and 38 subcategories. The RuleLLM tool and the 763 rules are provided to the research community.
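The alignment step described above (check a generated rule, feed any error back, retry) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `align_rule`, `toy_check`, and `toy_repair` are hypothetical names, the toy checker stands in for a real rule compiler, and the toy repairer stands in for the LLM call.

```python
from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not from the paper):
# a checker returns None for a valid rule, otherwise an error message;
# a repairer takes the rule and the error and returns a revised rule.
Checker = Callable[[str], Optional[str]]
Repairer = Callable[[str, str], str]


def align_rule(rule: str, check: Checker, repair: Repairer,
               max_rounds: int = 3) -> Optional[str]:
    """Return a rule that passes `check`, or None if repairs are exhausted."""
    for attempt in range(max_rounds + 1):
        error = check(rule)
        if error is None:
            return rule
        if attempt < max_rounds:
            rule = repair(rule, error)
    return None


def toy_check(rule: str) -> Optional[str]:
    """Stand-in for a rule compiler: only a few shallow structural checks."""
    text = rule.strip()
    if not text.startswith("rule "):
        return "rule must start with the 'rule' keyword"
    if text.count("{") != text.count("}"):
        return "unbalanced braces"
    if "condition:" not in text:
        return "missing condition section"
    return None


def toy_repair(rule: str, error: str) -> str:
    """Stand-in for the LLM: fixes only the unbalanced-brace error."""
    if error == "unbalanced braces":
        return rule + "\n}"
    return rule


# A draft rule with its closing brace missing.
draft = (
    "rule suspicious_setup {\n"
    "    strings:\n"
    '        $a = "os.system"\n'
    "    condition:\n"
    "        $a\n"
)
aligned = align_rule(draft, toy_check, toy_repair)
```

Passing the checker and repairer as callables keeps the loop independent of any particular rule format, so the same loop could in principle wrap a YARA or a Semgrep validator.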

Abstract

Today's security tools predominantly rely on predefined rules crafted by experts, making them poorly adapted to the emergence of software supply chain attacks. To tackle this limitation, we propose a novel tool, RuleLLM, which leverages large language models (LLMs) to automate rule generation for OSS ecosystems. RuleLLM extracts metadata and code snippets from malware as its input, producing YARA and Semgrep rules that can be directly deployed in software development. Specifically, the rule generation task involves three subtasks: crafting rules, refining rules, and aligning rules. To validate RuleLLM's effectiveness, we implemented a prototype system and conducted experiments on a dataset of 1,633 malicious packages. The results are promising: RuleLLM generated 763 rules (452 YARA and 311 Semgrep) with a precision of 85.2% and a recall of 91.8%, outperforming state-of-the-art (SOTA) tools and score-based approaches. We further analyzed the generated rules and proposed a rule taxonomy of 11 categories and 38 subcategories.
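For readers unfamiliar with the two reported metrics, the sketch below shows how precision and recall are conventionally computed from detection counts. The function name `precision_recall` is hypothetical and the counts are illustrative only; they are not the paper's confusion matrix.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only (assumed for this example, not taken from the paper):
# tp = malicious packages flagged, fp = legitimate packages flagged,
# fn = malicious packages missed.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f}")
```

Under these definitions, high recall means few malicious packages are missed, and high precision means few legitimate packages are flagged.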
