RulePilot: An LLM-Powered Agent for Security Rule Generation
Hongtai Wang, Ming Xu, Yanpei Guo, Weili Han, Hoon Wei Lim, Jin Song Dong
TL;DR
RulePilot introduces an LLM-based agent for automated SIEM rule creation and cross-platform rule conversion, bridging the gap between natural-language requirements and SIEM-specific rule grammars. It integrates a novel intermediate representation (IR), Chain-of-Thought prompting, and a reflection-based iterative optimization workflow, enabling Splunk SPL rule generation from annotations and subsequent conversion to Microsoft KQL. The approach achieves substantial gains in textual fidelity (up to 107.4% over baselines) and solid execution accuracy in real-world Splunk tests, with a case study showing time savings for junior analysts. This work advances practical, end-to-end automation for security rule management and provides open datasets and code to encourage adoption and further research.
Abstract
The demand for real-time system security has made detection rules an integral part of the intrusion detection life cycle. Rule-based detection identifies malicious logs by matching them against predefined grammar logic, so writing rules requires experts with deep domain knowledge. Automating rule generation can therefore save significant time and ease the burden of rule-related tasks on security engineers. In this paper, we propose RulePilot, an LLM-based agent that mimics human expertise to address rule-related tasks such as rule creation and conversion. With RulePilot, security analysts do not need to write rules in the rule grammar themselves; instead, they provide annotations, such as natural-language descriptions of a rule, and RulePilot generates the detection rules automatically without further intervention. RulePilot is equipped with an intermediate representation (IR) that abstracts the complexity of configuration rules into structured, standardized formats, allowing LLMs to focus on generating rules in a more manageable and consistent way. We present a comprehensive evaluation of RulePilot in terms of textual similarity and execution success, showing that RulePilot generates high-fidelity rules, outperforming the baseline models by up to 107.4% in textual similarity to the ground truth and achieving better detection accuracy in real-world execution tests. We also report a case study with our industry collaborators in Singapore, showing that RulePilot significantly helps junior analysts and general users in the rule creation process.
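To make the role of the IR concrete, the following is a minimal sketch of how a structured representation could sit between a rule description and platform-specific queries. It is not RulePilot's actual IR: the `RuleIR` class, its fields, and the `to_spl` and `to_kql` helpers are hypothetical names introduced here for illustration, and the sketch assumes both platforms share the same field names, which a real conversion would have to map.

```python
from dataclasses import dataclass


@dataclass
class RuleIR:
    """Hypothetical IR for a simple threshold rule (not the paper's schema)."""
    source: str      # Splunk index or KQL table name (assumed identical here)
    filters: dict    # field name -> required value
    group_by: str    # field to aggregate on
    threshold: int   # alert when the event count exceeds this value


def _literal(value):
    # Quote strings; leave numbers bare. Both SPL and KQL accept double quotes.
    return f'"{value}"' if isinstance(value, str) else str(value)


def to_spl(ir: RuleIR) -> str:
    """Render the IR as a Splunk SPL search string."""
    conds = " ".join(f"{k}={_literal(v)}" for k, v in ir.filters.items())
    return (f"index={ir.source} {conds} "
            f"| stats count by {ir.group_by} "
            f"| where count > {ir.threshold}")


def to_kql(ir: RuleIR) -> str:
    """Render the same IR as a Kusto (KQL) query string."""
    conds = " and ".join(f"{k} == {_literal(v)}" for k, v in ir.filters.items())
    return (f"{ir.source} "
            f"| where {conds} "
            f"| summarize hits = count() by {ir.group_by} "
            f"| where hits > {ir.threshold}")


# Illustrative rule: many failed logons (Windows event 4625) per user.
ir = RuleIR(source="security", filters={"EventCode": 4625},
            group_by="user", threshold=5)
print(to_spl(ir))
print(to_kql(ir))
```

The point of the sketch is the separation of concerns: the model only has to fill in a small structured object, and deterministic renderers handle each platform's grammar, so the same rule can be emitted for either SIEM.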
