Towards Small Language Models for Security Query Generation in SOC Workflows
Saleha Muzammil, Rahul Reddy, Vishal Kamalakrishnan, Hadi Ahmadi, Wajih Ul Hassan
TL;DR
The paper tackles the bottleneck of translating natural-language queries (NLQs) into KQL in Security Operations Centers (SOCs) by evaluating Small Language Models (SLMs) within a three-knob framework covering prompting, LoRA fine-tuning with rationale distillation, and architecture design. It introduces an NL2KQL-inspired prompting stack built around a lightweight Semantic Data Catalog, Schema Refiner, Few-Shot Selector, Prompt Builder, and Query Refiner, and pairs SLMs with LoRA fine-tuning and a two-stage SLM–Oracle architecture to balance cost and accuracy. Across the Defender and Sentinel datasets, LLMs achieve high syntactic correctness but mixed semantic accuracy, while SLMs improve with targeted prompting and rationale distillation. The two-stage architecture delivers near-LLM syntax and semantic scores at substantially lower token cost (up to 10× cheaper than GPT-5). These results demonstrate a practical, scalable pathway for enterprise NLQ-to-KQL translation in SOC workflows, enabling faster investigations while respecting data-governance and latency constraints.
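The prompting stack named above can be pictured as a chain of small retrieval and formatting steps that keep the prompt compact enough for an SLM. The following is a minimal Python sketch, not the paper's implementation: it assumes a toy in-memory catalog and plain keyword-overlap scoring in place of whatever retrieval the authors use. The functions `refine_schema`, `select_few_shots`, and `build_prompt`, the catalog descriptions, and the example question/KQL pairs are all hypothetical; table and column names follow Microsoft Defender advanced-hunting conventions for illustration only. The Query Refiner stage is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class TableEntry:
    """One entry in a hypothetical semantic data catalog."""
    name: str
    description: str
    columns: list[str]


# Illustrative catalog entries (Defender-style table and column names).
CATALOG = [
    TableEntry("DeviceProcessEvents", "Process creation events on devices, including command line",
               ["Timestamp", "DeviceName", "FileName", "ProcessCommandLine", "AccountName"]),
    TableEntry("DeviceNetworkEvents", "Network connections made by devices",
               ["Timestamp", "DeviceName", "RemoteIP", "RemotePort", "RemoteUrl"]),
    TableEntry("EmailEvents", "Email delivery events and message metadata",
               ["Timestamp", "SenderFromAddress", "RecipientEmailAddress", "Subject"]),
]

# Illustrative (question, KQL) pairs a few-shot selector could draw from.
EXAMPLES = [
    ("Show network connections to a remote IP",
     'DeviceNetworkEvents | where RemoteIP == "10.0.0.5"'),
    ("List processes whose command line contains powershell",
     'DeviceProcessEvents | where ProcessCommandLine has "powershell"'),
    ("Count emails from an external sender",
     "EmailEvents | summarize count() by SenderFromAddress"),
]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens used for simple lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def refine_schema(nlq: str, catalog: list[TableEntry], k: int = 2) -> list[TableEntry]:
    """Schema-refiner stand-in: keep the k tables whose text overlaps most with the question."""
    q = tokens(nlq)

    def overlap(t: TableEntry) -> int:
        return len(q & tokens(t.description + " " + " ".join(t.columns)))

    return sorted(catalog, key=overlap, reverse=True)[:k]


def select_few_shots(nlq: str, examples: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Few-shot-selector stand-in: rank examples by Jaccard similarity of question tokens."""
    q = tokens(nlq)

    def jaccard(ex: tuple[str, str]) -> float:
        e = tokens(ex[0])
        union = q | e
        return len(q & e) / len(union) if union else 0.0

    return sorted(examples, key=jaccard, reverse=True)[:k]


def build_prompt(nlq: str, tables: list[TableEntry], shots: list[tuple[str, str]]) -> str:
    """Prompt-builder stand-in: compact schema, then examples, then the new question."""
    schema = "\n".join(f"{t.name}({', '.join(t.columns)})" for t in tables)
    demos = "\n\n".join(f"Q: {q}\nKQL: {kql}" for q, kql in shots)
    return f"Schema:\n{schema}\n\nExamples:\n{demos}\n\nQ: {nlq}\nKQL:"


if __name__ == "__main__":
    nlq = "Which devices ran powershell with an encoded command line?"
    tables = refine_schema(nlq, CATALOG, k=1)
    shots = select_few_shots(nlq, EXAMPLES, k=1)
    print(build_prompt(nlq, tables, shots))
```

Passing only the top-ranked tables and examples is what keeps the prompt short in this sketch; an embedding-based retriever could replace the lexical overlap without changing the overall shape of the pipeline.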
Abstract
Analysts in Security Operations Centers (SOCs) routinely query massive telemetry streams using Kusto Query Language (KQL). Writing correct KQL requires specialized expertise, and this dependency becomes a bottleneck as security teams scale. This paper investigates whether Small Language Models (SLMs) can enable accurate, cost-effective natural-language-to-KQL translation for enterprise security. We propose a three-knob framework targeting prompting, fine-tuning, and architecture design. First, we adapt the existing NL2KQL framework to SLMs with lightweight retrieval and introduce error-aware prompting that addresses common parser failures without increasing token count. Second, we apply LoRA fine-tuning with rationale distillation, augmenting each natural-language query (NLQ) and KQL pair with a brief chain-of-thought explanation so that reasoning transfers from a teacher model while the SLM stays compact. Third, we propose a two-stage architecture that uses an SLM for candidate generation and a low-cost LLM judge for schema-aware refinement and selection. We evaluate nine models (five SLMs and four LLMs) on syntax correctness, semantic accuracy, table selection, and filter precision, alongside latency and token cost. On Microsoft's NL2KQL Defender Evaluation dataset, our two-stage approach achieves a syntax-correctness score of 0.987 and a semantic-accuracy score of 0.906. We further demonstrate generalizability on Microsoft Sentinel data, reaching 0.964 syntax correctness and 0.831 semantic accuracy. These results come at up to 10x lower token cost than GPT-5, establishing SLMs as a practical, scalable foundation for natural-language querying in security operations.
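The two-stage design described above (SLM for candidate generation, low-cost LLM judge for refinement and selection) can be sketched as follows, assuming both models are exposed as plain Python callables. Everything here is hypothetical: the names `two_stage`, `passes_prefilter`, and `build_judge_prompt`, the table-name prefilter, and the judge's reply format (an index on the first line, an optional corrected query after it) are illustrative choices, not the authors' protocol. The stub models stand in for real inference endpoints.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interfaces: an SLM returns n candidate queries for a prompt,
# and a low-cost LLM judge returns a free-text reply for a judge prompt.
SLM = Callable[[str, int], list[str]]
Judge = Callable[[str], str]


def passes_prefilter(kql: str, tables: Sequence[str]) -> bool:
    """Cheap local check (hypothetical heuristic): the query must start from a known table."""
    head = kql.strip().split("|", 1)[0].strip()
    return head in tables


def build_judge_prompt(nlq: str, tables: Sequence[str], candidates: Sequence[str]) -> str:
    """Schema-aware judge prompt listing the surviving candidates by index."""
    lines = [f"Question: {nlq}", f"Available tables: {', '.join(tables)}", "Candidates:"]
    lines += [f"[{i}] {c}" for i, c in enumerate(candidates)]
    lines.append("Reply with the index of the best candidate on the first line; "
                 "optionally put a corrected query on the following lines.")
    return "\n".join(lines)


def two_stage(nlq: str, prompt: str, tables: Sequence[str],
              slm: SLM, judge: Judge, n: int = 4) -> Optional[str]:
    """Stage 1: the SLM proposes candidates. Stage 2: the judge selects and optionally refines."""
    candidates = [c for c in slm(prompt, n) if passes_prefilter(c, tables)]
    if not candidates:
        return None  # nothing usable; a caller could escalate to a larger model here
    reply = judge(build_judge_prompt(nlq, tables, candidates)).strip()
    first, _, rest = reply.partition("\n")
    try:
        idx = int(first.strip("[] "))
    except ValueError:
        return candidates[0]  # unparseable reply: keep the first surviving SLM candidate
    if not 0 <= idx < len(candidates):
        return candidates[0]  # out-of-range index: same fallback
    return rest.strip() or candidates[idx]  # prefer the judge's correction if it gave one


if __name__ == "__main__":
    TABLES = ["DeviceProcessEvents", "DeviceNetworkEvents"]

    def stub_slm(prompt: str, n: int) -> list[str]:
        return ["DeviceProcessEvents | where ProcessCommandLine has 'powershell'",
                "ProcessEvents | take 10",  # unknown table, dropped by the prefilter
                "DeviceProcessEvents | where FileName =~ 'powershell.exe'"][:n]

    def stub_judge(prompt: str) -> str:
        return "1"  # picks the second surviving candidate, no correction

    print(two_stage("Which devices ran PowerShell?", "<prompt>", TABLES, stub_slm, stub_judge))
```

In this sketch the cost split comes from the division of labor: the SLM writes all candidate text, while the judge reads one short prompt and usually emits only an index, so the more expensive model produces few output tokens.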
