Towards Small Language Models for Security Query Generation in SOC Workflows

Saleha Muzammil, Rahul Reddy, Vishal Kamalakrishnan, Hadi Ahmadi, Wajih Ul Hassan

TL;DR

The paper tackles the bottleneck of translating natural-language queries into KQL in Security Operations Centers by evaluating Small Language Models (SLMs) within a three-knob framework encompassing prompting, LoRA fine-tuning with rationale distillation, and architecture design. It introduces an NL2KQL-inspired prompting stack extended with a lightweight Semantic Data Catalog, Schema Refiner, Few-Shot Selector, Prompt Builder, and Query Refiner, and pairs SLMs with LoRA fine-tuning and a two-stage SLM–Oracle architecture to balance cost and accuracy. Across the Defender and Sentinel datasets, LLMs achieve high syntax correctness but mixed semantic accuracy, while SLMs improve with targeted prompting and rationale distillation; the two-stage architecture delivers near-LLM syntax and semantic accuracy at up to 10× lower token cost. These results demonstrate a practical, scalable pathway for enterprise NLQ-to-KQL in SOC workflows, enabling faster investigations while respecting data governance and latency constraints.

Abstract

Analysts in Security Operations Centers routinely query massive telemetry streams using Kusto Query Language (KQL). Writing correct KQL requires specialized expertise, and this dependency creates a bottleneck as security teams scale. This paper investigates whether Small Language Models (SLMs) can enable accurate, cost-effective natural-language-to-KQL translation for enterprise security. We propose a three-knob framework targeting prompting, fine-tuning, and architecture design. First, we adapt the existing NL2KQL framework for SLMs with lightweight retrieval and introduce error-aware prompting that addresses common parser failures without increasing token count. Second, we apply LoRA fine-tuning with rationale distillation, augmenting each NLQ-KQL pair with a brief chain-of-thought explanation to transfer reasoning from a teacher model while keeping the SLM compact. Third, we propose a two-stage architecture that uses an SLM for candidate generation and a low-cost LLM judge for schema-aware refinement and selection. We evaluate nine models (five SLMs and four LLMs) across syntax correctness, semantic accuracy, table selection, and filter precision, alongside latency and token cost. On Microsoft's NL2KQL Defender Evaluation dataset, our two-stage approach achieves 0.987 syntax and 0.906 semantic accuracy. We further demonstrate generalizability on Microsoft Sentinel data, reaching 0.964 syntax and 0.831 semantic accuracy. These results come at up to 10x lower token cost than GPT-5, establishing SLMs as a practical, scalable foundation for natural-language querying in security operations.
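
To make the rationale-distillation step concrete, here is a minimal sketch of how one NLQ-KQL pair might be augmented with a brief explanation before fine-tuning. The record layout, the field names (`prompt`, `completion`), the prompt wording, and the sample question and rationale are all assumptions for illustration; the abstract does not specify the paper's actual training format.

```python
from dataclasses import dataclass


@dataclass
class DistillationExample:
    nlq: str        # natural-language question from the analyst
    rationale: str  # brief explanation, assumed to come from a teacher model
    kql: str        # reference KQL query


def to_training_record(ex: DistillationExample) -> dict:
    """Format one pair so the student model sees the rationale before the query.

    The prompt/completion layout is hypothetical, not the paper's format.
    """
    prompt = f"Question: {ex.nlq}\nExplain briefly, then write the KQL query.\n"
    completion = f"Rationale: {ex.rationale}\nKQL: {ex.kql}"
    return {"prompt": prompt, "completion": completion}


# Illustrative sample only; not taken from the evaluation datasets.
example = DistillationExample(
    nlq="List devices with failed logons in the last day",
    rationale="Logon events live in DeviceLogonEvents; filter on ActionType and time.",
    kql=(
        "DeviceLogonEvents | where Timestamp > ago(1d) "
        '| where ActionType == "LogonFailed" | distinct DeviceName'
    ),
)
record = to_training_record(example)
```

Placing the rationale ahead of the query means the student is trained to produce the explanation first and the KQL second, which is one common way to set up chain-of-thought distillation; other orderings are possible.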

Paper Structure

This paper contains 27 sections, 2 equations, 7 figures, and 12 tables.

Figures (7)

  • Figure 1: Orthogonal Enhancement Knobs: Prompting Scheme Enhancements, Fine-Tuning with LoRA, and Multi-SLM Architecture with Oracle Refinement
  • Figure 2: Oracle prompting templates used to guide refinement of model-generated KQL queries. The first oracle uses retrieval and refinement only, while the second incorporates schema context.
  • Figure 3: Two-Stage Architecture with Oracle Refinement. The NLQ is embedded to retrieve the Top 5 relevant tables, which guide the selection of Top 2 few-shot examples. These are processed by DeepSeek Coder 6.7B Instruct, and the outputs are refined by the Oracle model.
  • Figure 4: Zero-Shot prompt used to evaluate how SLMs generate KQL queries
  • Figure 5: Alternative prompting templates used to reduce common KQL errors.
  • ...and 2 more figures
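
The flow described in the Figure 3 caption (retrieve the top 5 tables, pick the top 2 few-shot examples, generate with the SLM, refine with the Oracle) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, filtering few-shot examples by retrieved table is one reading of "guide the selection", and `stub_slm` and `stub_oracle` are hypothetical placeholders for the model calls.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, items: dict[str, str], k: int) -> list[str]:
    """Return the k keys whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(items, key=lambda n: cosine(q, embed(items[n])), reverse=True)
    return ranked[:k]


def two_stage(
    nlq: str,
    tables: dict[str, str],                 # table name -> description
    few_shots: dict[str, tuple[str, str]],  # question -> (table, KQL)
    slm: Callable[[str], str],
    oracle: Callable[[str, str, list[str]], str],
) -> str:
    selected = top_k(nlq, tables, k=5)
    # Assumption: only examples that use a retrieved table are eligible.
    eligible = {q: q for q, (tbl, _) in few_shots.items() if tbl in selected}
    shots = top_k(nlq, eligible, k=2)
    shot_text = "\n".join(f"Q: {q}\nKQL: {few_shots[q][1]}" for q in shots)
    prompt = f"Tables: {', '.join(selected)}\n{shot_text}\nQ: {nlq}\nKQL:"
    draft = slm(prompt)                   # stage 1: candidate generation
    return oracle(nlq, draft, selected)   # stage 2: refinement


# Hypothetical stubs and data, for illustration only.
tables = {
    "DeviceLogonEvents": "logon sign-in events on devices",
    "DeviceProcessEvents": "process creation events on devices",
    "EmailEvents": "email delivery events",
}
few_shots = {
    "show failed logon attempts": (
        "DeviceLogonEvents",
        'DeviceLogonEvents | where ActionType == "LogonFailed"',
    ),
    "list blocked email": (
        "EmailEvents",
        'EmailEvents | where DeliveryAction == "Blocked"',
    ),
}


def stub_slm(prompt: str) -> str:
    return "DeviceLogonEvents | take 10"


def stub_oracle(nlq: str, draft: str, selected: list[str]) -> str:
    # Keep the draft if its source table was retrieved; otherwise fall back.
    first = draft.split("|")[0].strip()
    return draft if first in selected else f"{selected[0]} | take 10"


result = two_stage(
    "failed logon events on devices", tables, few_shots, stub_slm, stub_oracle
)
```

Passing the SLM and the Oracle as plain callables keeps the retrieval logic independent of any particular model or API, so either stage could be swapped without touching the rest of the sketch.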