
On-Premise SLMs vs. Commercial LLMs: Prompt Engineering and Incident Classification in SOCs and CSIRTs

Gefté Almeida, Marcio Pohlmann, Alex Severo, Diego Kreutz, Tiago Heinrich, Lourenço Pereira

TL;DR

This paper addresses the privacy, cost, and data-sovereignty challenges of using commercial LLMs for security-incident classification in SOCs and CSIRTs by evaluating locally hosted open-source small language models (SLMs). It adopts a modular, on-premise pipeline and applies five prompt-engineering techniques (PHP, SHP, HTP, PRP, ZSL) to a real, anonymized incident dataset labeled with the NIST SP 800-61r3 taxonomy, comparing two groups of open-source models that differ in architecture and size. Progressive-Hint Prompting (PHP) and Self-Hint Prompting (SHP) deliver the most robust performance across models: the larger Group 1 models reach up to about 61.7% accuracy and the smaller Group 2 models around 53%, while HTP is the least effective technique. Although an accuracy gap to proprietary LLMs remains, the study shows that on-premise open-source solutions offer clear benefits in privacy, cost predictability, and data sovereignty, making them suitable for initial triage and decision support in SOCs and CSIRTs. The authors outline future directions, including dataset expansion, LoRA fine-tuning, richer evaluation metrics, continuous learning, and user interfaces that improve explainability and traceability, all aimed at making SLMs more practical and auditable in critical cybersecurity workflows.
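Progressive-Hint Prompting, one of the two techniques the study finds most robust, re-asks the model with its earlier answers supplied as hints. The following is a minimal sketch of that loop for incident classification under stated assumptions: `ask_model` is a hypothetical callable wrapping whatever locally hosted SLM is in use, the labels are placeholders rather than the actual NIST SP 800-61r3 categories, and the stopping rule (stop once two consecutive answers agree) follows the original PHP formulation, not a detail confirmed by this paper.

```python
from typing import Callable, List, Optional

# Hypothetical placeholder labels; the paper's NIST SP 800-61r3 categories
# are not reproduced here.
LABELS = ["CATEGORY_A", "CATEGORY_B", "CATEGORY_C"]


def build_prompt(incident: str, hints: List[str]) -> str:
    """Assemble a classification prompt, appending earlier answers as hints."""
    prompt = (
        "Classify the following security incident into exactly one of: "
        + ", ".join(LABELS)
        + ".\nIncident: "
        + incident
    )
    if hints:
        # PHP-style hint: the model's previous answers are fed back to it.
        prompt += "\n(Hint: the answer is near to " + ", ".join(hints) + ".)"
    return prompt + "\nAnswer:"


def progressive_hint_classify(
    incident: str,
    ask_model: Callable[[str], str],  # hypothetical wrapper around a local SLM
    max_rounds: int = 4,
) -> Optional[str]:
    """Re-query the model with its previous answers as hints until two
    consecutive answers agree or max_rounds is reached."""
    hints: List[str] = []
    previous: Optional[str] = None
    for _ in range(max_rounds):
        answer = ask_model(build_prompt(incident, hints)).strip()
        if answer == previous:
            break  # two consecutive answers match, so the label is stable
        hints.append(answer)
        previous = answer
    return previous
```

Because `ask_model` is injected, the same loop can wrap any local inference backend; changing the stopping rule (for example, a fixed number of rounds) only touches the loop condition.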

Abstract

In this study, we evaluate open-source models for security incident classification and compare them with proprietary models. We use a dataset of anonymized real incidents, categorized according to the NIST SP 800-61r3 taxonomy and classified using five prompt-engineering techniques (PHP, SHP, HTP, PRP, and ZSL). The results indicate that, although proprietary models still achieve higher accuracy, locally deployed open-source models offer advantages in privacy, cost-effectiveness, and data sovereignty.

Paper Structure

This paper contains 6 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Flowchart of the automated classification pipeline using SLMs.
  • Figure 2: Percentage of correct and incorrect predictions by prompt technique.
  • Figure 3: Percentage of correct and incorrect predictions by Model × Prompt Technique.
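Figures 2 and 3 report the percentage of correct and incorrect predictions per prompt technique and per model × technique pair. A minimal sketch of that aggregation follows, assuming each prediction is stored as a hypothetical `Prediction` record with the fields shown; the grouping function selects whether results are split by technique alone or by model and technique. Any model names or labels used with it are placeholders, not the paper's data.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, NamedTuple, Tuple


class Prediction(NamedTuple):
    model: str      # hypothetical model identifier
    technique: str  # e.g. "PHP", "SHP", "HTP", "PRP", "ZSL"
    predicted: str  # label returned by the model
    gold: str       # reference label from the annotated dataset


def correct_incorrect_pct(
    preds: Iterable[Prediction],
    group_by: Callable[[Prediction], Hashable],
) -> Dict[Hashable, Tuple[float, float]]:
    """Return {group: (% correct, % incorrect)} for every group seen."""
    tallies: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])
    for p in preds:
        # index 0 counts correct predictions, index 1 counts incorrect ones
        tallies[group_by(p)][0 if p.predicted == p.gold else 1] += 1
    return {
        g: (100.0 * c / (c + i), 100.0 * i / (c + i))
        for g, (c, i) in tallies.items()
    }


# Usage: per technique (as in Figure 2) and per model x technique (as in Figure 3).
# by_technique = correct_incorrect_pct(preds, lambda p: p.technique)
# by_pair = correct_incorrect_pct(preds, lambda p: (p.model, p.technique))
```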