On-Premise SLMs vs. Commercial LLMs: Prompt Engineering and Incident Classification in SOCs and CSIRTs

Gefté Almeida; Marcio Pohlmann; Alex Severo; Diego Kreutz; Tiago Heinrich; Lourenço Pereira

TL;DR

This paper addresses the privacy, cost, and data-sovereignty challenges of using commercial LLMs for security-incident classification in SOCs and CSIRTs by evaluating locally hosted open-source small language models (SLMs). It uses a modular, on-premise pipeline and applies five prompt-engineering techniques (PHP, SHP, HTP, PRP, ZSL) to a real, anonymized incident dataset labeled with the NIST SP 800-61r3 taxonomy, comparing two groups of open-source models that differ in architecture and size. Progressive-Hint Prompting (PHP) and Self-Hint Prompting (SHP) perform most robustly across models: the larger Group 1 models reach up to about 61.7% accuracy and the smaller Group 2 models about 53%, while HTP is the least effective technique. Although an accuracy gap relative to proprietary LLMs remains, the study shows that on-premise open-source solutions offer clear benefits in privacy, cost predictability, and data sovereignty, and can support initial triage and decision support in SOCs and CSIRTs. The authors outline future directions including dataset expansion, LoRA fine-tuning, richer evaluation metrics, continuous learning, and user interfaces that improve explainability and traceability, making SLMs more practical and auditable in critical cybersecurity workflows.

Abstract

In this study, we evaluate open-source models for security incident classification and compare them with proprietary models. We use a dataset of real, anonymized incidents, categorized according to the NIST SP 800-61r3 taxonomy and processed with five prompt-engineering techniques (PHP, SHP, HTP, PRP, and ZSL). The results indicate that, although proprietary models still achieve higher accuracy, locally deployed open-source models provide advantages in privacy, cost-effectiveness, and data sovereignty.
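To make one of the five techniques concrete, the following is a minimal sketch of Progressive-Hint Prompting (PHP) as it is generally described: the model is asked repeatedly, earlier answers are fed back as hints, and the loop stops once two consecutive answers agree. This is not the authors' pipeline or prompt wording. The category labels are placeholders rather than the paper's taxonomy, and `ask_model`, `fake_model`, and `max_rounds` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Optional

# Placeholder labels for illustration only; NOT the taxonomy used in the paper.
CATEGORIES = ["malware", "phishing", "denial_of_service", "other"]


def build_prompt(incident: str, hints: List[str]) -> str:
    """Build a classification prompt, appending earlier answers as hints."""
    prompt = (
        "Classify the incident into one of: " + ", ".join(CATEGORIES) + ".\n"
        "Incident: " + incident + "\n"
    )
    if hints:
        prompt += "Hint: the answer is near to " + ", ".join(hints) + ".\n"
    return prompt + "Answer with the category name only."


def progressive_hint_classify(
    incident: str,
    ask_model: Callable[[str], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Re-ask with accumulated hints until two consecutive answers agree."""
    hints: List[str] = []
    previous: Optional[str] = None
    for _ in range(max_rounds):
        answer = ask_model(build_prompt(incident, hints)).strip().lower()
        if answer == previous:
            return answer  # answer is stable across two rounds
        hints.append(answer)
        previous = answer
    return previous  # no agreement within max_rounds: return the last answer


def fake_model(prompt: str) -> str:
    """Stand-in for a locally hosted model: changes its answer once hinted."""
    return "phishing" if "Hint:" in prompt else "malware"


if __name__ == "__main__":
    label = progressive_hint_classify(
        "Suspicious email with a credential-harvesting link", fake_model
    )
    print(label)
```

In an on-premise deployment, `ask_model` would wrap a call to the locally hosted model, so incident text never leaves the organization's infrastructure.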
