Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection

Adel ElZemity; Joshua Sylvester; Budi Arief; Rogério De Lemos

TL;DR

This work tackles rapid, privacy-preserving on-device SMS threat detection by introducing Agentic Knowledge Distillation, where a capable LLM acts as an autonomous teacher to generate synthetic data and iteratively fine-tune a small on-device model via LoRA. The approach is evaluated across four teacher LLMs and two student SLMs, with a strict separation between synthetic validation and human-labelled test data. Results show substantial performance gains over a static Direct Preference Optimisation baseline, with the best configuration (Claude Opus 4.5 as teacher and Qwen2.5-0.5B as student) achieving 94.31% accuracy and 96.25% recall, demonstrating the value of closed-loop, error-driven refinement for edge threat detectors. The study also highlights the critical role of teacher LLM selection and reports practical implications for rapid, privacy-preserving deployment on consumer hardware, while noting limitations related to knowledge coverage and potential misuse.

Abstract

SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To address this, we present Agentic Knowledge Distillation, in which a powerful LLM acts as an autonomous teacher that fine-tunes a smaller, deployable student SLM for security tasks without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines the on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially depending on the teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against a Direct Preference Optimisation (DPO) baseline that uses the same synthetic knowledge and LoRA setup but without iterative feedback or targeted refinement; agentic knowledge distillation substantially outperforms it (e.g. 86-94% vs 50-80% accuracy), showing that closed-loop feedback and targeted refinement are critical. These findings demonstrate that agentic knowledge distillation can rapidly yield effective security classifiers for edge deployment, but outcomes depend strongly on which teacher LLM is used.
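The closed loop described above (generate synthetic data, fine-tune, evaluate, target the errors, stop at a plateau) can be illustrated with a minimal sketch. This is not the authors' implementation: the callables `generate`, `train`, and `predict` are hypothetical stand-ins for the teacher LLM, the LoRA fine-tuning step, and the student SLM, and the plateau rule (`patience`, `min_delta`) is an assumption, since the abstract does not say how a plateau is detected.

```python
from typing import Callable, List, Sequence, Tuple

# (sms_text, label) where 1 = threat (spam/smishing) and 0 = benign.
Example = Tuple[str, int]


def accuracy_and_recall(preds: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, recall on the threat class) for binary predictions."""
    true_pos = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    false_neg = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    accuracy = correct / len(labels) if labels else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall


def distil_until_plateau(
    generate: Callable[[List[Example]], List[Example]],  # hypothetical teacher step
    train: Callable[[List[Example]], None],              # hypothetical fine-tuning step
    predict: Callable[[str], int],                       # hypothetical student inference
    validation: List[Example],
    max_rounds: int = 10,
    patience: int = 2,        # assumed: rounds without improvement before stopping
    min_delta: float = 0.005,  # assumed: smallest gain that counts as improvement
) -> List[float]:
    """Run the teacher/student loop and return validation accuracy per round."""
    best, stale = 0.0, 0
    errors: List[Example] = []
    history: List[float] = []
    labels = [y for _, y in validation]
    for _ in range(max_rounds):
        # The teacher produces new examples, conditioned on the last round's errors.
        train(generate(errors))
        preds = [predict(text) for text, _ in validation]
        acc, _ = accuracy_and_recall(preds, labels)
        history.append(acc)
        # Misclassified validation items drive the next round of targeted data.
        errors = [ex for ex, p in zip(validation, preds) if p != ex[1]]
        if acc > best + min_delta:
            best, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:
                break  # performance has plateaued
    return history
```

In this sketch the stopping decision uses only the validation set passed in; a held-out, human-labelled test set would be scored once, after the loop ends, in keeping with the separation between synthetic validation data and human-labelled test data described in the TL;DR.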
