SpaLLM-Guard: Pairing SMS Spam Detection Using Open-source and Commercial LLMs

Muhammad Salman, Muhammad Ikram, Nardine Basta, Mohamed Ali Kaafar

TL;DR

SpaLLM-Guard benchmarks open-source and commercial LLMs for SMS spam detection across zero-shot, few-shot, chain-of-thought, and fine-tuning regimes. The study, built on the Super Dataset of 67,018 messages, shows zero-shot performance is unreliable while carefully crafted few-shot prompts offer model-dependent gains. Fine-tuning, especially Mixtral-8x7B via LoRA/QLoRA, delivers the strongest results (≈98.6% accuracy with minimal FPR/FNR) and substantially improves adversarial robustness and concept-drift resilience. Chain-of-thought prompting provides selective benefits for some models but is not universally beneficial, underscoring fine-tuning as the key strategy for robust, real-world SMS spam detection using LLMs. Overall, the work highlights practical pathways for deploying LLM-based spam detectors with strong robustness in dynamic threat environments.

Abstract

The increasing threat of SMS spam, driven by evolving adversarial techniques and concept drift, calls for more robust and adaptive detection methods. In this paper, we evaluate the potential of large language models (LLMs), both open-source and commercial, for SMS spam detection, comparing their performance across zero-shot, few-shot, fine-tuning, and chain-of-thought prompting approaches. Using a comprehensive dataset of SMS messages, we assess the spam detection capabilities of prominent LLMs such as GPT-4, DeepSeek, LLAMA-2, and Mixtral. Our findings reveal that while zero-shot learning provides convenience, it is unreliable for effective spam detection. Few-shot learning, particularly with carefully selected examples, improves detection but exhibits variability across models. Fine-tuning emerges as the most effective strategy, with Mixtral achieving 98.6% accuracy and a balanced false positive and false negative rate below 2%, meeting the criteria for robust spam detection. Furthermore, we explore the resilience of these models to adversarial attacks, finding that fine-tuning significantly enhances robustness against both perceptible and imperceptible manipulations. Lastly, we investigate the impact of concept drift and demonstrate that fine-tuned LLMs, especially when combined with few-shot learning, can mitigate its effects, maintaining high performance even on evolving spam datasets. This study highlights the importance of fine-tuning and tailored learning strategies to deploy LLMs effectively for real-world SMS spam detection.
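The abstract states its robustness criterion in terms of accuracy and false positive and false negative rates below 2%. The following is a minimal sketch of how those three quantities could be computed from gold and predicted labels. The names `SpamMetrics`, `spam_metrics`, and `meets_criterion` are hypothetical, and the "spam"/"ham" label strings are an assumption, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SpamMetrics:
    accuracy: float
    fpr: float  # share of legitimate (ham) messages flagged as spam
    fnr: float  # share of spam messages that were missed


def spam_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, FPR, and FNR, treating `positive` as the spam class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return SpamMetrics(
        accuracy=(tp + tn) / total if total else 0.0,
        fpr=fp / (fp + tn) if (fp + tn) else 0.0,
        fnr=fn / (fn + tp) if (fn + tp) else 0.0,
    )


def meets_criterion(metrics, max_rate=0.02):
    """Assumed reading of the abstract: both error rates must stay below 2%."""
    return metrics.fpr < max_rate and metrics.fnr < max_rate


if __name__ == "__main__":
    gold = ["spam", "spam", "ham", "ham"]
    pred = ["spam", "ham", "ham", "ham"]
    m = spam_metrics(gold, pred)
    print(m, meets_criterion(m))
```

Reporting FPR and FNR separately matters here because a detector can reach high accuracy on an imbalanced SMS corpus while still missing a large share of spam or blocking legitimate messages.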

Paper Structure (25 sections, 3 figures, 11 tables)

This paper contains 25 sections, 3 figures, 11 tables.

Figures (3)

  • Figure 1: Comparison of the highest and lowest accuracy of LLMs across prompts with highlighted differences.
  • Figure 2: Evaluation framework of LLMs using zero-shot, few-shot, fine-tuning, and chain-of-thought prompting approaches. The process includes training and testing on SMS datasets, with subsequent assessment through spam detection, adversarial evaluation, and concept drift evaluation.
  • Figure 3: Accuracy of LLMs across various learning approaches. The highest accuracy for each LLM is indicated by an arrow and labeled with the achieved accuracy. Abbreviations: bsn (baseline), Msg (messages), CoT (chain-of-thought).
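
Few-shot prompting is one of the approaches named in the Figure 2 caption. As a rough illustration only, a few-shot classification prompt might be assembled from labeled example messages as sketched below. The template wording, the "spam"/"ham" labels, and the functions `build_few_shot_prompt` and `parse_label` are all hypothetical; the paper's actual prompts are not reproduced here.

```python
def build_few_shot_prompt(examples, message):
    """Assemble a plain-text few-shot prompt.

    `examples` is a list of (text, label) pairs shown to the model before
    the message to classify. The instruction wording is an assumption.
    """
    lines = ["Classify each SMS message as 'spam' or 'ham'.", ""]
    for text, label in examples:
        lines.append(f"Message: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Message: {message}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(response):
    """Map a raw model reply to 'spam', 'ham', or None if it is unrecognized."""
    token = response.strip().lower()
    if token.startswith("spam"):
        return "spam"
    if token.startswith("ham"):
        return "ham"
    return None


if __name__ == "__main__":
    shots = [
        ("You won a free prize, reply now to claim", "spam"),
        ("Running late, see you at 6", "ham"),
    ]
    print(build_few_shot_prompt(shots, "Your parcel is held, pay the fee here"))
```

Returning `None` for unrecognized replies keeps malformed model output from being silently counted as either class during evaluation.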