Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models

Sibo Yi; Tianshuo Cong; Xinlei He; Qi Li; Jiaxing Song

TL;DR

This study investigates the security of small language models (SLMs) deployed on edge devices through a large-scale empirical evaluation of 16 models (13 SLMs <4B and 3 LLMs >7B) against five jailbreak methods and multiple datasets, complemented by three defense strategies. It finds that most SLMs are vulnerable to jailbreak prompts, which achieve higher attack success rates than direct harmful prompts, though certain defenses can reduce the risk to near-zero levels for many attacks. The analysis identifies key factors driving security degradation (safety alignment gaps, biased knowledge distillation, and compression techniques), while quantization sometimes preserves or even enhances robustness. The work offers practical guidance for building more robust, edge-friendly SLMs and informs defense design for real-world deployments.

Abstract

Small language models (SLMs) have become increasingly prominent in edge-device deployment due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention than those of large language models (LLMs). To fill this gap, we provide a comprehensive empirical study evaluating the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, and some are even vulnerable to direct harmful prompts. To address these safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques such as architecture compression, quantization, and knowledge distillation. We expect our research to highlight the security challenges of SLMs and provide valuable insights for future work on developing more robust and secure SLMs.
