Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models

Sibo Yi, Tianshuo Cong, Xinlei He, Qi Li, Jiaxing Song

TL;DR

This study investigates the security of small language models (SLMs) deployed on edge devices through a large-scale empirical evaluation of 16 models (13 SLMs under 4B parameters and 3 LLMs over 7B parameters) against five jailbreak methods and multiple datasets, complemented by three defense strategies. It finds that most SLMs are vulnerable to jailbreak prompts, which achieve higher attack success rates than direct harmful prompts, though certain defenses can reduce risk to near-zero levels for many attacks. The analysis identifies key factors driving security degradation (safety alignment gaps, biased knowledge distillation, and compression techniques), while quantization sometimes preserves or even enhances robustness. The work offers practical guidance for building more robust, edge-friendly SLMs and informs defense design for real-world deployments.

Abstract

Small language models (SLMs) have become increasingly prominent in edge-device deployments due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention than those of large language models (LLMs). To fill this gap, we provide a comprehensive empirical study evaluating the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, and some of them are vulnerable even to direct harmful prompts. To address these safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques, including architecture compression, quantization, and knowledge distillation. We expect that our research can highlight the security challenges of SLMs and provide valuable insights for future work on developing more robust and secure SLMs.

Paper Structure

This paper contains 22 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The family tree of the target SLMs evaluated in our paper. A solid line indicates that the model belongs to a given family, while a dashed line indicates that the model is derived from that family using a particular SLM technique.
  • Figure 2: The security performance of the 13 SLMs and 3 LLMs under direct attacks. Target models are ranked in descending order of average ASR.
  • Figure 3: The security performance of the 13 SLMs and 3 LLMs under jailbreak attacks. Target models are ranked in descending order of average ASR.
  • Figure 4: The security performance of SLMs of different parameter sizes under direct attacks. Security is measured by the average ASR across 5 harmful datasets.
  • Figure 5: The security performance of SLMs of different parameter sizes under jailbreak attacks. Security is measured by the average ASR across 5 jailbreak methods.
  • ...and 1 more figures
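
The figure captions above rank models by average ASR (attack success rate). As a rough illustration only, the sketch below shows one way such an average and ranking could be computed. It assumes ASR is the fraction of prompts for which an attack is judged successful; the function names, model names, and numbers are hypothetical placeholders and are not taken from the paper.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Fraction of prompts judged as successfully attacked (True entries).

    Assumption: each entry is a boolean verdict for one prompt.
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(judgements) / len(judgements)


def rank_by_average_asr(asr_by_model):
    """Return (model, average ASR) pairs, highest average first."""
    averages = {
        model: mean(per_setting.values())
        for model, per_setting in asr_by_model.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder values, not results from the paper.
example = {
    "model_a": {"dataset_1": 0.10, "dataset_2": 0.30},
    "model_b": {"dataset_1": 0.50, "dataset_2": 0.70},
}
ranking = rank_by_average_asr(example)
```

Here `ranking` lists `model_b` before `model_a`, since its average ASR is higher. The same helper applies whether the inner keys are harmful datasets (as in Figure 4) or jailbreak methods (as in Figure 5).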