How Vulnerable Are Edge LLMs?

Ao Ding, Hongzong Li, Zi Liang, Zhanpeng Shi, Shuxin Zhuang, Shiqin Tang, Rong Feng, Ping Lu

Abstract

Large language models (LLMs) are increasingly deployed on edge devices under strict computation and quantization constraints, yet their security implications remain unclear. We study query-based knowledge extraction from quantized edge-deployed LLMs under realistic query budgets and show that, although quantization introduces noise, it does not remove the underlying semantic knowledge, allowing substantial behavioral recovery through carefully designed queries. To systematically analyze this risk, we propose CLIQ (Clustered Instruction Querying), a structured query construction framework that improves semantic coverage while reducing redundancy. Experiments on quantized Qwen models (INT8/INT4) demonstrate that CLIQ consistently outperforms original queries across BERTScore, BLEU, and ROUGE, enabling more efficient extraction under limited budgets. These results indicate that quantization alone does not provide effective protection against query-based extraction, highlighting a previously underexplored security risk in edge-deployed LLMs.
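
The abstract describes CLIQ only at a high level: cluster candidate instructions, then query so that a limited budget covers more semantic ground with less redundancy. The following is a minimal sketch of that general idea, not the authors' method. It assumes a simple leader clustering over token-set Jaccard similarity as a stand-in for whatever clustering and embedding CLIQ actually uses; the function names, the 0.5 threshold, and the example pool are all hypothetical.

```python
from typing import List


def tokens(text: str) -> frozenset:
    """Lowercased word set; a crude stand-in for a semantic embedding (assumption)."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(queries: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Leader clustering: a query joins the first cluster whose leader is similar enough."""
    clusters: List[List[str]] = []
    for q in queries:
        for cluster in clusters:
            if jaccard(tokens(q), tokens(cluster[0])) >= threshold:
                cluster.append(q)
                break
        else:
            # No existing leader is close enough, so this query starts a new cluster.
            clusters.append([q])
    return clusters


def select_queries(queries: List[str], budget: int, threshold: float = 0.5) -> List[str]:
    """Take queries round-robin across clusters so the budget is spread over topics."""
    clusters = cluster_queries(queries, threshold)
    selected: List[str] = []
    depth = 0
    while len(selected) < budget and any(depth < len(c) for c in clusters):
        for cluster in clusters:
            if depth < len(cluster) and len(selected) < budget:
                selected.append(cluster[depth])
        depth += 1
    return selected


# Hypothetical candidate pool with two near-duplicate pairs and one singleton.
pool = [
    "translate this sentence into french",
    "translate this sentence into german",
    "write a python function to sort a list",
    "write a python function to reverse a list",
    "summarize the following news article",
]
picked = select_queries(pool, budget=3)
```

With a budget of three, this sketch spends one query on each of the three topics rather than two on translation, which is the coverage-versus-redundancy trade-off the abstract refers to.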

Paper Structure (67 sections, 8 equations, 11 figures, 7 tables, 1 algorithm)

Figures (11)

  • Figure 1: Overview of the proposed framework for query-based knowledge extraction from edge-deployed quantized LLMs. Previous approaches (blue) rely on unstructured queries, which often lead to redundant probing and noisy responses, resulting in low-fidelity reconstruction of model behavior. CLIQ (red) performs structured query construction to produce more informative responses under a limited query budget, enabling high-fidelity student reconstruction of the edge model behavior.
  • Figure 2: Threat framework for query-based knowledge extraction from quantized edge-deployed LLMs. Traditional extraction settings (top) assume full-precision teacher models in high-performance server environments, where abundant compute allows large-scale query probing. In contrast, edge-deployed LLMs (bottom) operate under quantization (e.g., INT4/INT8) and strict resource constraints, resulting in limited query budgets and noisy responses. Nevertheless, carefully structured queries can still recover substantial behavioral knowledge from the quantized model.
  • Figure 3: Effect of training steps under a fixed query budget (500 queries) across three student models. We report BERT-F1, BLEU, and ROUGE-L, and include the 0-step base model as initialization. CLIQ (ours) consistently outperforms OQ and exhibits diminishing returns after $\sim$200--300 steps.
  • Figure 4: Effect of query budget on distillation performance. CLIQ exhibits rapid gains as the number of queries increases from 100 to 300, followed by diminishing returns, while Original Queries show minimal improvement.
  • Figure 5: Effect of training steps under a fixed query budget (500 queries). CLIQ (ours) yields consistent gains over OQ and shows diminishing returns after $\sim$200--300 steps.
  • ...and 6 more figures
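
Several captions above report ROUGE-L between student and teacher outputs. As a rough illustration of what that metric measures, here is a minimal sketch of sentence-level ROUGE-L F1 based on the longest common subsequence (LCS). It assumes whitespace tokenization, lowercasing, and equal weighting of precision and recall; the paper's actual tokenizer and scoring library are not stated in this excerpt, and the function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 with equal weight on precision and recall (assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical student output scored against a hypothetical teacher output.
score = rouge_l_f1("the cat sat", "the cat sat on the mat")
```

Here the student output is a perfect prefix of the teacher output, so precision is 1 and recall is 0.5, giving an F1 of 2/3.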