Joint Optimization of Prompt Security and System Performance in Edge-Cloud LLM Systems
Haiyang Huang, Tianhui Meng, Weijia Jia
TL;DR
This work addresses prompt security in edge-cloud LLM systems by combining a vector-database-enabled detector with edge deployment and formulating joint detection, latency, and resource optimization as a multi-stage Bayesian game. It introduces an architecture (EC-LLM) and a Game Model-based Detection Resource Allocation (GMDRA) framework that uses belief updates and sequential marginal analysis to allocate detection effort while minimizing benign-user latency and the costs associated with malicious users and resource consumption. The approach is validated on a real EC-LLM testbed using Qwen 1.5-7B-Chat, Milvus vector databases (VDBs), and a BERT-based detector, showing improved security and reduced latency and GPU resource usage compared with cloud-only and detector-absent baselines. The results indicate that secure, efficient LLM services can be deployed at the edge with a scalable defense against prompt attacks.
Abstract
Large language models (LLMs) have become widely useful in everyday life, and prompt engineering has improved the efficiency of these models. However, recent years have witnessed a rise in attacks that exploit prompt engineering, leading to issues such as privacy leaks, increased latency, and wasted system resources. Although safety fine-tuning methods based on Reinforcement Learning from Human Feedback (RLHF) have been proposed to align LLMs, existing security mechanisms fail to cope with constantly changing prompt attacks, highlighting the need to perform security detection on prompts. In this paper, we jointly consider prompt security, service latency, and system resource optimization in Edge-Cloud LLM (EC-LLM) systems under various prompt attacks. To enhance prompt security, we propose a lightweight attack detector enabled by a vector database. We formalize the problem of joint prompt detection, latency, and resource optimization as a multi-stage dynamic Bayesian game model. The equilibrium strategy is determined by predicting the number of malicious tasks and updating beliefs at each stage through Bayesian updates. The proposed scheme is evaluated on a real implementation of an EC-LLM system, and the results demonstrate that our approach offers enhanced security, reduces the service latency for benign users, and decreases system resource consumption compared to state-of-the-art algorithms.
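The stage-wise belief update mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's GMDRA algorithm: it only applies Bayes' rule to a single user's probability of being malicious, given whether a detector flagged that user's prompt at each stage. The function names `update_belief` and `run_stages`, and the detector's true-positive rate `tpr` and false-positive rate `fpr`, are assumptions introduced here for illustration and do not come from the source.

```python
def update_belief(prior, flagged, tpr=0.9, fpr=0.05):
    """Return P(malicious | detector outcome) via Bayes' rule.

    prior   -- current belief that the user is malicious, in [0, 1]
    flagged -- True if the detector flagged the prompt at this stage
    tpr     -- assumed P(flagged | malicious)  (hypothetical value)
    fpr     -- assumed P(flagged | benign)     (hypothetical value)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    # Likelihood of the observed outcome under each user type.
    if flagged:
        like_malicious, like_benign = tpr, fpr
    else:
        like_malicious, like_benign = 1.0 - tpr, 1.0 - fpr
    evidence = like_malicious * prior + like_benign * (1.0 - prior)
    if evidence == 0.0:
        # The outcome is impossible under both types; keep the prior.
        return prior
    return like_malicious * prior / evidence


def run_stages(prior, observations, tpr=0.9, fpr=0.05):
    """Apply the update once per stage and return the belief trajectory."""
    beliefs = [prior]
    for flagged in observations:
        beliefs.append(update_belief(beliefs[-1], flagged, tpr, fpr))
    return beliefs


if __name__ == "__main__":
    # A user flagged in two of three stages, starting from a 10% prior.
    for stage, belief in enumerate(run_stages(0.1, [True, False, True])):
        print(f"stage {stage}: P(malicious) = {belief:.3f}")
```

In a multi-stage setting, a belief of this kind could inform how much detection effort to spend on each user at the next stage. How the paper actually maps beliefs to an equilibrium allocation is not specified in this section.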
