Exploring the Privacy Protection Capabilities of Chinese Large Language Models
Yuqi Yang, Xiaowen Huang, Jitao Sang
TL;DR
The paper addresses privacy protection in Chinese large language models by introducing a three-tiered privacy evaluation framework that examines general privacy knowledge, contextual privacy, and privacy under attack prompts. It applies this framework to four 6–7B Chinese LLMs with aligned chat capabilities, using zero-shot and few-shot prompts and attack templates to probe safeguards. The findings reveal pervasive privacy shortcomings across models, with performance deteriorating under few-shot settings and attack prompts, highlighting real-world privacy risks in deploying LLM-based services. The work emphasizes the need for stronger privacy-security alignment, robust defense strategies, and more realistic evaluation data, while acknowledging limitations such as synthetic datasets and the monotony of some prompt attacks.
Abstract
Large language models (LLMs), renowned for their impressive capabilities in various tasks, have significantly advanced artificial intelligence. Yet, these advancements have raised growing concerns about privacy and security implications. To address these issues and explain the risks inherent in these models, we have devised a three-tiered progressive framework tailored for evaluating privacy in language systems. This framework consists of progressively complex and in-depth privacy test tasks at each tier. Our primary objective is to comprehensively evaluate the sensitivity of large language models to private information, examining how effectively they discern, manage, and safeguard sensitive data in diverse scenarios. This systematic evaluation helps us understand the degree to which these models comply with privacy protection guidelines and the effectiveness of their inherent safeguards against privacy breaches. Our observations indicate that existing Chinese large language models universally show privacy protection shortcomings. It seems that at the moment this widespread issue is unavoidable and may pose corresponding privacy risks in applications based on these models.
