Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu
TL;DR
The paper analyzes how the Right to be Forgotten (RTBF) applies to Large Language Models (LLMs) under the GDPR, highlighting data-provenance and deletion challenges that differ from those posed by traditional search engines. It compares LLMs to search engines, maps GDPR data-subject rights to LLM lifecycles, and presents a taxonomy of solutions, including privacy-preserving ML, exact and approximate unlearning, model editing, and guardrails. The discussion emphasizes practical limitations and trade-offs, such as retraining costs, fairness impacts, and guardrail vulnerabilities, while advocating a combined legal-technical approach. Overall, the work offers a structured framework that practitioners and regulators can use to design compliant, auditable LLM-enabled systems while preserving user rights and safety.
Abstract
The Right to be Forgotten (RTBF) was first established by the ruling in Google Spain SL, Google Inc. v AEPD, Mario Costeja González, and was later included as the Right to Erasure in the General Data Protection Regulation (GDPR) of the European Union, giving individuals the right to request that organizations delete their personal data. For search engines specifically, individuals can ask organizations to exclude their information from query results. It was a significant right that emerged as a result of the evolution of technology. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular, but they are not exempt from the RTBF. Compared with the indexing approach used by search engines, LLMs store and process information in a completely different way, which poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of differential privacy, machine unlearning, model editing, and guardrails. With the rapid advancement of AI and the increasing need to regulate this powerful technology, the case of the RTBF can provide valuable lessons for technical practitioners, legal experts, organizations, and authorities.
