Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models

Alex Nyffenegger, Matthias Stürmer, Joel Niklaus

TL;DR

This study assesses whether Large Language Models can re-identify anonymized individuals in Swiss court rulings and related datasets. It builds three datasets (Court Decisions, Legal-News Linkage, and Wikipedia masked pages) and introduces four metrics (PNMS, NLD, LNMS, and W-PNMS), complemented by a Retrieval-Augmented Generation workflow to probe multi-source evidence. The results show that vanilla LLMs struggle to re-identify defendants in court rulings, while the Wikipedia-based tests reveal non-trivial re-identification with retrieval and with larger models. Three key factors govern performance: model size, input length, and instruction tuning. The findings underscore the need for robust anonymization checks in courts and highlight the privacy implications of LLMs that grow more capable and more tightly integrated with retrieval and tool-based workflows, informing policy and practice for safer publication of legal decisions.

Abstract

Anonymity of both natural and legal persons in court rulings is a critical aspect of privacy protection in the European Union and Switzerland. With the advent of LLMs, concerns about large-scale re-identification of anonymized persons are growing. In accordance with the Federal Supreme Court of Switzerland, we explore the potential of LLMs to re-identify individuals in court rulings by constructing a proof-of-concept using actual legal data from the Swiss Federal Supreme Court. Following the initial experiment, we constructed an anonymized Wikipedia dataset as a more rigorous testing ground to further investigate the findings. Because re-identifying people in texts is a new task, we also introduce new metrics to measure performance. We systematically analyze the factors that influence successful re-identification and identify model size, input length, and instruction tuning as among the most critical determinants. Despite high re-identification rates on Wikipedia, even the best LLMs struggled with court decisions. We attribute this difficulty to the lack of test datasets, the need for substantial training resources, and the sparsity of the information available for re-identification. In conclusion, this study demonstrates that re-identification using LLMs may not be feasible for now, but, as the proof-of-concept on Wikipedia showed, it might become possible in the future. We hope that our system can help increase confidence in the security of anonymized decisions and thereby make courts more willing to publish them.
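
To make the idea of a name-matching metric concrete, here is a minimal sketch of one plausible way to score a predicted name against the true one. It assumes that NLD stands for a normalized Levenshtein distance and that normalization divides by the length of the longer string; neither assumption is confirmed by the text above, and the paper's exact definition may differ. The function names `levenshtein` and `normalized_levenshtein` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(prediction: str, target: str) -> float:
    """Distance scaled to [0, 1]; 0.0 means the names match exactly.

    Assumption: case-insensitive comparison, normalized by the longer string.
    """
    prediction, target = prediction.lower(), target.lower()
    longest = max(len(prediction), len(target))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, target) / longest


if __name__ == "__main__":
    # Hypothetical names, not taken from any dataset in the study.
    print(normalized_levenshtein("Anna Meier", "Anna Meyer"))
```

A score near 0 would indicate a near-exact match, so a small threshold could tolerate spelling variants of the same name while still rejecting unrelated predictions.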

Paper Structure

This paper contains 41 sections, 30 figures, 5 tables.

Figures (30)

  • Figure 1: Re-identification framework
  • Figure 2: Simplified example of content in newspaper articles. Note that re-identification is only possible when all three articles are used together.
  • Figure 3: Prediction categories on the rulings dataset. Only predictions in the "good" category can possibly be correct.
  • Figure 4: Re-identification score by parameter count
  • Figure 5: Re-identification score across input lengths
  • ...and 25 more figures