Can Large Language Models Really Recognize Your Name?

Dzung Pham, Peter Kairouz, Niloofar Mireshghallah, Eugene Bagdasarian, Chau Minh Pham, Amir Houmansadr

TL;DR

The paper demonstrates that large language models can fail to recognize personal names in ambiguous contexts, revealing systematic privacy failure modes. Using AmBench, a benchmark that exploits name regularity bias and benign prompt injection, it shows a 20–40% drop in recall for ambiguous names and a fourfold increase in name leakage in summaries when benign prompt injections are present. Across PII detection and summarization tasks, the results highlight the privacy risks of relying solely on LLMs for privacy-preserving tasks and call for principled evaluation, auditing, and mitigations. The findings argue for a taxonomy of failure modes and robust safeguards to accompany LLM-based privacy solutions in real-world deployments.

Abstract

Large language models (LLMs) are increasingly being used to protect sensitive user data. However, current LLM-based privacy solutions assume that these models can reliably detect personally identifiable information (PII), particularly named entities. In this paper, we challenge that assumption by revealing systematic failures in LLM-based privacy tasks. Specifically, we show that modern LLMs regularly overlook human names even in short text snippets due to ambiguous contexts, which cause the names to be misinterpreted or mishandled. We propose AmBench, a benchmark dataset of seemingly ambiguous human names, leveraging the name regularity bias phenomenon, embedded within concise text snippets along with benign prompt injections. Our experiments on modern LLMs tasked to detect PII as well as specialized tools show that recall of ambiguous names drops by 20–40% compared to more recognizable names. Furthermore, ambiguous human names are four times more likely to be ignored in supposedly privacy-preserving summaries generated by LLMs when benign prompt injections are present. These findings highlight the underexplored risks of relying solely on LLMs to safeguard user privacy and underscore the need for a more systematic investigation into their privacy failure modes.

Paper Structure

This paper contains 37 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Two failure cases where LLMs can confuse certain human names with non-human entities. The left side illustrates the name regularity bias (NRB) phenomenon in the task of PII type detection, where the LLM fails to understand that Italys is a woman even though the associated pronoun is she/her. The right side demonstrates the benign prompt injection (BPI) phenomenon, where the LLM fails to distinguish between the application's instruction and the accidentally injected instruction in the user input, resulting in the human name Albanir being leaked in the model's summary.
  • Figure 2: Overview of the AmBench benchmark creation process. We create ambiguous text snippets by combining ambiguous human names that can be mistaken for non-human entities (left side) and ambiguous text templates synthesized by LLMs (right side).
  • Figure 3: Histograms of the consistency of human name detection for four representative methods (GPT-4o, DeepSeek R1, Llama 3.1 8B, and Flair). Each subfigure corresponds to a different human name type and plots the distribution of the ratio of human name classification for each name across the five templates. Takeaway: Most methods are inconsistent for at least 10% of names in all name types except for the baseline.
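The per-name ratio plotted in Figure 3 can be illustrated with a short sketch. This is not the authors' evaluation code: the function names, the record layout, and the toy detection outcomes below are all hypothetical, and treating a name as "inconsistent" when it is detected on some templates but not all is an assumption made here for illustration.

```python
from collections import defaultdict


def detection_ratios(records):
    """Return, for each name, the fraction of templates on which it was
    classified as a human name.

    `records` is an iterable of (name, template_id, detected) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for name, _template_id, detected in records:
        totals[name] += 1
        hits[name] += int(detected)
    return {name: hits[name] / totals[name] for name in totals}


def inconsistent_fraction(ratios):
    """Share of names whose ratio is strictly between 0 and 1.

    Assumption: "inconsistent" means detected on some templates but not all.
    """
    if not ratios:
        return 0.0
    mixed = [r for r in ratios.values() if 0.0 < r < 1.0]
    return len(mixed) / len(ratios)


# Toy outcomes for three names across five templates (made-up values,
# not results from the paper).
toy_outcomes = {
    "Italys": [True, False, True, False, False],
    "Albanir": [False, False, False, False, False],
    "Alice": [True, True, True, True, True],
}
records = [
    (name, template_id, detected)
    for name, outcomes in toy_outcomes.items()
    for template_id, detected in enumerate(outcomes)
]

ratios = detection_ratios(records)
share_inconsistent = inconsistent_fraction(ratios)
```

A histogram of the values in `ratios`, drawn separately for each name type, would correspond to one subfigure; mass away from 0 and 1 indicates names that a method classifies differently depending on the template.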