From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents

Myeongseob Ko, Jihyun Jeong, Sumiran Singh Thakur, Gyuhak Kim, Ruoxi Jia

Abstract

Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formalize this threat as inference-driven linkage and systematically evaluate it across three settings: classical linkage scenarios (Netflix and AOL), InferLink (a controlled benchmark varying task intent, shared cues, and attacker knowledge), and modern text-rich artifacts. Without task-specific heuristics, agents successfully execute both fixed-pool matching and open-ended identity resolution. In the Netflix Prize setting, an agent reconstructs 79.2% of identities, significantly outperforming a 56.0% classical baseline. Furthermore, linkage emerges not only under explicit adversarial prompts but also as a byproduct of benign cross-source analysis in InferLink and unstructured research narratives. These findings establish that identity inference, not merely explicit information disclosure, must be treated as a first-class privacy risk; evaluations must measure what identities an agent can infer.

Paper Structure (73 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Overview of Inference-Driven Linkage. We show that an LLM agent can reconstruct a specific identity ($\hat{\imath}$) from fragmented information. (Left) Anonymized Artifacts ($D_{\text{anon}}$): Sources such as ChatGPT logs, AOL search histories, and interview transcripts contain fragmented, individually non-identifying cues that can jointly form an identity-relevant profile within a given artifact. (Right) Auxiliary Context ($D_{\text{aux}}$): Auxiliary information may be either provided directly or retrieved from public sources (e.g., the web, social media, and news), and serves as corroborating evidence. (Center) Agentic Inference: The agent combines weak cues from anonymized artifacts with corroborating auxiliary context to form a coherent identity hypothesis.
  • Figure 2: Classical linkage settings. (a) In the Netflix setting, LLM agents match or exceed the classical baseline, especially in sparse regimes. (b) In the AOL setting, the agent performs open-ended linkage by moving from anonymized queries ($D_{\text{anon}}$) to corroborating public evidence ($D_{\text{aux}}$) and ultimately to a specific identity hypothesis ($\hat{\imath}$).
  • Figure 3: The end-to-end pipeline of InferLink. Phase 1 specifies the seed $(f,\iota,\kappa)$, which defines the fingerprint type, intent, and attacker knowledge. This seed conditions Phase 2, plausible scenario generation, and Phase 3, synthesis of paired datasets $(D_{\text{anon}}, D_{\text{aux}})$ with a unique ground-truth linkage. Phase 4 executes a multi-turn task interaction, and Phase 5 evaluates privacy risk (LSR) and task utility.
  • Figure 4: Concrete example from InferLink for a single Intrinsic instance. The underlying paired-source data remain fixed, while the task framing changes across Implicit, Explicit-ZK, and Explicit-MK.
  • Figure 5: Modern digital trace examples of inference-driven linkage. (a) Anthropic Interviewer. A redacted interview provides the anonymized artifact $D_{\text{anon}}$, containing technical and contextual cues without direct identifiers. The agent extracts a distinctive academic profile, retrieves public corroboration ($D_{\text{aux}}$), and forms an identity hypothesis $\hat{\imath}$ with supporting evidence $\mathcal{E}$. (b) ChatGPT log. From the anonymized conversation ($D_{\text{anon}}$), the agent extracts a coarse profile from fragmented contextual cues, retrieves corroborating public information ($D_{\text{aux}}$), and narrows to a specific identity hypothesis $\hat{\imath}$.
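
The evaluation step in the Figure 3 caption (Phase 5) can be illustrated with a minimal Python sketch. It rests on several assumptions that the captions do not confirm: LSR is read here as a per-seed linkage success rate, i.e. the fraction of instances where the agent's identity hypothesis $\hat{\imath}$ equals the unique ground-truth linkage; the `Seed` and `Outcome` classes, the function `lsr_by_seed`, and the label values for intent and attacker knowledge are hypothetical names introduced only for illustration. The paper's actual metric and data format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seed:
    """Hypothetical container for the Phase 1 seed (f, iota, kappa)."""
    fingerprint: str  # f: fingerprint type, e.g. "Intrinsic" (Figure 4)
    intent: str       # iota: task intent (label values are assumed)
    knowledge: str    # kappa: attacker knowledge (label values are assumed)


@dataclass(frozen=True)
class Outcome:
    """One evaluated instance: the agent's hypothesis vs. the ground truth."""
    seed: Seed
    predicted_id: Optional[str]  # None when the agent names no identity
    true_id: str


def lsr_by_seed(outcomes):
    """Return {seed: fraction of instances whose hypothesis matches the truth}.

    Assumption: LSR is a simple success rate over instances sharing a seed.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for o in outcomes:
        totals[o.seed] += 1
        if o.predicted_id == o.true_id:
            hits[o.seed] += 1
    return {seed: hits[seed] / totals[seed] for seed in totals}


# Synthetic placeholder data; no real identities or results are involved.
implicit = Seed("Intrinsic", "implicit", "none")
explicit = Seed("Intrinsic", "explicit", "zero")
demo = [
    Outcome(implicit, None, "person-1"),
    Outcome(implicit, "person-2", "person-2"),
    Outcome(explicit, "person-1", "person-1"),
    Outcome(explicit, "person-2", "person-2"),
]
rates = lsr_by_seed(demo)
print(rates[implicit], rates[explicit])
```

Grouping by seed mirrors the Figure 4 setup, where the paired-source data stay fixed and only the task framing changes, so differences in the rate can be attributed to the framing rather than to the data.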