SEAL-Tag: Self-Tag Evidence Aggregation with Probabilistic Circuits for PII-Safe Retrieval-Augmented Generation

Jin Xie, Songze Li, Guang Cheng

Abstract

Retrieval-Augmented Generation (RAG) systems introduce a critical vulnerability: contextual leakage, where adversaries exploit instruction-following to exfiltrate Personally Identifiable Information (PII) via adaptive extraction. Current defenses force a rigid trade-off between semantic utility and latency. We present SEAL-Tag, a privacy-preserving runtime environment that resolves this via a Verify-then-Route paradigm. SEAL-Tag introduces the SEAL-Probe protocol, transforming auditing into a structured tool-use operation where the model generates a verifiable PII-Evidence Table (PET) alongside its draft. To adjudicate this evidence, we employ a Probabilistic Circuit (PC) that enforces verifiable logical constraints for robust decision-making. To overcome the privacy "Cold Start" problem, we introduce the S0--S6 Anchored Synthesis Pipeline, generating high-fidelity, provenanced RAG interactions. We pair this with a Two-Stage Curriculum that first optimizes for entity detection before aligning the model to the rigorous audit protocol. Our evaluation demonstrates that SEAL-Tag establishes a new Pareto frontier, reducing adaptive leakage by over 8$\times$ while matching the utility and speed of unsafe baselines.
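
The Verify-then-Route idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three binary PET features, the circuit weights, and the thresholds `mask_at` and `refuse_at` are all hypothetical placeholders, and the circuit is the simplest possible one (a single sum node over two product nodes of Bernoulli leaves). It only shows how exact inference over structured evidence can drive a deterministic Allow / Mask / Refuse decision.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class PET:
    """Hypothetical reduction of a PII-Evidence Table to three binary signals."""
    has_pii_entity: bool
    linkable: bool
    consensus: bool


# Hypothetical parameters (assumptions, not values from the paper).
# Sum node: mixture weights over the latent label.
PRIOR = {"risky": 0.3, "safe": 0.7}
# Product nodes: independent Bernoulli leaves, P(feature is True | label).
LEAVES = {
    "risky": {"has_pii_entity": 0.9, "linkable": 0.7, "consensus": 0.4},
    "safe": {"has_pii_entity": 0.2, "linkable": 0.1, "consensus": 0.8},
}


def branch_likelihood(pet: PET, label: str) -> float:
    """Evaluate one product node: multiply the leaf probabilities."""
    return prod(
        p if getattr(pet, name) else 1.0 - p
        for name, p in LEAVES[label].items()
    )


def risk_posterior(pet: PET) -> float:
    """Exact posterior P(risky | evidence) from one bottom-up circuit pass."""
    joint = {label: PRIOR[label] * branch_likelihood(pet, label) for label in PRIOR}
    return joint["risky"] / sum(joint.values())


def route(pet: PET, mask_at: float = 0.2, refuse_at: float = 0.8) -> str:
    """Deterministically map the posterior risk to a routing state."""
    risk = risk_posterior(pet)
    if risk >= refuse_at:
        return "REFUSE"
    if risk >= mask_at:
        return "MASK"
    return "ALLOW"
```

Because the posterior is computed exactly rather than sampled, the same evidence table always yields the same routing decision; sweeping the two thresholds trades leakage risk against utility.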

Paper Structure (28 sections, 10 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Overview of the SEAL-Tag Framework. (Top) Post-Training Pipeline: We address the "cold start" problem of privacy training via an S0--S6 synthetic data generator, which fuels a two-stage curriculum learning process: first optimizing for PII Perception (Stage I), then aligning for Protocol Adherence (Stage II). (Bottom) Runtime Architecture: The system enforces a Verify-then-Route contract. The RAG backbone retrieves context containing potential PII. The LLM acts as a SEAL-Probe, generating a Draft Answer followed by a structured PII-Evidence Table (PET) that explicitly maps entities, linkability risks, and consensus signals. This structured evidence is consumed by a Probabilistic Circuit (PC) decision head, which performs exact inference on the feature vector to deterministically route the output to Allow, Mask, or Refuse states.
  • Figure 2: The S0--S6 Anchored Synthesis Pipeline. The process begins with S0 (Anchoring), extracting and normalizing typed PII entities from curated sources. S1 (World Induction) and S2 (Atomic Enrichers) synthesize a plausible semantic backdrop and adversarial artifacts (e.g., phishing attempts) around these anchors. S3 (Context Composer) merges these elements into retrievable passages, deterministically tracking PII injection sites. S4 (Query & Draft) generates grounded user interactions. Finally, S5 (Finalize) and S6 (Review) construct the gold-standard <PET> and <FINAL> blocks, utilizing a Red-Team filter to retain only high-quality samples for the Instruction Tuning dataset.
  • Figure 3: The Privacy-Utility Pareto Frontier (Llama-3.2-3B). The x-axis represents Risk (Attack Success Rate), and the y-axis represents Utility (PopQA Accuracy). Curves represent the sensitivity sweep of decision thresholds.
  • Figure 4: Reliability Diagram (Calibration Plot). The x-axis represents the model's self-reported confidence that a response is Safe. The y-axis represents the actual percentage of Safe responses in that confidence bin. ECE denotes Expected Calibration Error.
  • Figure 5: System Latency Overhead Comparison (Log Scale).
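
The Expected Calibration Error named in the Figure 4 caption is conventionally computed by binning predictions by confidence and averaging the gap between mean confidence and observed frequency, weighted by bin size. The sketch below follows that standard definition; the function name, the equal-width binning, and the default of 10 bins are assumptions, since the paper's exact binning scheme is not given here.

```python
def expected_calibration_error(confidences, is_safe, n_bins=10):
    """Standard ECE with equal-width bins over [0, 1].

    confidences: predicted probability that each response is Safe.
    is_safe:     ground-truth booleans, True if the response was Safe.
    """
    if len(confidences) != len(is_safe):
        raise ValueError("confidences and is_safe must have the same length")

    # Assign each prediction to a bin; confidence 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, safe in zip(confidences, is_safe):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, safe))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        frac_safe = sum(1 for _, s in members if s) / len(members)
        # Weight the calibration gap by the share of samples in this bin.
        ece += (len(members) / total) * abs(mean_conf - frac_safe)
    return ece
```

For example, four responses each scored 0.9 of which three are actually Safe fall into one bin with mean confidence 0.9 and observed frequency 0.75, giving an ECE of about 0.15. A perfectly calibrated model lies on the diagonal of the reliability diagram and has an ECE of 0.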