Puda: Private User Dataset Agent for User-Sovereign and Privacy-Preserving Personalized AI

Akinori Maeda, Yuto Sekiya, Sota Sugimura, Tomoya Asai, Yu Tsuda, Kohei Ikeda, Hiroshi Fujii, Kohei Watanabe

TL;DR

The paper addresses the privacy-utility tension in personalized AI by proposing Puda, a browser-based, user-sovereign architecture that aggregates cross-service data under user-controlled privacy levels. It introduces three privacy granularities (Detailed Browsing History, Extracted Keywords, and Predefined Category Subsets) and demonstrates that deterministic category-based subsets nearly match the personalization performance of the least private baseline, Detailed Browsing History, while reducing leakage risk. The architecture relies on three components (Content Recorder, Dataset Agent, Access Control Agent) and standard authorization protocols, and is evaluated on a travel planning task using an LLM-as-a-Judge framework to assess personalization and practical costs. The findings show a viable trade-off space in which the Predefined Category Subsets condition (reported as Category Level 3) achieves 97.2% of the Detailed Browsing History performance with favorable latency and token usage, suggesting that user-driven, multi-granular privacy control can support private, AI-native personalization at scale.

Abstract

Personal data centralization among dominant platform providers including search engines, social networking services, and e-commerce has created siloed ecosystems that restrict user sovereignty, thereby impeding data use across services. Meanwhile, the rapid proliferation of Large Language Model (LLM)-based agents has intensified demand for highly personalized services that require the dynamic provision of diverse personal data. This presents a significant challenge: balancing the utilization of such data with privacy protection. To address this challenge, we propose Puda (Private User Dataset Agent), a user-sovereign architecture that aggregates data across services and enables client-side management. Puda allows users to control data sharing at three privacy levels: (i) Detailed Browsing History, (ii) Extracted Keywords, and (iii) Predefined Category Subsets. We implemented Puda as a browser-based system that serves as a common platform across diverse services and evaluated it through a personalized travel planning task. Our results show that providing Predefined Category Subsets achieves 97.2% of the personalization performance (evaluated via an LLM-as-a-Judge framework across three criteria) obtained when sharing Detailed Browsing History. These findings demonstrate that Puda enables effective multi-granularity management, offering practical choices to mitigate the privacy-personalization trade-off. Overall, Puda provides an AI-native foundation for user sovereignty, empowering users to safely leverage the full potential of personalized AI.
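The three privacy levels and the idea of sharing only the authorized granularity can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PrivacyLevel`, `UserDataset`, and `scoped_view` are hypothetical, and the data layout is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class PrivacyLevel(IntEnum):
    """The three sharing granularities named in the abstract (enum names are assumed)."""
    DETAILED_BROWSING_HISTORY = 1
    EXTRACTED_KEYWORDS = 2
    PREDEFINED_CATEGORY_SUBSETS = 3


@dataclass
class UserDataset:
    """Hypothetical container holding one view of the user's data per privacy level."""
    browsing_history: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)


def scoped_view(dataset: UserDataset, granted: PrivacyLevel) -> list[str]:
    """Return only the view that matches the granularity the user authorized."""
    if granted == PrivacyLevel.DETAILED_BROWSING_HISTORY:
        return list(dataset.browsing_history)
    if granted == PrivacyLevel.EXTRACTED_KEYWORDS:
        return list(dataset.keywords)
    # Any other value falls back to the coarsest view (category subsets).
    return list(dataset.categories)
```

Under these assumptions, a service granted `PrivacyLevel.PREDEFINED_CATEGORY_SUBSETS` would receive only category labels and never the underlying URLs or keywords.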

Paper Structure

This paper contains 28 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: The Content Recorder collects users’ cross-service online activities. The Dataset Agent transforms this data into multi-granular datasets at different privacy levels. The Access Control Agent ensures that data shared with external services is scoped to the granularity authorized by the user.
  • Figure 2: Data processing flow within the Dataset Agent transforming user history data into multi-granular privacy levels. The left side depicts the per-page processing, while the right side illustrates the per-user processing, which aggregates the page-level data into user-specific datasets.
  • Figure 3: The User initiates a travel planning request via the App Frontend. The Orchestrator delegates this task to the Travel Planner Agent. The Travel Planner Agent executes the task by collaborating with internal Sub Agents and Tools. Personal data managed by Puda is provisioned to the Travel Planner Agent via the A2A protocol.
  • Figure 4: Japanese user interface of the Travel Planner Agent. Users request a travel plan via the chat interface on the right. The left panel displays three proposed candidate destinations, while Points of Interest (POIs) are presented in the center. In this figure, a winery in Yamanashi is highlighted.
  • Figure 5: Latency and input/output token consumption during Travel Planner Agent inference under each personal data provision condition. The horizontal axis represents the mean score of the three personalization metrics. All three scatter plots exhibit a similar trend, indicating that data with lower privacy protection levels tends to incur higher costs.
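
The two-stage flow described for Figure 2 (per-page processing followed by per-user aggregation) might look roughly like the sketch below. Everything here is an assumption for illustration: `PageRecord`, `PREDEFINED_CATEGORIES`, and `aggregate_user` are hypothetical names, the category taxonomy is a placeholder, and frequency-based keyword ranking stands in for whatever extraction method the paper actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical output of per-page processing for one visited page."""
    url: str
    keywords: tuple[str, ...]
    category: str


# Assumed taxonomy; the paper's actual predefined categories are not listed here.
PREDEFINED_CATEGORIES = frozenset({"travel", "food", "outdoors"})


def aggregate_user(pages: list[PageRecord], top_k: int = 3) -> dict[str, list[str]]:
    """Per-user step: fold page-level records into one view per privacy level."""
    # Count how often each keyword appears across all of the user's pages.
    keyword_counts = Counter(kw for page in pages for kw in page.keywords)
    # Keep only categories that belong to the predefined set.
    categories = sorted({page.category for page in pages} & PREDEFINED_CATEGORIES)
    return {
        "browsing_history": [page.url for page in pages],
        "keywords": [kw for kw, _ in keyword_counts.most_common(top_k)],
        "categories": categories,
    }
```

In this sketch the category view is the coarsest of the three: page-level categories outside the predefined set are dropped rather than shared.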