Enabling Personal Dataflow Sovereignty via Bolt-on Data Escrow

Zhiru Zhu, Raul Castro Fernandez

TL;DR

This work tackles the lack of transparency and control in personal data usage by introducing a bolt-on data escrow that enables delegated computation inside the individual's trust zone. Data access is specified declaratively through a run(access(), compute()) interface, while the escrow enforces purpose-based data use and supports execution either on-device or on an escrow-controlled server. This ensures that the raw data $D_s$ never leaves the trust zone and only the computed result $D_t$ is exposed. The authors implement this architecture in the Apple ecosystem. They design a relational data virtualization layer, evaluate three relational-engine strategies (Materialized Tables, Virtual Tables, and Virtual Tables with Pushdown), and build an end-to-end offloading pipeline with strong security guarantees. Qualitative and quantitative evaluations show that the escrow can express real-world dataflows with negligible overhead, especially when predicate pushdown is used, which supports its broad applicability toward practical personal dataflow sovereignty.
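The three relational-engine strategies differ mainly in where filtering happens and how much data is copied. The Python sketch below contrasts them on a toy data source under a generic reading of the three terms; it is not the authors' engine. `HypotheticalSource`, its `scan` method, the `rows_read` counter, and the three strategy functions are illustrative assumptions, and real costs (framework calls, I/O, memory) are reduced to a simple count of records handed to the engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class HypotheticalSource:
    """Hypothetical stand-in for an on-device data source such as location history.

    rows_read counts how many records the source hands to the query engine,
    which is the cost the three strategies below trade off differently.
    """
    records: List[Row]
    rows_read: int = 0

    def scan(self, pushed: Optional[Predicate] = None) -> Iterator[Row]:
        for rec in self.records:
            # With pushdown, the source drops non-matching records itself.
            if pushed is None or pushed(rec):
                self.rows_read += 1
                yield rec


def materialized_tables(src: HypotheticalSource, pred: Predicate) -> List[Row]:
    # MT: copy every record into a local table first, then filter the copy.
    table = list(src.scan())
    return [r for r in table if pred(r)]


def virtual_tables(src: HypotheticalSource, pred: Predicate) -> List[Row]:
    # VT: no local copy; stream records from the source and filter in the engine.
    return [r for r in src.scan() if pred(r)]


def virtual_tables_pushdown(src: HypotheticalSource, pred: Predicate) -> List[Row]:
    # VTP: hand the predicate to the source so only matching records cross over.
    return list(src.scan(pushed=pred))


if __name__ == "__main__":
    points = [{"ts": t, "lat": 40.0 + 0.01 * t} for t in range(1000)]

    def recent(row: Row) -> bool:
        return row["ts"] >= 990

    for strategy in (materialized_tables, virtual_tables, virtual_tables_pushdown):
        src = HypotheticalSource(points)
        result = strategy(src, recent)
        print(f"{strategy.__name__}: {len(result)} rows returned, {src.rows_read} read")
```

Running the script reports 10 rows returned by every strategy, with 1000 records read for MT and VT but only 10 for VTP. In this toy model, MT and VT differ only in that MT builds a full local copy before filtering; the sketch is meant to show why pushing predicates to the source reduces the data that must be pulled into the engine.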

Abstract

The digital economy is powered by a continuous and massive exchange of personal data. Individuals provide data to platforms in return for services, from social networking and search to health monitoring, entertainment, and access to LLMs. This exchange has created immense value, but it has also established a fundamental asymmetry of power. Individuals have only coarse-grained control over data access, not fine-grained control over the purpose for which their data is used. This gap allows data to be repurposed for undisclosed uses, e.g., platforms selling it to data brokers, resulting in a critical loss of personal data sovereignty. This paper reframes this socio-technical challenge as a dataflow management problem. We propose a bolt-on data escrow architecture based on delegated computation: instead of data flowing to platforms, platforms delegate their computation to a trustworthy escrow. This inversion gives individuals transparency into, and control over, their dataflows. We present four contributions: (1) a dataflow model that explicitly incorporates computational purpose as a first-class primitive; (2) a minimally invasive programming interface, run(access(), compute()), built on two components, namely a unified relational interface that virtualizes on-device data sources and a computation offloading component; (3) a concrete implementation of our escrow within the Apple ecosystem, demonstrating its practicality; and (4) qualitative and quantitative evaluations showing that our solution is expressive enough to implement a wide range of dataflows from real-world applications while introducing minimal runtime overhead. In summary, our work serves as a stepping stone toward personal dataflow sovereignty.
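The inversion of control described in the abstract can be illustrated with a minimal Python sketch of the run(access(), compute()) pattern. The classes `AccessSpec`, `ComputeSpec`, and `HypotheticalEscrow`, the plain-string purpose labels, and the set of approved (table, purpose) pairs are assumptions made for illustration, not the paper's API. In this sketch the escrow checks the declared purpose before touching any data, projects only the requested columns, and returns only the result of the delegated computation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Sequence, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class AccessSpec:
    """Hypothetical declarative request: which virtual table, which columns, and why."""
    table: str
    columns: Tuple[str, ...]
    purpose: str


@dataclass(frozen=True)
class ComputeSpec:
    """Hypothetical computation the platform delegates instead of pulling raw data."""
    fn: Callable[[List[Row]], Any]
    purpose: str


class HypotheticalEscrow:
    def __init__(self, tables: Dict[str, List[Row]],
                 approved: FrozenSet[Tuple[str, str]]):
        self._tables = tables        # raw data D_s, never returned to the caller
        self._approved = approved    # (table, purpose) pairs the individual has granted

    def access(self, table: str, columns: Sequence[str], purpose: str) -> AccessSpec:
        return AccessSpec(table, tuple(columns), purpose)

    def compute(self, fn: Callable[[List[Row]], Any], purpose: str) -> ComputeSpec:
        return ComputeSpec(fn, purpose)

    def run(self, acc: AccessSpec, comp: ComputeSpec) -> Any:
        # Purpose is checked before any data is read.
        if acc.purpose != comp.purpose:
            raise PermissionError("access and compute declare different purposes")
        if (acc.table, acc.purpose) not in self._approved:
            raise PermissionError(
                f"purpose {acc.purpose!r} is not approved for table {acc.table!r}")
        # Project only the requested columns, then evaluate inside the escrow.
        rows = [{c: r[c] for c in acc.columns} for r in self._tables[acc.table]]
        return comp.fn(rows)         # only the result D_t leaves the escrow


if __name__ == "__main__":
    escrow = HypotheticalEscrow(
        tables={"steps": [{"day": "mon", "count": 4200}, {"day": "tue", "count": 8100}]},
        approved=frozenset({("steps", "weekly_summary")}),
    )

    def mean_steps(rows: List[Row]) -> float:
        return sum(r["count"] for r in rows) / len(rows)

    avg = escrow.run(
        escrow.access("steps", ["count"], purpose="weekly_summary"),
        escrow.compute(mean_steps, purpose="weekly_summary"),
    )
    print(avg)  # 6150.0
```

Declaring a purpose such as `"ad_targeting"` instead makes `run` raise `PermissionError` before any rows are read. Note that a plain Python function call provides no isolation, so this sketch cannot stop `fn` from leaking the rows it receives; actual enforcement depends on executing the computation inside the trust zone, either on-device or on an escrow-controlled server.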

Paper Structure

This paper contains 31 sections, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Dataflows in today's app ecosystem vs with the data escrow intermediary
  • Figure 2: Accessing location using native SDKs (left) vs using the escrow's relational interface (right).
  • Figure 3: Average access and compute runtime (with std) by escrow-based app vs baseline app over 10 runs
  • Figure 4: Average runtime (with std) to execute queries over 10 runs using Materialized Tables (MT), Virtual Tables (VT), and Virtual Tables with Pushdown (VTP)

Theorems & Definitions (1)

  • Definition 1: Dataflow