TracE2E: Easily Deployable Middleware for Decentralized Data Traceability

Daniel Pressensé, Elisavet Kozyri

TL;DR

The paper addresses the need for explainability and regulatory compliance in distributed data processing by proposing TracE2E, a Rust-based middleware that records provenance and enforces data-protection policies. It achieves end-to-end traceability across multiple nodes by wrapping the Rust standard library's IO module to mediate process inputs and outputs, and it supports a modular compliance layer built on top of the recorded provenance. Key contributions include a decentralized provenance layer whose data flows are recorded synchronously and atomically through two protocols, process-to-middleware (P2M) and middleware-to-middleware (M2M); a global resource identification system; a labeled provenance/compliance structure; and a gRPC-based interface for coordination. The evaluation shows that the framework enforces local confidentiality and integrity policies at the cost of measurable I/O overhead, which is most noticeable for small I/O operations.
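
To make the idea of a compliance layer built on provenance concrete, here is a minimal sketch of a confidentiality check that depends on recorded provenance. Everything in it is an assumption made for illustration: the `Labels` fields, the `Compliance` type, and the resource names are hypothetical and are not taken from the paper, whose label structure may differ.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical labels (not the paper's structure).
/// `secret`: the resource's own content is confidential.
/// `cleared`: the resource is allowed to receive confidential data.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Labels {
    pub secret: bool,
    pub cleared: bool,
}

#[derive(Default)]
pub struct Compliance {
    labels: HashMap<String, Labels>,
    /// resource -> resources its current content derives from
    provenance: HashMap<String, BTreeSet<String>>,
}

impl Compliance {
    pub fn set_labels(&mut self, id: &str, labels: Labels) {
        self.labels.insert(id.to_string(), labels);
    }

    /// Unlabeled resources default to "not secret, not cleared".
    fn labels_of(&self, id: &str) -> Labels {
        self.labels.get(id).copied().unwrap_or_default()
    }

    /// Everything the content of `id` derives from, including `id` itself.
    fn origins(&self, id: &str) -> BTreeSet<String> {
        let mut set = self.provenance.get(id).cloned().unwrap_or_default();
        set.insert(id.to_string());
        set
    }

    /// A flow src -> dst is allowed unless it would carry data derived
    /// from a secret resource into a destination that is not cleared.
    pub fn check(&self, src: &str, dst: &str) -> bool {
        let tainted = self.origins(src).iter().any(|o| self.labels_of(o).secret);
        !tainted || self.labels_of(dst).cleared
    }

    /// Enforce the policy, then record the flow in the provenance map.
    pub fn flow(&mut self, src: &str, dst: &str) -> bool {
        if !self.check(src, dst) {
            return false;
        }
        let origins = self.origins(src);
        self.provenance.entry(dst.to_string()).or_default().extend(origins);
        true
    }
}
```

In this sketch a process may write to an unlabeled log until it has read a secret file; after that, the same write is refused because the process's provenance now contains the secret resource. An integrity policy could presumably be checked the same way over the same provenance set.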

Abstract

This paper presents TracE2E, a middleware written in Rust that provides both data explainability and compliance across multiple nodes. By mediating inputs and outputs of processes, TracE2E records provenance information and enforces data-protection policies (e.g., confidentiality, integrity) that depend on the recorded provenance. Unlike existing approaches that necessitate substantial application modifications, TracE2E is designed for easy integration into existing and future applications through a wrapper of the Rust standard library's IO module. We describe how TracE2E consistently records provenance information across nodes, and we demonstrate how the compliance layer of TracE2E can accommodate the enforcement of multiple policies.
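
The wrapper approach described in the abstract can be illustrated with a small sketch: a type that implements the standard `Read` and `Write` traits and consults a mediator before and after each operation. The `Mediator` trait, the `Traced` wrapper, and the `LogMediator` stand-in are hypothetical names introduced here; they are not the TracE2E API, and the real middleware is reached over a network interface rather than an in-process trait.

```rust
use std::io::{self, Read, Write};

/// Direction of a data flow relative to the calling process.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flow {
    Input,
    Output,
}

/// Hypothetical stand-in for the middleware (an assumption, not the real API).
pub trait Mediator {
    /// Asked before the I/O happens; an error blocks the operation.
    fn request(&mut self, flow: Flow) -> io::Result<()>;
    /// Told afterwards whether the I/O succeeded.
    fn report(&mut self, flow: Flow, ok: bool);
}

/// Wraps any reader or writer and mediates each call.
pub struct Traced<T, M> {
    inner: T,
    mediator: M,
}

impl<T, M> Traced<T, M> {
    pub fn new(inner: T, mediator: M) -> Self {
        Traced { inner, mediator }
    }

    pub fn into_parts(self) -> (T, M) {
        (self.inner, self.mediator)
    }
}

impl<T: Read, M: Mediator> Read for Traced<T, M> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.mediator.request(Flow::Input)?;
        let res = self.inner.read(buf);
        self.mediator.report(Flow::Input, res.is_ok());
        res
    }
}

impl<T: Write, M: Mediator> Write for Traced<T, M> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.mediator.request(Flow::Output)?;
        let res = self.inner.write(buf);
        self.mediator.report(Flow::Output, res.is_ok());
        res
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// In-memory mediator that logs reported flows and can refuse outputs.
#[derive(Default)]
pub struct LogMediator {
    pub events: Vec<(Flow, bool)>,
    pub deny_output: bool,
}

impl Mediator for LogMediator {
    fn request(&mut self, flow: Flow) -> io::Result<()> {
        if self.deny_output && flow == Flow::Output {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "flow denied by policy",
            ));
        }
        Ok(())
    }

    fn report(&mut self, flow: Flow, ok: bool) {
        self.events.push((flow, ok));
    }
}
```

Because `Traced` implements the same traits as the wrapped value, application code that is generic over `Read` or `Write` does not need to change, which is the integration property the abstract emphasizes. Each mediated call adds a round trip to the mediator, so the relative cost is highest when individual operations are small.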

Paper Structure

This paper contains 21 sections, 11 figures.

Figures (11)

  • Figure 1: Overview of the TracE2E design
  • Figure 2: Process-to-Middleware (P2M) communication protocol. The blue box represents the read reservation on the source (Process P) and the yellow box the write reservation on the destination (File F) of the data flow.
  • Figure 3: Middleware-to-Middleware (M2M) communication protocol, which enables provenance recording in a decentralized context. Blue and yellow boxes represent, respectively, the read and write reservations of the resources involved in the data flows.
  • Figure 4: Custom I/O library activity. The grey boxes represent the execution of standard library components, and the blue ones the interaction checkpoints with the middleware.
  • Figure 5: Traceability management stages. The custom I/O library handles the blue boxes, while the middleware manages the green boxes. Solid arrows show P2M protocol communications, while dashed arrows represent internal middleware communications.
  • ...and 6 more figures
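
The reservation scheme named in the captions of Figures 2 and 3 (a read reservation on the source, a write reservation on the destination) can be sketched as a small state table. This is an illustration under stated assumptions, not the paper's protocol: the `FlowTable` type, the string resource identifiers, and the choice to let several readers share a source while a writer holds its destination exclusively are all hypothetical.

```rust
use std::collections::HashMap;

/// Reservation state of one resource (assumption: shared reads, exclusive write).
#[derive(Clone, Copy)]
enum Reservation {
    Free,
    Read(usize), // number of flows currently reading this resource
    Write,
}

pub struct FlowTable {
    state: HashMap<String, Reservation>,
    /// Recorded flows as (source, destination) pairs.
    pub provenance: Vec<(String, String)>,
}

impl FlowTable {
    pub fn new() -> Self {
        FlowTable { state: HashMap::new(), provenance: Vec::new() }
    }

    fn get(&self, id: &str) -> Reservation {
        self.state.get(id).copied().unwrap_or(Reservation::Free)
    }

    /// Reserve `src` for reading and `dst` for writing: both or neither.
    pub fn request(&mut self, src: &str, dst: &str) -> bool {
        if src == dst {
            return false;
        }
        let src_ok = !matches!(self.get(src), Reservation::Write);
        let dst_ok = matches!(self.get(dst), Reservation::Free);
        if !(src_ok && dst_ok) {
            return false;
        }
        let readers = match self.get(src) {
            Reservation::Read(n) => n + 1,
            _ => 1,
        };
        self.state.insert(src.to_string(), Reservation::Read(readers));
        self.state.insert(dst.to_string(), Reservation::Write);
        true
    }

    /// Report the outcome of a previously granted flow, record it only
    /// if the I/O succeeded, then release both reservations.
    pub fn report(&mut self, src: &str, dst: &str, success: bool) {
        if success {
            self.provenance.push((src.to_string(), dst.to_string()));
        }
        let remaining = match self.get(src) {
            Reservation::Read(n) if n > 1 => Reservation::Read(n - 1),
            _ => Reservation::Free,
        };
        self.state.insert(src.to_string(), remaining);
        self.state.insert(dst.to_string(), Reservation::Free);
    }
}
```

Holding both reservations from the request until the report means no other flow can modify the destination, or overwrite the source, while the I/O is in progress, so the recorded provenance matches the order in which data actually moved. The sketch assumes `report` is only called after a successful `request` for the same pair.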