
Graph Theory for Consent Management: A New Approach for Complex Data Flows

Dorota Filipczuk, Enrico H. Gerding, George Konstantinidis

TL;DR

The paper addresses how to enforce fine-grained user privacy constraints in large-scale data workflows. It models data processing as a directed graph $G=(V,E)$ and formulates the Consented Data Workflow (CDW) problem: given constraint pairs $(s,t)$, find a subgraph in which no constrained pair remains connected while maximizing the utility $U(G)=\sum_{p\in V^P} w_p u_p(G_p)$, a weighted sum over the purpose vertices $V^P$. In the linearly additive instantiation, edge valuations propagate additively: an edge $e$ leaving a vertex $v$ has valuation $\pi(e)=\sum_{e'\in in(v)}\pi(e')$, and the utility of a purpose $p$ is $u_p(G_p)=\sum_{e\in in(p)}\pi(e)$. The paper presents five algorithms (RemoveRandomEdge, RemoveFirstEdge, RemoveMinCuts, RemoveMinMC, BruteForce) that trade optimality against efficiency. Experiments on synthetic graphs show that BruteForce is intractable for large instances, that RemoveMinMC achieves near-optimal utility for moderate numbers of constraints in reasonable time, and that RemoveMinCuts is a faster alternative with some loss in quality; performance degrades in denser graphs. The work closes with open problems in graph generation, richer valuation models, scalability, and extended privacy constraints, and highlights practical implications for consent-aware data flows in complex systems.
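
The additive valuation described above can be illustrated on the small graph given in the caption of Figure 2. The following is a minimal sketch, not the authors' implementation: it assumes the workflow is acyclic, that edges leaving user-data vertices carry externally supplied valuations, and the numeric valuations and weights are made up for illustration. The function names `edge_valuations` and `utility` are hypothetical.

```python
from collections import defaultdict, deque

def edge_valuations(edges, source_values):
    """Propagate valuations through an acyclic workflow (assumption).

    An edge leaving a vertex with no incoming edges takes its value from
    source_values; any other edge (v, w) gets the sum over edges into v.
    """
    in_edges, out_edges = defaultdict(list), defaultdict(list)
    indegree, nodes = defaultdict(int), set()
    for u, v in edges:
        in_edges[v].append((u, v))
        out_edges[u].append((u, v))
        indegree[v] += 1
        nodes.update((u, v))
    pi = {}
    # Visit vertices in topological order so incoming valuations are known.
    queue = deque(n for n in nodes if indegree[n] == 0)
    while queue:
        v = queue.popleft()
        incoming = sum(pi[e] for e in in_edges[v])
        for e in out_edges[v]:
            pi[e] = incoming if in_edges[v] else source_values.get(e, 0.0)
            indegree[e[1]] -= 1
            if indegree[e[1]] == 0:
                queue.append(e[1])
    return pi

def utility(edges, source_values, weights):
    """U(G) = sum over purposes p of w_p * (sum of valuations on edges into p)."""
    pi = edge_valuations(edges, source_values)
    return sum(
        w * sum(val for (_, head), val in pi.items() if head == p)
        for p, w in weights.items()
    )

# Graph from the Figure 2 caption; valuations and weights are illustrative.
EDGES = [("s1", "v1"), ("s2", "v1"), ("v1", "t1"), ("v1", "t2")]
SOURCE_VALUES = {("s1", "v1"): 3.0, ("s2", "v1"): 2.0}
WEIGHTS = {"t1": 1.0, "t2": 0.5}

full = utility(EDGES, SOURCE_VALUES, WEIGHTS)
# Two ways to disconnect the pair (s1, t1): cut near the source or near the purpose.
cut_source = utility([e for e in EDGES if e != ("s1", "v1")], SOURCE_VALUES, WEIGHTS)
cut_purpose = utility([e for e in EDGES if e != ("v1", "t1")], SOURCE_VALUES, WEIGHTS)
print(full, cut_source, cut_purpose)
```

With these illustrative numbers, removing the edge next to the user data keeps more utility than removing the edge into the purpose, which shows why the choice of edges to remove matters for the objective.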

Abstract

Through legislation and technical advances, users gain more control over how their data is processed, and they expect online services to respect their privacy choices and preferences. However, data may be processed for many different purposes by several layers of algorithms that create complex data workflows. To date, there is no existing approach to automatically satisfy fine-grained privacy constraints of a user in a way which optimises the service provider's gains from processing. In this article, we propose a solution to this problem by modelling a data flow as a graph. User constraints and processing purposes are pairs of vertices which need to be disconnected in this graph. In general, this problem is NP-hard; we therefore propose several heuristics and algorithms. We discuss the optimality versus efficiency of our algorithms and evaluate them using synthetically generated data. On the practical side, our algorithms can provide nearly optimal solutions for tens of constraints and graphs of thousands of nodes, in a few seconds.
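
The requirement that constrained vertex pairs be disconnected can be checked with a plain reachability search. The sketch below is an illustration under assumptions, not one of the paper's algorithms: it only tests whether a candidate set of remaining edges still violates any constraint, using the graph from the caption of Figure 2. The helper names `reachable` and `violated_constraints` are hypothetical.

```python
from collections import defaultdict, deque

def reachable(edges, source, target):
    """Breadth-first search: is there a directed path from source to target?"""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def violated_constraints(edges, constraints):
    """Return the constraint pairs (s, t) that are still connected."""
    return [(s, t) for s, t in constraints if reachable(edges, s, t)]

# Graph from the Figure 2 caption; the constraint itself is illustrative.
EDGES = [("s1", "v1"), ("s2", "v1"), ("v1", "t1"), ("v1", "t2")]
CONSTRAINTS = [("s1", "t1")]

before = violated_constraints(EDGES, CONSTRAINTS)
after = violated_constraints([e for e in EDGES if e != ("s1", "v1")], CONSTRAINTS)
print(before, after)
```

Note that removing the edge `("s1", "v1")` satisfies the constraint but also cuts `s1` off from `t2`, which no constraint required; that kind of collateral loss is what a utility objective penalises.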

Paper Structure

This paper contains 10 sections, 10 equations, 7 figures, 2 tables, and 3 algorithms.

Figures (7)

  • Figure 1: A DFD created with Microsoft Threat Modelling Tool for an example product recommendation feature.
  • Figure 2: A data processing model where $V^U = \{ s_1, s_2 \}$, $V^A = \{ v_1 \}$, $V^P = \{ t_1, t_2 \}$ and $E = \{ (s_1, v_1), (s_2, v_1), (v_1, t_1), (v_1, t_2) \}$.
  • Figure 3: The number of constraints vs. the runtime of the algorithms in graphs from dataset 1.
  • Figure 4: The number of constraints vs. graph utility after applying the algorithms on graphs from dataset 1.
  • Figure 5: No. of paths vs. runtime and utility (dataset 1c).
  • ...and 2 more figures