
Faster Private Minimum Spanning Trees

Rasmus Pagh, Lukas Retschmeier

TL;DR

This work addresses privately releasing a minimum spanning tree when edge weights are private under the $\ell_\infty$-neighboring model. It achieves $O(m + n^{3/2}\log n / \sqrt{\rho})$ running time and, with high probability, an MST error of $O\left(n^{3/2}\log(n)\,\Delta_\infty / \sqrt{\rho}\right)$. It introduces Fast-PAMST, an in-place private MST algorithm that simulates Report-Noisy-Max efficiently by discretizing weights to multiples of $\Delta_\infty$, grouping edges with identical discretized weights, and using a specialized priority queue to support fast private edge selection within the Prim-Jarník framework. The main technical contributions are discretized RNM, grouped top sampling via MaxExp, noise handling for bottom edges, and a four-layer sqrt-decomposition data structure; together they give the running time above for dense graphs while matching the asymptotically optimal error bound. Empirical results corroborate the theoretical claims, showing substantial speedups over PAMST and better utility than post-processing approaches. The proposed approach broadens the practicality of privacy-preserving MSTs for clustering and synthetic data generation, and opens avenues for extending to sparse graphs, other $\ell_p$ privacy settings, and related tasks such as Chow-Liu trees.
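The two primitives named above, discretization to multiples of $\Delta_\infty$ and MaxExp sampling, can be illustrated with a short Python sketch. It assumes that ${\tt MaxExp}(k, \lambda)$ denotes the maximum of $k$ i.i.d. ${\tt Exp}(\lambda)$ variables, which can be drawn exactly in $O(1)$ time by inverting its CDF $(1 - e^{-\lambda x})^k$. The rounding mode in `discretize` and both function names are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def discretize(w: float, delta_inf: float) -> int:
    """Index of the multiple of delta_inf nearest to w (rounding mode is an assumption)."""
    return round(w / delta_inf)


def sample_max_exp(k: int, lam: float, rng: random.Random) -> float:
    """Draw MaxExp(k, lam): the maximum of k i.i.d. Exp(lam) variables.

    Its CDF is F(x) = (1 - exp(-lam * x))**k, so x = -ln(1 - u**(1/k)) / lam
    for u uniform on [0, 1) is an exact sample obtained in O(1) time.
    """
    if k < 1 or lam <= 0:
        raise ValueError("need k >= 1 and lam > 0")
    u = rng.random()
    return -math.log1p(-(u ** (1.0 / k))) / lam
```

By symmetry, the edge attaining the maximum noise within a group of equal discretized weight is uniform over that group, so one MaxExp draw plus one uniform choice replaces $|F_i|$ separate noise draws.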

Abstract

Motivated by applications in clustering and synthetic data generation, we consider the problem of releasing a minimum spanning tree (MST) under edge-weight differential privacy constraints where a graph topology $G=(V,E)$ with $n$ vertices and $m$ edges is public, the weight matrix $\vec{W}\in \mathbb{R}^{n \times n}$ is private, and we wish to release an approximate MST under $\rho$-zero-concentrated differential privacy. Weight matrices are considered neighboring if they differ by at most $\Delta_\infty$ in each entry, i.e., we consider an $\ell_\infty$ neighboring relationship. Existing private MST algorithms either add noise to each entry in $\vec{W}$ and estimate the MST by post-processing or add noise to weights in-place during the execution of a specific MST algorithm. Using the post-processing approach with an efficient MST algorithm takes $O(n^2)$ time on dense graphs but results in an additive error on the weight of the MST of magnitude $O(n^2\log n)$. In-place algorithms give asymptotically better utility, but the running time of existing in-place algorithms is $O(n^3)$ for dense graphs. Our main result is a new differentially private MST algorithm that matches the utility of existing in-place methods while running in time $O(m + n^{3/2}\log n)$ for fixed privacy parameter $\rho$. The technical core of our algorithm is an efficient sublinear-time simulation of Report-Noisy-Max that works by discretizing all edge weights to multiples of $\Delta_\infty$ and forming groups of edges with identical weights. Specifically, we present a data structure that allows us to sample a noisy minimum-weight edge among at most $O(n^2)$ cut edges in $O(\sqrt{n} \log n)$ time. Experimental evaluations support our claim that our algorithm significantly improves on previous algorithms in either utility or running time.
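The group-wise simulation of Report-Noisy-Max described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: selection is phrased as a maximum over scores (for an MST one would use negated discretized weights so that the lightest cut edge scores highest), the noise is ${\tt Exp}(\lambda)$, and the function names and the parameters `lam` and `M` are hypothetical; how they are calibrated to $\rho$ is not reproduced. For readability the sketch materializes the bottom set $L$ and draws the binomial count with Bernoulli trials, which costs $O(|L|)$; a sublinear version would sample the count directly and index into $L$ without scanning it.

```python
import math
import random
from typing import Dict, Hashable, List, Tuple


def sample_max_exp(k: int, lam: float, rng: random.Random) -> float:
    """Max of k i.i.d. Exp(lam) draws via the inverse CDF of (1 - e^{-lam x})^k."""
    u = rng.random()
    return -math.log1p(-(u ** (1.0 / k))) / lam


def grouped_noisy_max(groups: Dict[float, List[Hashable]], lam: float, M: float,
                      rng: random.Random) -> Hashable:
    """Return argmax_e (score_e + Exp(lam) noise), simulated group by group.

    `groups` maps a discretized score to the non-empty list of edges with it.
    """
    s_max = max(groups)
    best_edge, best_val = None, -math.inf
    bottom: List[Tuple[float, Hashable]] = []  # the set L: scores below s_max - M
    for s, edges in groups.items():
        if s >= s_max - M:
            # Top group: its largest noise is MaxExp(|F_i|, lam), and by symmetry
            # the edge carrying it is uniform within the group.
            val = s + sample_max_exp(len(edges), lam, rng)
            if val > best_val:
                best_edge, best_val = rng.choice(edges), val
        else:
            bottom.extend((s, e) for e in edges)
    # A bottom edge can only win if its noise exceeds M; the number that do is
    # Bin(|L|, e^{-lam*M}), drawn here with Bernoulli trials for simplicity.
    p = math.exp(-lam * M)
    k = sum(1 for _ in range(len(bottom)) if rng.random() < p)
    # Exp(lam) conditioned on exceeding M equals M + Exp(lam) (memorylessness).
    for s, e in rng.sample(bottom, k):
        val = s + M + rng.expovariate(lam)
        if val > best_val:
            best_edge, best_val = e, val
    return best_edge
```

The simulation is exact because a bottom edge whose noise is at most $M$ ends strictly below $s_{\max}$, so it can never beat the top group containing $s_{\max}$, whose noisy value is at least $s_{\max}$.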

Paper Structure (28 sections, 14 theorems, 13 equations, 4 figures, 1 table, 2 algorithms)


Key Result

Theorem 1.1

There exists a $\mathcal{O}(m + n^{3/2}\log(n) /\sqrt{\rho})$-time $\rho$-zCDP algorithm that, given a public graph topology $G=(V, E)$ with $n$ vertices and $m$ edges together with private weights $\mathbf{W}$ and sensitivity $\Delta_\infty$, releases the edges of a spanning tree whose weight diffe
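As a reading aid, the running time in Theorem 1.1 can be decomposed as below. This accounting assumes, as the abstract and the Figure 2 caption suggest, that each of the $n-1$ Prim-Jarník selections costs $O(\sqrt{n}\log n/\sqrt{\rho})$ (the abstract's $O(\sqrt{n}\log n)$ with $\rho$ made explicit) and that each edge is inserted into and deleted from the data structure $O(1)$ times at $O(1)$ cost each. It is a sketch of the bookkeeping, not the paper's proof.

```latex
\begin{aligned}
T(n, m) &= \underbrace{O(m)}_{\text{edge insertions/deletions}}
        + \underbrace{(n-1)\cdot O\!\left(\sqrt{n}\,\log n / \sqrt{\rho}\right)}_{\text{noisy edge selections}} \\
        &= O\!\left(m + n^{3/2}\log n / \sqrt{\rho}\right).
\end{aligned}
```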

Figures (4)

  • Figure 1: Our fast RNM simulation. Balls on a horizontal line have been discretized to the same value and form a group $F_i$. For the edges more than the threshold $M$ below the maximum (denoted as the set $L$), we first sample the number $k$ of noise terms exceeding $M$ from ${\tt Bin}(|L|, e^{-\lambda M})$, then sample $k$ noise terms from ${\tt Exp}(\lambda)_{|\geq M}$ and add them to the edges of a uniformly random subset of $L$ of size $k$. For each top group $F_i$, we select an edge uniformly at random and add noise drawn from ${\tt MaxExp}(|F_i|, \lambda)$. The light green elements are the only ones for which we have to sample noise. This example returns the dark green edge.
  • Figure 2: Visualization of our data structure for storing the discretized edges. $L$ contains all edges sorted by their (discretized) weight. Each $F_i$ points to an interval in $L$ where all its edges are stored. Inserted edges are stored from the left inside each interval; when an edge is inserted or deleted, we swap edges locally and update the corresponding counters. To find an edge in constant time, we need an additional dictionary $h:E\rightarrow \mathbb{N}$ that stores the current index in $L$ of each edge. Using a sqrt-decomposition [cpalgorithmsSqrtDecomposition], we bundle the $l$ groups into $\sqrt{l}$ blocks of $\sqrt{l}$ groups each, which allows finding the maximum in $\mathcal{O}(\sqrt{l})$ time. To get the tighter Fast-PAMST running time of $\mathcal{O}(m + n^{3/2}\log n/\sqrt{\rho})$ (Theorem 1.1), we need a more general data structure with four layers, each holding $\sqrt[4]{l}$ elements.
  • Figure 3: Experiments on a complete graph with $n$ vertices for a fixed privacy parameter $\rho = 0.1$, where each $w_e \sim {\tt Uni}(0,1)$. Each data point is the median of five runs (which span the shaded area around the curve), run on a MacBook Pro (16GB RAM, M2 Pro).
  • Figure 4: Experiments on a complete graph with $30 \leq n \leq 5000$ vertices for a fixed privacy parameter $\rho = 0.1$, where each $w_e \sim {\tt Uni}(0,1)$. Each data point is the median of five runs (which span the shaded area around the curve), run on a MacBook Pro (16GB RAM, M2 Pro).
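The two-level layout from Figure 2 can be sketched in Python. The assumptions here are hypothetical: every edge's group index (its discretized weight level in $0,\dots,l-1$) is known up front because the topology is public, higher indices are preferred (so for an MST, level $l-1$ would hold the lightest discretized weight), and `GroupedEdgeStore` and its method names are illustrative. Insert and delete touch one slot of $L$, one entry of $h$, and two counters, so they take $O(1)$ time, while `top_group` scans at most $\lceil\sqrt{l}\,\rceil$ blocks and $\lceil\sqrt{l}\,\rceil$ groups. The four-layer variant needed for Theorem 1.1 is not reproduced.

```python
import math
import random
from typing import Dict, Hashable, List, Optional


class GroupedEdgeStore:
    """Sqrt-decomposition over l discretized-weight groups (two-level sketch)."""

    def __init__(self, edge_group: Dict[Hashable, int], l: int):
        self.group = dict(edge_group)          # edge -> group index in [0, l)
        self.l = l
        sizes = [0] * l
        for g in self.group.values():
            sizes[g] += 1
        # Group g owns the interval [start[g], start[g] + sizes[g]) of array L.
        self.start = [0] * l
        for g in range(1, l):
            self.start[g] = self.start[g - 1] + sizes[g - 1]
        self.L: List[Optional[Hashable]] = [None] * len(self.group)
        self.h: Dict[Hashable, int] = {}       # active edge -> its index in L
        self.count = [0] * l                    # active edges per group
        self.b = math.isqrt(l - 1) + 1          # block size ceil(sqrt(l)), needs l >= 1
        self.block_count = [0] * ((l + self.b - 1) // self.b)

    def insert(self, e: Hashable) -> None:
        g = self.group[e]
        idx = self.start[g] + self.count[g]     # first free slot of g's interval
        self.L[idx] = e
        self.h[e] = idx
        self.count[g] += 1
        self.block_count[g // self.b] += 1

    def delete(self, e: Hashable) -> None:
        g = self.group[e]
        idx = self.h.pop(e)
        last = self.start[g] + self.count[g] - 1
        if idx != last:                          # move g's last active edge into the hole
            moved = self.L[last]
            self.L[idx] = moved
            self.h[moved] = idx
        self.L[last] = None
        self.count[g] -= 1
        self.block_count[g // self.b] -= 1

    def top_group(self) -> Optional[int]:
        """Highest non-empty group index, scanning O(sqrt(l)) blocks and groups."""
        for blk in range(len(self.block_count) - 1, -1, -1):
            if self.block_count[blk]:
                for g in range(min(self.l, (blk + 1) * self.b) - 1, blk * self.b - 1, -1):
                    if self.count[g]:
                        return g
        return None

    def random_edge(self, g: int, rng: random.Random) -> Hashable:
        """Uniformly random active edge of group g (g must be non-empty)."""
        return self.L[self.start[g] + rng.randrange(self.count[g])]
```

Keeping each group's active edges packed at the left of its interval is what makes `random_edge` a single index computation, which is the uniform choice within a top group used by the RNM simulation.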

Theorems & Definitions (28)

  • Theorem 1.1
  • Definition 2.1: $\ell_p$-neighboring graphs
  • Lemma 2.2
  • proof
  • Corollary 2.2: Sealfon_2016 MST Error with Laplace Mechanism
  • Corollary 2.2: Sealfon_2016 MST Error for Gaussian Mechanism
  • Corollary 3.0: Discretized-RNM
  • proof
  • Corollary 3.1: Report-Noisy-Grouped-Max
  • proof
  • ...and 18 more
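For comparison, the post-processing baseline described in the abstract (and analyzed in the Sealfon_2016 corollaries listed above) adds noise to every weight and then runs an ordinary MST algorithm. The Python sketch below instantiates it with the Gaussian mechanism in one standard way: under $\ell_\infty$-neighboring, the vector of $m$ weights has $\ell_2$-sensitivity $\Delta_\infty\sqrt{m}$, and Gaussian noise with $\sigma = \Delta_2/\sqrt{2\rho}$ satisfies $\rho$-zCDP. It uses Kruskal's algorithm for brevity (the $O(n^2)$ dense-graph bound in the abstract would use an array-based Prim instead); the function name is hypothetical, and the exact calibration in the corollaries may differ.

```python
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def gaussian_postprocessing_mst(n: int, weights: Dict[Edge, float], delta_inf: float,
                                rho: float, rng: random.Random) -> List[Edge]:
    """Add Gaussian noise to every weight (rho-zCDP), then compute an MST of the noisy graph."""
    m = len(weights)
    sigma = delta_inf * math.sqrt(m) / math.sqrt(2.0 * rho)  # l2-sensitivity / sqrt(2 rho)
    noisy = {e: w + rng.gauss(0.0, sigma) for e, w in weights.items()}

    parent = list(range(n))  # union-find over vertices 0..n-1

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Edge] = []
    for u, v in sorted(noisy, key=noisy.get):  # Kruskal: lightest noisy edge first
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

Everything after the noise step is post-processing, so only the noisy weights consume privacy budget; the utility loss comes from every one of the $m$ weights carrying noise of scale $\sigma$, which grows with $\sqrt{m}$.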