Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion

Nate Veldt, Thomas Stanley, Benjamin W. Priest, Trevor Steil, Keita Iwabuchi, T. S. Jayram, Grace J. Li, Geoffrey Sanders

TL;DR

The paper presents improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space, and introduces a generalized method that interpolates between the prior subquadratic approximation algorithm of Veldt et al. (2025) and an optimal $\Omega(n^2)$-time metric forest completion (MFC) algorithm.

Abstract

We present improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space. Our work follows a recent framework called metric forest completion (MFC), where the learned input is a forest that must be given additional edges to form a full spanning tree. Veldt et al. (2025) showed that optimally completing the forest takes $\Omega(n^2)$ time, but designed a 2.62-approximation for MFC with subquadratic complexity. The same method is a $(2\gamma+1)$-approximation for the original MST problem, where $\gamma \geq 1$ is a quality parameter for the initial forest. We introduce a generalized method that interpolates between this prior algorithm and an optimal $\Omega(n^2)$-time MFC algorithm. Our approach considers only edges incident to a growing number of strategically chosen "representative" points. One corollary of our analysis is to improve the approximation factor of the previous algorithm from 2.62 for MFC and $(2\gamma+1)$ for metric MST to 2 and $2\gamma$ respectively. We prove this is tight for worst-case instances, but we still obtain better instance-specific approximations using our generalized method. We complement our theoretical results with a thorough experimental evaluation.
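The idea of restricting candidate edges to those incident to representatives can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`complete_forest`, `kruskal`, `dist`) are hypothetical, Euclidean distance stands in for an arbitrary metric, the representatives are supplied by the caller (the paper's selection strategy is not described in this summary), and candidate edges are found by brute force for clarity. Passing `reps=None` considers every cross-component pair, which corresponds to the quadratic optimal completion.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance; assumed here, any metric could be substituted."""
    return math.dist(a, b)


def kruskal(num_nodes, weighted_edges):
    """Kruskal's algorithm on (weight, u, v, payload) tuples.

    Returns the chosen edges as (weight, payload) pairs.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for w, u, v, payload in sorted(weighted_edges, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.append((w, payload))
    return chosen


def complete_forest(points, components, reps=None):
    """Connect the components of an initial forest with new edges.

    components: list of lists of point indices, one list per component.
    reps: list of lists of representative indices, one list per component.
          If None, every cross-component pair is a candidate (optimal,
          quadratic). Otherwise only edges with at least one endpoint
          that is a representative are considered.
    """
    candidates = []
    for i, j in combinations(range(len(components)), 2):
        if reps is None:
            pairs = [(a, b) for a in components[i] for b in components[j]]
        else:
            pairs = [(r, b) for r in reps[i] for b in components[j]]
            pairs += [(a, r) for a in components[i] for r in reps[j]]
        # Cheapest candidate edge between components i and j.
        a, b = min(pairs, key=lambda p: dist(points[p[0]], points[p[1]]))
        candidates.append((dist(points[a], points[b]), i, j, (a, b)))
    # An MST over the contracted component graph picks the new edges.
    return kruskal(len(components), candidates)


# Hypothetical toy instance: four points on a line, two components.
points = [(0, 0), (3, 0), (4, 0), (8, 0)]
components = [[0, 1], [2, 3]]

optimal = complete_forest(points, components)
one_rep = complete_forest(points, components, reps=[[0], [3]])
two_reps = complete_forest(points, components, reps=[[0, 1], [3]])
```

On this toy instance the optimal completion uses the edge between points 1 and 2 (weight 1), a single poorly placed representative per component forces an edge of weight 4, and adding point 1 as a second representative recovers the optimal edge. The instance is contrived to show the interpolation the abstract describes; it says nothing about typical behavior.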

Paper Structure (12 sections, 4 theorems, 36 equations, 5 figures, 1 algorithm)

Key Result

Theorem 1

MultiRepMFC$(R)$ is an $\alpha$-approximation for MFC and an $(\alpha\gamma)$-approximation for metric MST where $\gamma$ is the overlap parameter for the initial forest and $\alpha= 1 + \textup{cost}(\mathcal{P},R)/w_{\mathcal{X}}(E_t)$.
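
As a purely arithmetic illustration of how the bound is read, suppose (hypothetically; these numbers are not from the paper) that $\textup{cost}(\mathcal{P},R) = 12$ and $w_{\mathcal{X}}(E_t) = 100$, and take $\gamma = 1.06$ for the initial forest:

```latex
\alpha = 1 + \frac{\textup{cost}(\mathcal{P},R)}{w_{\mathcal{X}}(E_t)}
       = 1 + \frac{12}{100} = 1.12,
\qquad
\alpha\gamma = 1.12 \times 1.06 = 1.1872 .
```

Under these assumed values, the theorem would certify a 1.12-approximation for MFC and a roughly 1.19-approximation for metric MST.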

Figures (5)

  • Figure 1: (a) The forest obtained by terminating Kruskal's algorithm early for a set of 100 points. (b) Running Kruskal's algorithm to the end leads to a full MST. (c) The initial forest can be viewed as a heuristic prediction for the forest in (a). For this example, $\gamma(\mathcal{P}) \approx 1.06$. (d) Solving the metric forest completion problem produces a full spanning tree that approximates the true MST.
  • Figure 2: The MFC instance from Theorem \ref{thm:tight} when $\ell = 3$ and $p = 5$. Initial forest edges are black solid lines; these define $p$ components with $\ell + 1$ points each. Points with black centers are representatives. Two points have distance $\varepsilon$ if they are in the same gray enclosing region, otherwise they have distance 1. These distances can be realized by associating each point with a vector of length $p + \max\{\ell,p\}$ and using $\ell_\infty$ distance. The optimal spanning tree is achieved by adding the red dashed edges, which all have weight $\varepsilon$. MultiRepMFC only adds edges incident to representatives, and therefore completes the forest with $\ell - 1$ edges of weight 1 (e.g., dotted blue edges).
  • Figure 3: We display the performance of each variant of MultiRepMFC as runtime increases. Each point corresponds to running one method with a fixed budget $b$. The top row shows the value of $\varepsilon$ such that a method obtains a $(1+\varepsilon)$-approximation in practice. The second row shows the value $\varepsilon_\alpha$ such that we can guarantee a $(1+\varepsilon_\alpha)$-approximation using Theorem \ref{thm:costbound}. Computing $\varepsilon_\alpha$ is fast, whereas computing $\varepsilon$ is impractical as it requires optimally solving MFC. The last row shows the gap between $\alpha$ and the true approximation as runtime increases. We see that all variants of MultiRepMFC provide a useful interpolation between the existing MFC-Approx algorithm ($b = 0$, vertical dashed line) and an optimal MFC algorithm (right vertical dashed line). All plots also show that dynamic programming produces better true approximations (top row), much better approximation bounds (middle row), and is faster at shrinking the gap between the bound and true approximation (last row). For Cooking, 16 random orderings of the entire dataset ($n = 39{,}774$) were used; for all others we take 16 uniform random samples of size $n = 30{,}000$. Average results are then displayed.
  • Figure 4: We display the performance of each variant of MultiRepMFC as budget increases. Each point corresponds to running one method with a fixed budget $b$.
  • Figure 5: For each dataset, we display the completion ratio: the weight of new edges added by MultiRepMFC, divided by the weight of the new edges added by an optimal solution for MFC. Adding a small number of extra representatives leads to an even more dramatic improvement to the completion ratio than to the MFC Cost ratio.

Theorems & Definitions (7)

  • Theorem 1
  • Corollary 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof