Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion

Nate Veldt; Thomas Stanley; Benjamin W. Priest; Trevor Steil; Keita Iwabuchi; T. S. Jayram; Grace J. Li; Geoffrey Sanders

TL;DR

Improved learning-augmented algorithms are presented for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space, together with a generalized method that interpolates between a prior subquadratic metric forest completion (MFC) approximation algorithm and an optimal $\Omega(n^2)$-time MFC algorithm.

Abstract

We present improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space. Our work follows a recent framework called metric forest completion (MFC), where the learned input is a forest that must be given additional edges to form a full spanning tree. Veldt et al. (2025) showed that optimally completing the forest takes $\Omega(n^2)$ time, but designed a 2.62-approximation for MFC with subquadratic complexity. The same method is a $(2\gamma + 1)$-approximation for the original MST problem, where $\gamma \geq 1$ is a quality parameter for the initial forest. We introduce a generalized method that interpolates between this prior algorithm and an optimal $\Omega(n^2)$-time MFC algorithm. Our approach considers only edges incident to a growing number of strategically chosen "representative" points. One corollary of our analysis is an improvement in the approximation factor of the previous algorithm from 2.62 to 2 for MFC, and from $(2\gamma + 1)$ to $2\gamma$ for metric MST. We prove that these factors are tight for worst-case instances, but we still obtain better instance-specific approximations using our generalized method. We complement our theoretical results with a thorough experimental evaluation.
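
The core idea of completing the forest using only edges incident to chosen representative points can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' MultiRepMFC implementation: points are coordinate tuples with Euclidean distance (any metric can be passed as `dist`), the initial forest is given only through a list of component labels `comp`, and the names `rep_restricted_completion` and `reps` are hypothetical. For each pair of components the sketch keeps the lightest candidate edge that has a representative endpoint, then runs Kruskal's algorithm on the graph whose nodes are components. It makes `len(reps) * n` distance evaluations; making every point a representative examines all cross-component edges and therefore returns an optimal completion, and as long as at least one representative is chosen the result is a spanning tree.

```python
import math


class DisjointSet:
    """Union-find over component labels 0..t-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected; edge would form a cycle
        self.parent[rb] = ra
        return True


def rep_restricted_completion(points, comp, reps, dist=math.dist):
    """Complete a forest using only edges with a representative endpoint.

    points -- list of coordinate tuples (swap `dist` for another metric)
    comp   -- comp[i] is the forest component (0..t-1) containing point i
    reps   -- indices of representative points (hypothetical input format)
    Returns (added_edges, added_weight).
    """
    t = max(comp) + 1
    best = {}  # (ci, cj) with ci < cj -> (weight, u, v), lightest candidate so far
    for r in reps:
        for v in range(len(points)):
            if comp[v] == comp[r]:
                continue  # edges inside a component are never needed
            key = (min(comp[r], comp[v]), max(comp[r], comp[v]))
            w = dist(points[r], points[v])
            if key not in best or w < best[key][0]:
                best[key] = (w, r, v)
    # Kruskal's algorithm on the contracted graph whose nodes are components.
    dsu = DisjointSet(t)
    added, total = [], 0.0
    for (ci, cj), (w, u, v) in sorted(best.items(), key=lambda kv: kv[1][0]):
        if dsu.union(ci, cj):
            added.append((u, v, w))
            total += w
    return added, total
```

For example, with points at 0, 1, 10, 11, 20 on a line split into components {0, 1}, {10, 11}, {20}, using every point as a representative adds two edges of weight 9 (total 18), while using only the points at 0 and 10 as representatives yields a completion of total weight 19.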

Paper Structure (12 sections, 4 theorems, 36 equations, 5 figures, 1 algorithm)


Introduction
Preliminaries and Related Work
Multi-representative MFC Algorithm
Approximation analysis for fixed $R$
The Best Representatives Problem
Algorithm variants and runtime analysis
Experiments
Conclusions and Discussion
Dynamic Programming for Representative Allocation
Details for algorithm variants and runtimes
Additional Experimental Details
Additional experimental results

Key Result

Theorem 1

MultiRepMFC$(R)$ is an $\alpha$-approximation for MFC and an $(\alpha\gamma)$-approximation for metric MST, where $\gamma$ is the overlap parameter for the initial forest and $\alpha = 1 + \textup{cost}(\mathcal{P},R)/w_{\mathcal{X}}(E_t)$.
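
Theorem 1 turns an instance-specific quantity into an approximation guarantee, and the $\varepsilon_\alpha$ reported in Figure 3 is simply $\alpha - 1$. The hypothetical helper below evaluates these bounds; it assumes $\textup{cost}(\mathcal{P},R)$ and $w_{\mathcal{X}}(E_t)$ have already been computed as defined in the paper (their computation is not shown), and its name and argument names are illustrative only.

```python
def multirep_guarantee(cost_PR: float, forest_weight: float, gamma: float = 1.0):
    """Evaluate the bounds stated in Theorem 1 (illustrative helper).

    cost_PR       -- cost(P, R), assumed precomputed for representatives R
    forest_weight -- w_X(E_t), assumed precomputed as defined in the paper
    gamma         -- overlap parameter of the initial forest (gamma >= 1)
    Returns (alpha, alpha * gamma, eps_alpha) with eps_alpha = alpha - 1.
    """
    if forest_weight <= 0:
        raise ValueError("w_X(E_t) must be positive")
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    alpha = 1.0 + cost_PR / forest_weight  # MFC approximation factor
    return alpha, alpha * gamma, alpha - 1.0  # (MFC, metric MST, eps_alpha)
```

For instance, `multirep_guarantee(2.0, 8.0, gamma=2.0)` returns `(1.25, 2.5, 0.25)`: a 1.25-approximation for MFC and a 2.5-approximation for metric MST.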

Figures (5)

Figure 1: (a) The forest obtained by terminating Kruskal's algorithm early for a set of 100 points. (b) Running Kruskal's algorithm to the end leads to a full MST. (c) The initial forest can be viewed as a heuristic prediction for the forest in (a). For this example, $\gamma(\mathcal{P}) \approx 1.06$. (d) Solving the metric forest completion problem produces a full spanning tree that approximates the true MST.
Figure 2: The MFC instance from the tightness theorem when $\ell = 3$ and $p = 5$. Initial forest edges are black solid lines; these define $p$ components with $\ell + 1$ points each. Points with black centers are representatives. Two points have distance $\varepsilon$ if they are in the same gray enclosing region; otherwise they have distance 1. These distances can be realized by associating each point with a vector of length $p + \max\{\ell,p\}$ and using $\ell_\infty$ distance. The optimal spanning tree is achieved by adding the red dashed edges, which all have weight $\varepsilon$. MultiRepMFC only adds edges incident to representatives, and therefore completes the forest with $\ell - 1$ edges of weight 1 (e.g., dotted blue edges).
Figure 3: We display the performance of each variant of MultiRepMFC as runtime increases. Each point corresponds to running one method with a fixed budget $b$. The top row shows the value of $\varepsilon$ such that a method obtains a $(1+\varepsilon)$-approximation in practice. The second row shows the value $\varepsilon_\alpha$ such that we can guarantee a $(1+\varepsilon_\alpha)$-approximation using Theorem 1. Computing $\varepsilon_\alpha$ is fast, whereas computing $\varepsilon$ is impractical because it requires solving MFC optimally. The last row shows the gap between $\alpha$ and the true approximation ratio as runtime increases. All variants of MultiRepMFC provide a useful interpolation between the existing MFC-Approx algorithm ($b = 0$, vertical dashed line) and an optimal MFC algorithm (right vertical dashed line). All plots also show that dynamic programming produces better true approximations (top row), much better approximation bounds (middle row), and shrinks the gap between the bound and the true approximation faster (last row). For Cooking, we use 16 random orderings of the entire dataset ($n = 39{,}774$); for all other datasets, we take 16 uniform random samples of size $n = 30{,}000$. Displayed results are averages over these 16 runs.
Figure 4: We display the performance of each variant of MultiRepMFC as the budget increases. Each point corresponds to running one method with a fixed budget $b$.
Figure 5: For each dataset, we display the completion ratio: the weight of new edges added by MultiRepMFC divided by the weight of new edges added by an optimal solution for MFC. Adding a small number of extra representatives leads to an even more dramatic improvement in the completion ratio than in the MFC cost ratio.
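
Figure 1 builds the initial forest by stopping Kruskal's algorithm early, and Figure 5 contrasts the completion ratio with the MFC cost ratio. The Python sketch below illustrates both under stated assumptions: points are Euclidean tuples, the helpers `truncated_kruskal`, `optimal_completion`, `completion_ratio`, and `cost_ratio` are hypothetical names, and `cost_ratio` assumes the MFC cost counts the fixed forest edges plus the added edges. `optimal_completion` contracts the forest and runs Kruskal's algorithm over all $n(n-1)/2$ pairwise edges, so it does the quadratic work an exact MFC method requires; it is a textbook baseline, not the paper's implementation.

```python
import math
from itertools import combinations


class DisjointSet:
    """Union-find over point indices 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected; edge would form a cycle
        self.parent[rb] = ra
        return True


def sorted_edges(points):
    # All n(n-1)/2 pairwise edges, lightest first.
    return sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )


def truncated_kruskal(points, num_components):
    """Kruskal's algorithm stopped once `num_components` trees remain."""
    n = len(points)
    dsu, forest = DisjointSet(n), []
    for w, i, j in sorted_edges(points):
        if n - len(forest) <= num_components:
            break  # each accepted edge merges two trees
        if dsu.union(i, j):
            forest.append((i, j, w))
    return forest


def optimal_completion(points, forest):
    """Lightest extra edges that turn `forest` into a spanning tree."""
    dsu = DisjointSet(len(points))
    for i, j, _ in forest:
        dsu.union(i, j)  # forest edges are fixed, so contract them first
    added = []
    for w, i, j in sorted_edges(points):
        if dsu.union(i, j):
            added.append((i, j, w))
    return added


def completion_ratio(heuristic_added, optimal_added):
    # Weight of edges a heuristic adds, relative to an optimal completion.
    return heuristic_added / optimal_added


def cost_ratio(forest_weight, heuristic_added, optimal_added):
    # ASSUMPTION: MFC cost = fixed forest weight + weight of added edges.
    return (forest_weight + heuristic_added) / (forest_weight + optimal_added)
```

For points at 0, 1, 10, 11, 20 on a line, stopping at three components keeps two weight-1 edges, the optimal completion adds two weight-9 edges, and the total of 20 equals the full MST weight, as it must because an early-stopped Kruskal forest is contained in an MST. A completion of weight 19 instead of 18 would have completion ratio $19/18 \approx 1.056$ but cost ratio $21/20 = 1.05$: adding the same forest weight to numerator and denominator pulls the cost ratio toward 1, which is why differences show up more strongly in the completion ratio.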

Theorems & Definitions (7)

Theorem 1
Corollary 2
Proof
Theorem 3
Proof
Theorem 4
Proof