Optimal bounds on a tree inference algorithm
Jack Gardiner, Lachlan L. H. Andrew, Junhao Gan, Jean Honorio, Seeun William Umboh
TL;DR
The paper tightens the analysis of Hein's distance-based tree-inference algorithm. It proves that the algorithm uses $O(n k \log_k n)$ queries on trees with $n$ leaves and maximum degree $k$, which is optimal, and identifies unbalanced trees on which it needs only $o(n k \log_k n)$ queries. The paper formalizes tree inference from leaf-to-leaf distances, introduces anchor calculations for placing each new leaf, and recasts the query complexity as a recursion over rooted and unrooted trees. Refined combinatorial bounds based on $k$-ary and $g$-beanstalk structures show that the bound is tight in general and nearly optimal for several unbalanced classes, in contrast with Brodal et al.'s $O(n k \log_k n)$ algorithm. The work raises questions of instance-optimality beyond asymptotic optimality, and notes that an efficient implementation remains an open challenge, with $O(n \log n)$ time possibly achievable for fixed $k$. Overall, it advances understanding of the trade-off between query richness and tree imbalance in topology reconstruction from leaf distances.
Abstract
This paper tightens the best known analysis of Hein's 1989 algorithm for inferring the topology of a weighted tree from the lengths of paths between its leaves. It shows that the number of length queries required for a tree with $n$ leaves and maximum degree $k$ is $O(n k \log_k n)$, which matches the lower bound. It also presents a family of trees on which the algorithm performs asymptotically better, and shows that no such family exists for a competing $O(n k \log_k n)$ algorithm.
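
To make the notion of a length query concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a hidden weighted tree exposed only through leaf-to-leaf path lengths, and uses the standard tree-metric identity that the path from a leaf $c$ joins the path between leaves $a$ and $b$ at distance $(d(a,b) + d(a,c) - d(b,c))/2$ from $a$. The names `LeafDistanceOracle` and `anchor_offset` are hypothetical, and the paper's own anchor calculation may be defined differently.

```python
from collections import defaultdict


class LeafDistanceOracle:
    """Hidden weighted tree that answers path-length queries and counts them.

    Hypothetical helper for illustration only.
    """

    def __init__(self, edges):
        # edges: iterable of (u, v, weight) describing an undirected tree
        self.adj = defaultdict(list)
        for u, v, w in edges:
            self.adj[u].append((v, w))
            self.adj[v].append((u, w))
        self.queries = 0

    def distance(self, a, b):
        """Return the length of the unique path from a to b (one query)."""
        self.queries += 1
        stack = [(a, None, 0)]
        while stack:
            node, parent, dist = stack.pop()
            if node == b:
                return dist
            for nxt, w in self.adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, dist + w))
        raise ValueError(f"{b!r} is not reachable from {a!r}")


def anchor_offset(oracle, a, b, c):
    """Distance from a, along the a-b path, to the point where c branches off.

    Uses three length queries and the tree-metric identity
    (d(a,b) + d(a,c) - d(b,c)) / 2.
    """
    d_ab = oracle.distance(a, b)
    d_ac = oracle.distance(a, c)
    d_bc = oracle.distance(b, c)
    return (d_ab + d_ac - d_bc) / 2


# Example tree: leaves a, b hang off internal node x; leaves c, d hang off y.
example_edges = [
    ("a", "x", 1), ("b", "x", 2), ("x", "y", 3), ("c", "y", 4), ("d", "y", 5),
]
oracle = LeafDistanceOracle(example_edges)
print(anchor_offset(oracle, "a", "c", "d"))  # d joins the a-c path at node y
print(oracle.queries)                        # three length queries were used
```

In this example the offset places leaf `d` at node `y`, which lies at distance 4 from `a` on the path to `c`. Counting calls to `distance` is the cost measure that the $O(n k \log_k n)$ bound refers to.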
