Scenario-Based Robust Optimization of Tree Structures
Spyros Angelopoulos, Christoph Dürr, Alex Elenter, Georgii Melidi
TL;DR
We address the robust design of binary search trees (BSTs) and Huffman trees (HTs) under $k$ scenarios, each given by a frequency vector over the keys, aiming for a single data structure that performs well across all scenarios. The paper establishes NP-hardness for robust BSTs and HTs and provides both algorithmic guarantees and lower bounds: a BST with competitive ratio $\lceil \log_2(k+1) \rceil$, which is optimal, and an HT with regret $\lceil \log_2 k \rceil$, which is near-optimal given a lower bound of $\lfloor \log_2 k \rfloor$. It also gives a Pareto-front framework for fairness when scenarios are defined by uniform distributions. On the practical side, it offers a polynomial-time method to compute Pareto-optimal BSTs with respect to regret, MILP formulations for exact optimization, and an experimental evaluation of all algorithms against the theoretical results. The work highlights fundamental differences between robust BSTs and HTs, introduces fairness as a multi-objective consideration in data structures, and lays a foundation for extending robust and fair design to other data-structural problems.
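For concreteness, the two performance measures can be written as below. The notation ($f^{(i)}$ for scenario $i$, $d_T(j)$, $\mathrm{cost}_i$, $\mathrm{OPT}_i$, $\mathrm{CR}$, $\mathrm{Reg}$) is ours and only a sketch: we assume $d_T(j)$ is the depth of key $j$ in a BST (root at depth 1), or the codeword length of symbol $j$ in a Huffman tree, and the paper's exact cost conventions may differ.

```latex
\begin{aligned}
\mathrm{cost}_i(T) &= \sum_{j=1}^{n} f^{(i)}_j \, d_T(j), \qquad
\mathrm{OPT}_i = \min_{T'} \mathrm{cost}_i(T'), \\
\mathrm{CR}(T) &= \max_{1 \le i \le k} \frac{\mathrm{cost}_i(T)}{\mathrm{OPT}_i}, \qquad
\mathrm{Reg}(T) = \max_{1 \le i \le k} \bigl( \mathrm{cost}_i(T) - \mathrm{OPT}_i \bigr).
\end{aligned}
```

Read this way, a BST with $\mathrm{CR}(T) \le \lceil \log_2(k+1) \rceil$ is within that factor of the best tree for every scenario simultaneously, and an HT with $\mathrm{Reg}(T) \le \lceil \log_2 k \rceil$ is within that additive amount of every scenario's optimum.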
Abstract
We initiate the study of tree structures in the context of scenario-based robust optimization. Specifically, we study Binary Search Trees (BSTs) and Huffman coding, two fundamental techniques for efficiently managing and encoding data based on a known set of key frequencies. Given $k$ different scenarios, each defined by a distinct frequency distribution over the keys, our objective is to compute a single tree of best-possible performance relative to any scenario. As performance metrics, we consider the competitive ratio, which multiplicatively compares the cost of the solution to that of the optimal tree for each scenario, in the worst case over all scenarios, as well as the regret, which makes the analogous additive comparison. For BSTs, we show that the problem is NP-hard for both metrics. We also show how to obtain a tree of competitive ratio $\lceil \log_2(k+1) \rceil$, and we prove that this ratio is optimal. For Huffman trees, we show that the problem is likewise NP-hard for both metrics; we also give an algorithm of regret $\lceil \log_2 k \rceil$, which we show is near-optimal by proving a lower bound of $\lfloor \log_2 k \rfloor$. Last, we give a polynomial-time algorithm for computing Pareto-optimal BSTs with respect to their regret, assuming scenarios defined by uniform distributions over the keys. This setting captures, in particular, the first study of fairness in the context of data structures. We provide an experimental evaluation of all algorithms. To this end, we also provide mixed integer linear program (MILP) formulations for computing optimal trees.
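To make the two metrics concrete, the following Python sketch evaluates a fixed tree against several scenarios. It does not implement the paper's algorithms or MILPs; it only computes, per scenario, the optimal cost (a textbook $O(n^3)$ dynamic program for BSTs, Huffman's algorithm for prefix codes) and then the competitive ratio and regret of a given tree. All function names are hypothetical, and we assume positive integer access counts, successful searches only, and the root at depth 1.

```python
from fractions import Fraction
import heapq

# Hypothetical helpers (not from the paper). Assumptions: positive integer
# access counts, keys 0..n-1 in sorted order, only successful searches,
# and the root of a BST counted at depth 1.

def optimal_bst_cost(freq):
    """Minimum weighted search cost over all BSTs on keys 0..n-1 (O(n^3) DP)."""
    n = len(freq)
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)
    # cost[i][j]: optimal cost of a BST on the half-open key range [i, j).
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Choosing root r puts it at depth 1 and pushes both subtrees one
            # level down, which adds the total weight of [i, j) exactly once.
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            cost[i][j] = best + prefix[j] - prefix[i]
    return cost[0][n]


def bst_cost(tree, freq, depth=1):
    """Cost of a given BST, written as nested tuples (key, left, right) or None."""
    if tree is None:
        return 0
    key, left, right = tree
    return (freq[key] * depth
            + bst_cost(left, freq, depth + 1)
            + bst_cost(right, freq, depth + 1))


def huffman_cost(freq):
    """Optimal prefix-code cost: the sum of the weights of all Huffman merges."""
    heap = list(freq)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def code_cost(lengths, freq):
    """Cost of a fixed prefix code, given its codeword lengths."""
    return sum(f * l for f, l in zip(freq, lengths))


def competitive_ratio(costs, opts):
    # Worst case over scenarios of cost / optimum, kept exact with Fraction.
    return max(Fraction(c, o) for c, o in zip(costs, opts))


def regret(costs, opts):
    # Worst case over scenarios of cost - optimum.
    return max(c - o for c, o in zip(costs, opts))


# Two BST scenarios on keys 0, 1, 2: the first favors key 0, the second key 2.
bst_scenarios = [[10, 1, 1], [1, 1, 10]]
balanced = (1, (0, None, None), (2, None, None))
bst_opts = [optimal_bst_cost(f) for f in bst_scenarios]
bst_costs = [bst_cost(balanced, f) for f in bst_scenarios]
print(bst_opts, bst_costs)  # [15, 15] [23, 23]
print(competitive_ratio(bst_costs, bst_opts), regret(bst_costs, bst_opts))  # 23/15 8

# Two Huffman scenarios on four symbols, and a fixed code with lengths 2, 2, 2, 2.
hc_scenarios = [[8, 4, 2, 2], [2, 2, 4, 8]]
lengths = [2, 2, 2, 2]
hc_opts = [huffman_cost(f) for f in hc_scenarios]
hc_costs = [code_cost(lengths, f) for f in hc_scenarios]
print(hc_opts, hc_costs, regret(hc_costs, hc_opts))  # [28, 28] [32, 32] 4
```

In this toy instance the balanced BST is suboptimal in each scenario, yet the path 0, 1, 2 (optimal for the first scenario) costs 33 in the second, a regret of 18. Likewise, the code with lengths 1, 2, 3, 3 (optimal for the first Huffman scenario) costs 42 in the second, a regret of 14, versus 4 for the fixed-length code. This is the kind of trade-off the robust objectives capture.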
