Divide and Learn: Multi-Objective Combinatorial Optimization at Scale
Esha Singh, Dongxia Wu, Chien-Yi Yang, Tajana Rosing, Rose Yu, Yi-An Ma
TL;DR
This paper reframes multi-objective combinatorial optimization (MOCO) as an online learning problem with full-bandit feedback and develops Divide & Learn (D&L), a decomposed, multi-expert framework. D&L partitions the decision space into overlapping subproblems of size $d$, resolves conflicts between overlapping subproblems via a Lagrangian relaxation with dual variables, and uses three no-regret experts (UCB, EXP3, FTRL) to construct solutions online; a zeroth-order local search refines subproblem solutions between expert phases. The authors prove a regret bound of $O(d\sqrt{T\log T})$ that scales with the subproblem size rather than the size of the combinatorial space, and show that the coordination cost is sublinear thanks to diminishing overlap. Empirically, D&L performs strongly on MOCO benchmarks and on a real hardware-software co-design task, reaching 80--98% of the performance of specialized solvers while being two to three orders of magnitude more sample- and compute-efficient than Bayesian optimization methods. The results establish a principled, scalable alternative to surrogate modeling or offline training for MOCO, with an advantage that grows with problem size and objective count. Overall, D&L offers online, domain-agnostic MOCO with theoretical guarantees and practical efficiency gains in large-scale settings where evaluations are expensive.
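To make the decomposition idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a toy binary problem with a hidden target, uses only a per-position UCB1 learner, and leaves out the EXP3 and FTRL experts, the Lagrangian dual updates, and the local search. The names `overlapping_blocks`, `PositionUCB`, and `run` are hypothetical and chosen for illustration only.

```python
import math


def overlapping_blocks(n, d, overlap):
    """Split positions 0..n-1 into windows of size d sharing `overlap` positions."""
    if not 0 <= overlap < d:
        raise ValueError("need 0 <= overlap < d")
    step = d - overlap
    blocks, start = [], 0
    while True:
        end = min(start + d, n)
        blocks.append(list(range(start, end)))
        if end == n:
            break
        start += step
    return blocks


class PositionUCB:
    """UCB1 learner over the candidate values of a single position."""

    def __init__(self, n_values):
        self.counts = [0] * n_values
        self.means = [0.0] * n_values

    def select(self):
        # Try every value once before using confidence bounds.
        for value, count in enumerate(self.counts):
            if count == 0:
                return value
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda v: self.means[v]
            + math.sqrt(2.0 * math.log(total) / self.counts[v]),
        )

    def update(self, value, reward):
        self.counts[value] += 1
        self.means[value] += (reward - self.means[value]) / self.counts[value]


def run(n=10, d=4, overlap=1, n_values=2, rounds=300):
    # Hypothetical toy objective: fraction of positions matching a hidden target.
    target = [i % 2 for i in range(n)]

    def evaluate(x):
        # Full-bandit feedback: only one scalar is observed per solution.
        return sum(a == b for a, b in zip(x, target)) / n

    blocks = overlapping_blocks(n, d, overlap)
    learners = [PositionUCB(n_values) for _ in range(n)]
    incumbent = [0] * n
    incumbent_reward = evaluate(incumbent)
    best, best_reward = list(incumbent), incumbent_reward

    for t in range(rounds):
        block = blocks[t % len(blocks)]  # visit subproblems round-robin
        candidate = list(incumbent)
        for pos in block:
            candidate[pos] = learners[pos].select()
        reward = evaluate(candidate)
        # Every position in the active block is credited with the same scalar.
        for pos in block:
            learners[pos].update(candidate[pos], reward)
        if reward >= incumbent_reward:
            incumbent, incumbent_reward = candidate, reward
        if reward > best_reward:
            best, best_reward = list(candidate), reward
    return best, best_reward
```

Calling `run()` returns the best solution found and its scalar reward. Each learner only ever chooses among `n_values` options for one position, so the learning problem per round is bounded by the block size $d$ rather than by the `n_values ** n` possible solutions, which is the intuition behind a bound that depends on $d$.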
Abstract
Multi-objective combinatorial optimization seeks Pareto-optimal solutions over exponentially large discrete spaces, yet existing methods sacrifice generality, scalability, or theoretical guarantees. We reformulate it as an online learning problem over a decomposed decision space, solving position-wise bandit subproblems via adaptive expert-guided sequential construction. This formulation admits regret bounds of $O(d\sqrt{T \log T})$ that depend on the subproblem dimensionality $d$ rather than on the size of the combinatorial space. On standard benchmarks, our method achieves 80--98\% of specialized solvers' performance while improving sample and computational efficiency by two to three orders of magnitude over Bayesian optimization methods. On real-world hardware-software co-design for AI accelerators with expensive simulations, we outperform competing methods under fixed evaluation budgets. The advantage grows with problem scale and objective count, establishing bandit optimization over decomposed decision spaces as a principled alternative to surrogate modeling or offline training for multi-objective optimization.
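As a short worked reading of the stated bound, write $R_T$ for the cumulative regret after $T$ rounds (this notation is an assumption here; the abstract does not define it). Dividing the bound by $T$ shows that the average per-round regret vanishes:

$$
\frac{R_T}{T} \;=\; O\!\left(\frac{d\sqrt{T\log T}}{T}\right) \;=\; O\!\left(d\sqrt{\frac{\log T}{T}}\right) \;\longrightarrow\; 0 \quad \text{as } T \to \infty .
$$

For comparison, treating each of the $|\mathcal{X}|$ feasible solutions as a separate arm of a standard multi-armed bandit typically gives regret on the order of $\sqrt{|\mathcal{X}|\,T\log T}$, and $|\mathcal{X}|$ is exponential in the number of decision variables. The practical content of the $d$-dependent bound is therefore that the number of rounds needed to reach a given average regret grows roughly like $d^2$, up to logarithmic factors, rather than with $|\mathcal{X}|$.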
