Divide and Learn: Multi-Objective Combinatorial Optimization at Scale

Esha Singh; Dongxia Wu; Chien-Yi Yang; Tajana Rosing; Rose Yu; Yi-An Ma

TL;DR

This paper reframes multi-objective combinatorial optimization (MOCO) as an online learning problem with full-bandit feedback and develops Divide & Learn (D&L), a decomposed, multi-expert framework. D&L partitions the decision space into overlapping subproblems of size $d$, resolves cross-subproblem conflicts via a Lagrangian relaxation with dual variables, and uses three no-regret experts (UCB, EXP3, FTRL) to construct solutions online; a zeroth-order local search refines subproblem solutions between expert phases. The authors prove a regret bound of $O(d\sqrt{T\log T})$ that scales with subproblem size rather than with the size of the combinatorial space, and show that the coordination cost is sublinear thanks to diminishing overlap. Empirically, D&L reaches 80--98\% of specialized solvers' performance on MOCO benchmarks with two to three orders of magnitude better sample and computational efficiency than Bayesian optimization methods, and it outperforms competing methods on a real hardware-software co-design task under fixed evaluation budgets. The results position D&L as a principled, scalable alternative to surrogate modeling or offline training for MOCO, with an advantage that grows with problem size and objective count: an online, domain-agnostic method with theoretical guarantees, suited to large-scale settings with expensive evaluations.
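To make the idea of overlapping subproblems of size $d$ concrete, the following is a minimal sketch that splits variable indices into contiguous windows sharing a fixed number of indices. The contiguous sliding-window rule and the function name `overlapping_windows` are illustrative assumptions; the paper itself uses a metric-based decomposition, which this sketch does not implement.

```python
def overlapping_windows(n, d, overlap):
    """Split indices 0..n-1 into windows of size d that share `overlap` indices.

    Assumption: a simple contiguous sliding window, used only for illustration.
    """
    if d <= 0 or not 0 <= overlap < d:
        raise ValueError("need d > 0 and 0 <= overlap < d")
    if d >= n:
        return [list(range(n))]
    stride = d - overlap
    starts = list(range(0, n - d + 1, stride))
    # Add a final window flush with the end so that every index is covered.
    if starts[-1] != n - d:
        starts.append(n - d)
    return [list(range(s, s + d)) for s in starts]


windows = overlapping_windows(n=10, d=4, overlap=2)
covered = sorted({i for w in windows for i in w})
```

Here each window plays the role of one subproblem, and the indices shared by neighbouring windows are the variables on which the subproblems must agree.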

Abstract

Multi-objective combinatorial optimization seeks Pareto-optimal solutions over exponentially large discrete spaces, yet existing methods sacrifice generality, scalability, or theoretical guarantees. We reformulate it as an online learning problem over a decomposed decision space, solving position-wise bandit subproblems via adaptive expert-guided sequential construction. This formulation admits regret bounds of $O(d\sqrt{T \log T})$ that depend on the subproblem dimensionality $d$ rather than on the size of the combinatorial space. On standard benchmarks, our method achieves 80--98\% of specialized solvers' performance while improving sample and computational efficiency by two to three orders of magnitude over Bayesian optimization methods. On real-world hardware-software co-design for AI accelerators with expensive simulations, we outperform competing methods under fixed evaluation budgets. The advantage grows with problem scale and objective count, establishing bandit optimization over decomposed decision spaces as a principled alternative to surrogate modeling or offline training for multi-objective optimization.
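The position-wise bandit construction can be illustrated with a small sketch. It assumes a single UCB expert, a scalar reward observed only for the complete solution (full-bandit feedback), and a toy reward function; the class name `PositionwiseUCB`, the exploration constant `c`, and the reward are hypothetical choices for illustration, not the authors' implementation.

```python
import math


class PositionwiseUCB:
    """One UCB learner per position; statistics are kept per (position, action)."""

    def __init__(self, num_positions, num_actions, c=1.0):
        self.n = num_positions
        self.k = num_actions
        self.c = c  # exploration constant (assumed value)
        self.counts = [[0] * num_actions for _ in range(num_positions)]
        self.means = [[0.0] * num_actions for _ in range(num_positions)]
        self.t = 0

    def propose(self):
        """Build a solution one position at a time."""
        self.t += 1
        solution = []
        for p in range(self.n):
            untried = [a for a in range(self.k) if self.counts[p][a] == 0]
            if untried:
                solution.append(untried[0])  # try every action once first
                continue
            scores = [
                self.means[p][a]
                + self.c * math.sqrt(math.log(self.t) / self.counts[p][a])
                for a in range(self.k)
            ]
            solution.append(max(range(self.k), key=scores.__getitem__))
        return solution

    def update(self, solution, reward):
        """Credit the single scalar reward to every chosen (position, action)."""
        for p, a in enumerate(solution):
            self.counts[p][a] += 1
            self.means[p][a] += (reward - self.means[p][a]) / self.counts[p][a]


# Toy run: the reward is the fraction of ones, revealed only as one number.
learner = PositionwiseUCB(num_positions=6, num_actions=2)
T = 200
for _ in range(T):
    x = learner.propose()
    learner.update(x, reward=sum(x) / len(x))
```

The table of statistics has `num_positions * num_actions` entries, which is why the learner's bookkeeping grows with the subproblem dimension rather than with the number of complete solutions.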

Paper Structure (170 sections, 29 theorems, 249 equations, 20 figures, 12 tables, 5 algorithms)

Introduction
Contributions
Related Work
Decomposition-based MOCO.
Neural Combinatorial Optimization.
Combinatorial Bandits.
Online learning for black-box MOCO
Preliminaries
The D&L Algorithm
Online Learning Formulation
Algorithm Overview
Decomposition Strategy
Coordinating Overlapping Subproblems via Lagrangian Relaxation
Multi-Expert Learning for Subproblems
Expert Selection and Coordination
...and 155 more sections

Key Result

Proposition 4.4

The metric-based decomposition (Definition 4.2) satisfies Assumption A.3. Let $\rho := \max_i |\{k : i \in S_k\}|$ denote the maximum overlap multiplicity. For any modification on subproblem $S_k$: [...] where $\Delta_s := \max_{x,x' \in {\mathcal{X}}^k} \delta(x,x')$ denotes the diameter of a subproblem of size $s$ under the problem-specific metric, and $\rho$ is the maximum overlap multiplicity.
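The two quantities named in the proposition, $\rho$ and $\Delta_s$, can be computed directly on a toy instance. The sketch below assumes binary decision variables and the Hamming distance as the metric $\delta$; both are illustrative assumptions, since the paper leaves the metric problem-specific, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def max_overlap_multiplicity(subproblems):
    """rho = max over indices i of the number of subproblems S_k containing i."""
    counts = Counter(i for S in subproblems for i in set(S))
    return max(counts.values())


def diameter(size, metric, alphabet=(0, 1)):
    """Delta_s = max distance between any two assignments of a size-s subproblem.

    Brute force over all pairs, so only suitable for tiny sizes.
    """
    points = list(product(alphabet, repeat=size))
    return max(metric(x, y) for x in points for y in points)


def hamming(x, y):
    """Assumed metric: number of positions where two assignments differ."""
    return sum(a != b for a, b in zip(x, y))


subproblems = [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
rho = max_overlap_multiplicity(subproblems)
delta = diameter(4, hamming)
```

In this example every shared index belongs to exactly two subproblems, and two binary assignments of length four can differ in at most four positions.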

Figures (20)

Figure 1: Overview of D&L. (i) Any multi-objective problem is scalarized and decomposed into overlapping subproblems with shared variables (dashed nodes in (b)). Bandit-based action selection optimizes each subproblem independently; Lagrangian duality enforces cross-subproblem consistency. (ii) Online combinatorial multi-expert learning: for each position in a subproblem, an expert (this work: UCB, FTRL, EXP3 or TS, EXP3; see the section "Multi-Expert Learning for Subproblems") is sampled via the mixture distribution $\pi_t$ to propose an action. All experts update shared position-action statistics, amortizing exploration across exponential solution spaces. Lagrangian multipliers coordinate overlaps.
Figure 2: FLOP model validation across methods. Bars show the Mean Absolute Error (MAE) between predicted FLOP shares and measured wall-clock time shares per component. Lower values indicate better agreement. All methods achieve MAE $<$ 7%, with the majority under 2%, confirming that our analytical FLOP estimates reliably predict relative computational costs. ours_ucb corresponds to our method D&L.
Figure 3: Decomposition size and overlap ablation on BiTSP-20. We vary decomposition size $d \in \{5, 10, 15, 20\}$ (25%--100% of problem size) and overlap percentage from 0% to 100%. (Left) Hypervolume improves with overlap across all decomposition sizes, with $d \in \{10, 15\}$ approaching the WS-LKH baseline (red dashed) by 40--50% overlap. (Center) Runtime remains stable below 60% overlap; beyond this threshold, smaller decompositions (especially $d=10$) exhibit runtime explosion due to subproblem proliferation. (Right) Tour length decreases with overlap, with all configurations converging to comparable quality ($\sim$13) by 50% overlap. The smallest decomposition ($d=5$) shows the highest sensitivity to overlap, exhibiting both the largest quality gains and highest variance. Results averaged over 50 runs; shaded regions denote $\pm 1$ standard deviation.
Figure 4: Decomposition size and overlap ablation on BiTSP-100. We vary decomposition size $d \in \{10, 15, 25, 35, 40, 50, 70, 90, 100\}$ and overlap percentage from 0% to 100%. (Left) Hypervolume increases with overlap up to 40--50%, then plateaus; moderate decomposition sizes ($d \in \{25, 50\}$) achieve the best quality. (Center) Runtime remains stable below 50% overlap but increases superlinearly beyond, particularly for smaller $d$ due to the proliferation of overlapping subproblems. (Right) Average tour length improves with overlap, converging across all configurations by 50%. The red dashed line indicates the weighted-sum LKH baseline. Results averaged over 50 runs; shaded regions show $\pm 1$ standard deviation.
Figure 5: Harm detection analysis on BiTSP-20. (Left) Computational redundancy measures total subproblem evaluations divided by problem size. Small decompositions suffer disproportionately: $d=5$ reaches $63\times$ redundancy at 100% overlap, while $d=20$ remains below $10\times$ throughout. The harm threshold ($10\times$, red dashed) is crossed at $\sim$50% overlap for $d=5$ but never exceeded for $d \geq 15$. (Right) Marginal efficiency ($\Delta\text{HV}/\Delta\text{Runtime}$) quantifies diminishing returns. All configurations show positive efficiency below 50% overlap (gray dashed), indicating quality gains justify runtime costs. Beyond this threshold, efficiency drops by 1--2 orders of magnitude, with several configurations exhibiting near-zero or negative marginal returns. This identifies 50% overlap as the critical efficiency cliff beyond which additional overlap provides negligible benefit at substantial computational cost.
...and 15 more figures

Theorems & Definitions (64)

Definition 4.1: Assumptions
Definition 4.2: Metric-Based Decomposition
Remark 4.3: Domain Agnosticity
Proposition 4.4: Bounded Coupling
Theorem 4.5: Coupling Error Bound
Definition 5.1: Decomposed Regret
Theorem 5.2: Regret Decomposition
Corollary 5.3: Explicit Regret Bound
Theorem 12.1: Regret Decomposition Under Structured Decomposition and Overlap
Theorem 12.2: Total Average-Regret
...and 54 more
