Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation

Mingfeng Fan, Jianan Zhou, Yifeng Zhang, Yaoxin Wu, Jinbiao Chen, Guillaume Adrien Sartoretti

Abstract

Recent deep reinforcement learning methods have achieved remarkable success in solving multi-objective combinatorial optimization problems (MOCOPs) by decomposing them into multiple subproblems, each associated with a specific weight vector. However, these methods typically treat all subproblems equally and solve them using a single model, which hinders effective exploration of the solution space and leads to suboptimal performance. To overcome this limitation, we propose POCCO, a novel plug-and-play framework that enables adaptive selection of model structures for subproblems, which are subsequently optimized based on preference signals rather than explicit reward values. Specifically, we design a conditional computation block that routes subproblems to specialized neural architectures. Moreover, we propose a preference-driven optimization algorithm that learns pairwise preferences between winning and losing solutions. We evaluate the efficacy and versatility of POCCO by applying it to two state-of-the-art neural methods for MOCOPs. Experimental results across four classic MOCOP benchmarks demonstrate its significant superiority and strong generalization.

Paper Structure

This paper contains 36 sections, 30 equations, 6 figures, 17 tables, 1 algorithm.

Figures (6)

  • Figure 1: Decoder structures of backbone and POCCO. Given an MOCOP instance $\mathcal{G}$ with $n+1$ nodes (e.g., $n$ customers and a depot, if applicable) and a weight vector $\lambda_i$, POCCO encodes their raw features into joint node embeddings $\{h_i\}_{i=0}^{n}$ using an encoder. At each decoding step $t$, the decoder forms a query $Q$ from the embeddings of the first and last selected nodes $(\pi_1,\pi_t)$, and computes the key $K$ and value $V$ via linear projections of $\{h_i\}_{i=0}^{n}$. The MHA layer processes $Q$, $K$, and $V$ to produce a context vector $h_c$, which is refined by the CCO block. The refined context is passed through a compatibility layer followed by a Softmax to compute the node selection probabilities. More details about the forward pass can be found in Appendix B.
  • Figure 2: An overview of preference-driven MOCO. Unlike prior DRL methods that explicitly learn from scalarized rewards, our approach converts relative preferences into a BT likelihood, providing an implicit reward signal to optimize the PL policy.
  • Figure 3: Pareto fronts of benchmark instances.
  • Figure 4: Ablation study: (a) validates the effectiveness of PL; (b) and (c) verify the effects of different CCO block variants.
  • Figure 5: Effects of $\beta$.
  • ...and 1 more figures
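
The Figure 2 caption says that relative preferences between solutions are converted into a BT (Bradley-Terry) likelihood, which acts as an implicit reward signal. The sketch below illustrates the general idea for a single winner/loser pair and is not the authors' implementation. It assumes that the implicit reward of a solution is a temperature `beta` times the policy's log-probability of that solution, and that the winner is the solution with the lower scalarized cost under the subproblem's weight vector. The function names and the role given to `beta` are hypothetical choices made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bt_preference_loss(logp_win: float, logp_lose: float, beta: float = 1.0) -> float:
    """Negative Bradley-Terry log-likelihood for one preference pair.

    Assumption: the implicit reward of a solution is beta * log p(solution),
    so P(win beats lose) = sigmoid(beta * (logp_win - logp_lose)).
    """
    margin = beta * (logp_win - logp_lose)
    return -log_sigmoid(margin)


def pick_pair(candidates):
    """Pick (winner, loser) from (log_prob, scalarized_cost) tuples.

    Assumption: lower scalarized cost is preferred (minimization).
    Returns the log-probs of the best and the worst candidate.
    """
    ranked = sorted(candidates, key=lambda c: c[1])
    return ranked[0][0], ranked[-1][0]


# Hypothetical sampled solutions: (policy log-probability, scalarized cost).
samples = [(-5.0, 3.2), (-2.0, 4.1), (-3.5, 3.7)]
logp_w, logp_l = pick_pair(samples)
loss = bt_preference_loss(logp_w, logp_l, beta=1.0)
```

In this toy batch the preferred solution currently has a lower log-probability than the rejected one, so the loss exceeds $\log 2$, and minimizing it would push probability mass toward the preferred solution. No explicit reward value enters the loss; only the ordering of the two solutions does.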