Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem
Zhentao Tan, Yadong Mu
TL;DR
This paper tackles the Koopmans-Beckmann Quadratic Assignment Problem (QAP), a strongly NP-hard combinatorial optimization problem (COP), by introducing SAWT, a learn-to-improve reinforcement learning method that encodes facilities and locations separately and incorporates solution-aware attention to capture higher-order QAP structure. The model uses independent embeddings for facilities and locations, a Solution Aware Transformer encoder, and a swap-based decoder guided by REINFORCE with a value head, and it scales without building an association graph. In extensive experiments on self-generated instances and the QAPLIB benchmark, SAWT generalizes well across problem sizes (up to $n=100$) and offers favorable inference efficiency compared to exact solvers, with gaps narrowing as the number of improvement steps increases. The work shows promising practical impact for large-scale QAPs, while acknowledging limitations on certain QAPLIB categories and pointing to meta-learning as a future direction for further improving generalization.
Abstract
Recently, various optimization problems, such as Mixed Integer Linear Programming Problems (MILPs), have been investigated comprehensively with machine learning. This work focuses on learning-based methods for efficiently solving the Quadratic Assignment Problem (QAP), a formidable challenge in combinatorial optimization. While many simpler problems admit a fully polynomial-time approximation scheme (FPTAS), QAP is known to be strongly NP-hard. Even finding an FPTAS for QAP is difficult, in the sense that the existence of an FPTAS implies $P = NP$. Current research on QAPs suffers from limited scale and computational inefficiency. To address these issues, we propose the first solution of its kind for QAP in the learn-to-improve category. This work encodes facility and location nodes separately, instead of forming the computationally intensive association graphs prevalent in current approaches. This design choice enables scalability to larger problem sizes. Furthermore, a Solution AWare Transformer (SAWT) architecture integrates the incumbent solution matrix with the attention score to effectively capture higher-order information of the QAPs. Our model's effectiveness is validated through extensive experiments on self-generated QAP instances of varying sizes and the QAPLIB benchmark.
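To make the problem setting concrete, the following minimal sketch evaluates the Koopmans-Beckmann objective and applies one pairwise-swap improvement step to an incumbent solution. It is an illustration only, not the SAWT model: the function names (`qap_cost`, `swap`, `best_swap_step`), the toy flow and distance matrices, and the greedy swap rule are assumptions made here, with the greedy rule standing in for the learned swap policy.

```python
def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann objective: sum over i, j of flow[i][j] * dist[perm[i]][perm[j]].

    perm[i] is the location assigned to facility i.
    """
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def swap(perm, a, b):
    """Return a new assignment in which facilities a and b exchange locations."""
    new_perm = list(perm)
    new_perm[a], new_perm[b] = new_perm[b], new_perm[a]
    return new_perm


def best_swap_step(flow, dist, perm):
    """One improvement step: try every pairwise swap and keep the cheapest result.

    Hypothetical greedy rule used for illustration; a learn-to-improve
    method would choose the swap with a trained policy instead.
    """
    best_perm, best_cost = list(perm), qap_cost(flow, dist, perm)
    n = len(perm)
    for a in range(n):
        for b in range(a + 1, n):
            candidate = swap(perm, a, b)
            cost = qap_cost(flow, dist, candidate)
            if cost < best_cost:
                best_perm, best_cost = candidate, cost
    return best_perm, best_cost


# Toy instance (assumed values): 3 facilities, 3 locations.
flow = [[0, 5, 2],
        [5, 0, 3],
        [2, 3, 0]]
dist = [[0, 1, 4],
        [1, 0, 2],
        [4, 2, 0]]

incumbent = [0, 2, 1]                                  # starting assignment, cost 56
improved, improved_cost = best_swap_step(flow, dist, incumbent)
```

On this toy instance a single swap step moves the incumbent from cost 56 to the assignment `[0, 1, 2]` with cost 38, which is also the optimum over all six permutations. Evaluating the objective from scratch costs $O(n^2)$ per candidate, so the exhaustive loop above is only practical for small $n$.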
