Differentiable Extensions with Rounding Guarantees for Combinatorial Optimization over Permutations

Robert R. Nerem, Zhishang Luo, Akbar Rafiey, Yusu Wang

TL;DR

This work introduces the Birkhoff Extension (BE), a continuous, almost-everywhere-differentiable extension of any real-valued function $f$ on permutations to the Birkhoff polytope $\mathcal{D}_n$ of doubly stochastic matrices. BE is built on a continuous Birkhoff decomposition, which gives $F(A)=\sum_{\ell} \alpha_\ell f(P_\ell)$ with the permutation matrices $P_\ell$ taken in a fixed (or score-induced) order. This construction ensures continuity, almost-everywhere differentiability, and a rounding guarantee: $f(\mathrm{round}_S(A))\le F(A)$. The framework supports gradient-based optimization (via Frank-Wolfe on $\mathcal{D}_n$) and offers strategies to escape local minima through dynamic score updates; it also extends to optimization over trees. Experiments on QAP, TSP, and DFASP indicate that BE can outperform certain MILP baselines and provide competitive local improvements, particularly for QAP, while enabling unsupervised neural optimization by using $F$ as a differentiable loss. The approach promises scalable, differentiable optimization for permutation-based combinatorial problems and opens avenues for GPU-accelerated matching and tree-structured extensions.

Abstract

Continuously extending combinatorial optimization objectives is a powerful technique commonly applied to the optimization of set functions. However, few such methods exist for extending functions on permutations, despite the fact that many combinatorial optimization problems, such as the quadratic assignment problem (QAP) and the traveling salesperson problem (TSP), are inherently optimization problems over permutations. We present Birkhoff Extension (BE), an almost-everywhere-differentiable, continuous, polytime-computable extension of any real-valued function on permutations to doubly stochastic matrices. Key to this construction is our introduction of a continuous variant of the well-known Birkhoff decomposition. Our extension has several nice properties that make it appealing for optimization problems. First, BE provides a rounding guarantee: any solution to the extension can be efficiently rounded to a permutation without increasing the function value. Furthermore, an approximate solution in the relaxed case gives rise to an approximate solution in the space of permutations. Second, using BE, any real-valued optimization objective on permutations can be extended to an almost-everywhere-differentiable objective function over the space of doubly stochastic matrices. This makes BE amenable not only to gradient-descent-based optimization, but also to unsupervised neural combinatorial optimization, where training often requires a differentiable loss. Third, based on the above properties, we present a simple optimization procedure which can be readily combined with existing optimization approaches to offer local improvements (i.e., the quality of the final solution is no worse than that of the initial solution). Finally, we also adapt our extension to optimization problems over a class of trees, such as Steiner tree and optimization-based hierarchical clustering.
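
The two rounding claims in the abstract can be read as a short convex-combination argument. The derivation below is only a sketch under stated assumptions: $F(A)=\sum_{\ell}\alpha_\ell f(P_\ell)$ with $\alpha_\ell>0$ and $\sum_\ell \alpha_\ell = 1$ (as in the TL;DR), $F(P)=f(P)$ for every permutation matrix $P$, and $\mathrm{round}_S(A)$ returns a permutation of the decomposition with the smallest objective value. The paper's exact definition of $\mathrm{round}_S$ is not given in this summary and may differ.

$$
f(\mathrm{round}_S(A)) \;=\; \min_{\ell} f(P_\ell) \;\le\; \sum_{\ell} \alpha_\ell f(P_\ell) \;=\; F(A).
$$

If $A$ is $\varepsilon$-optimal for the extension, that is $F(A) \le \min_{B \in \mathcal{D}_n} F(B) + \varepsilon$, then

$$
f(\mathrm{round}_S(A)) \;\le\; F(A) \;\le\; \min_{B \in \mathcal{D}_n} F(B) + \varepsilon \;\le\; \min_{P \in \mathcal{P}_n} f(P) + \varepsilon,
$$

where the last step uses $\mathcal{P}_n \subset \mathcal{D}_n$ together with $F = f$ on $\mathcal{P}_n$. Under these assumptions, an $\varepsilon$-approximate minimizer of $F$ rounds to an $\varepsilon$-approximate minimizer of $f$.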

Paper Structure

This paper contains 37 sections, 11 theorems, 35 equations, 5 figures, 7 tables, 7 algorithms.

Key Result

Theorem 2.1

Any doubly stochastic matrix $A \in \mathcal{D}_n$ can be decomposed as $A = \sum_{k = 1}^{M}\alpha_kP_k$, where $M < n^2-n+1$, $\alpha_k > 0$, $\sum_k\alpha_k = 1$, and $P_k \in \mathcal{P}_n$.
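
To make the statement concrete, here is a minimal Python sketch of the classical greedy Birkhoff decomposition, followed by an evaluation of $\sum_k \alpha_k f(P_k)$ on the result. It is not the paper's continuous variant: the permutation chosen at each step is whatever the assignment solver returns, with no fixed or score-induced order. The function name `birkhoff_decompose`, the tolerance `tol`, the example matrix `A`, and the QAP-style objective `f` with random matrices `W` and `D` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def birkhoff_decompose(A, tol=1e-9):
    """Greedy Birkhoff decomposition: A ~= sum_k alphas[k] * perms[k]."""
    R = np.array(A, dtype=float)  # residual matrix, driven to zero
    n = R.shape[0]
    alphas, perms = [], []
    while R.max() > tol:
        # Cost 0 on (numerically) positive entries, 1 elsewhere, so a
        # zero-cost assignment is a permutation inside the support of R.
        cost = (R <= tol).astype(float)
        rows, cols = linear_sum_assignment(cost)
        if cost[rows, cols].sum() > 0:
            break  # no permutation in the support (numerical drift)
        alpha = R[rows, cols].min()  # largest weight we can peel off
        P = np.zeros((n, n))
        P[rows, cols] = 1.0
        alphas.append(alpha)
        perms.append(P)
        R = R - alpha * P  # zeroes at least one more entry
    return np.array(alphas), perms


# Hypothetical 3x3 doubly stochastic matrix (rows and columns sum to 1).
n = 3
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
alphas, perms = birkhoff_decompose(A)

# Hypothetical QAP-style objective on permutation matrices.
rng = np.random.default_rng(0)
W, D = rng.random((n, n)), rng.random((n, n))


def f(P):
    return float(np.trace(W @ P @ D @ P.T))


values = [f(P) for P in perms]
F = float(np.dot(alphas, values))  # convex combination of the f(P_k)
best = perms[int(np.argmin(values))]  # best permutation in the decomposition
print(len(alphas), F, min(values))
```

Because the weights are positive and sum to 1, the smallest `values` entry can never exceed `F`; this is the convex-combination fact behind a rounding guarantee of the form $f(\mathrm{round}_S(A)) \le F(A)$. The greedy loop zeroes at least one residual entry per iteration, so it terminates after at most $n^2$ steps.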

Figures (5)

  • Figure 1: The pipeline of training a neural network $N_\theta$ for a single instance. For a given problem instance $I$, we have its representation $X_I$ and a score matrix $S_I$ as input to the neural network $N_\theta$. The output of the neural network is a doubly stochastic matrix $A = N_\theta(X_I, S_I)$. Birkhoff extensions are used to compute the loss $F_S(A) = \sum_{k=1}^M \alpha_k(A) f(P_k(A))$ and we minimize it via backpropagation. In the figure above, for example, $M = 3$ and rounding produces the permutation $P_2 = \mathrm{round}_{S_I}(A_I)$, highlighted in red.
  • Figure 2: Gap vs. truncation size
  • Figure 3: Effect of score-update frequency
  • Figure: Classical Birkhoff decomposition (Birkhoff)
  • Figure: Static score Frank-Wolfe over $\mathcal{D}_n$

Theorems & Definitions (37)

  • Theorem 2.1: Birkhoff decomposition (Birkhoff)
  • Definition 2.2
  • Definition 2.3: Continuous Birkhoff Decomposition
  • Theorem 2.4
  • Theorem 2.5
  • Definition 2.6: Score-Induced Birkhoff Decompositions
  • Theorem 2.7
  • Claim 2.8
  • Definition 2.9
  • Definition 2.10
  • ...and 27 more