Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions

Jungtaek Kim, Jeongbeen Yoon, Minsu Cho

TL;DR

This paper defines the softening error introduced by a differentiable swap function and develops an error-free swap function that satisfies a non-decreasing condition and differentiability. It also adopts a permutation-equivariant Transformer network with multi-head attention.

Abstract

Sorting is a fundamental operation of all computer systems, having been a long-standing significant research topic. Beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. To learn a mapping from a high-dimensional input to an ordinal variable, the differentiability of sorting networks needs to be guaranteed. In this paper we define a softening error by a differentiable swap function, and develop an error-free swap function that holds a non-decreasing condition and differentiability. Furthermore, a permutation-equivariant Transformer network with multi-head attention is adopted to capture dependency between given inputs and also leverage its model capacity with self-attention. Experiments on diverse sorting benchmarks show that our methods perform better than or comparable to baseline methods.

Paper Structure (37 sections, 4 theorems, 21 equations, 8 figures, 21 tables)


Key Result

Proposition 1

A permutation matrix $\mathbf{P} \in \mathbb{R}^{n \times n}$ is doubly-stochastic, which implies that $\sum_{i = 1}^n[\mathbf{P}]_{ij} = 1$ and $\sum_{j = 1}^n[\mathbf{P}]_{ij} = 1$. In particular, regardless of the definition of a swap function with $\min$, $\max$, $\overline{\mathrm{min}}$, and $
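The doubly-stochastic property can be checked numerically with a minimal sketch. It assumes that each soft swap on two wires acts as a 2-by-2 block with entries $a$ and $1 - a$ for some $a \in [0, 1]$, embedded in an identity matrix; the helper names `swap_matrix`, `matmul`, and `is_doubly_stochastic` are hypothetical and are not taken from the paper.

```python
def swap_matrix(n, i, j, a):
    """n x n identity with a 2 x 2 soft-swap block on wires i and j.

    a is an assumed mixing weight in [0, 1]; a = 1 keeps the wires as they
    are and a = 0 exchanges them (both are hard permutations).
    """
    P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    P[i][i], P[i][j] = a, 1.0 - a
    P[j][i], P[j][j] = 1.0 - a, a
    return P


def matmul(A, B):
    """Plain-Python matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def is_doubly_stochastic(P, tol=1e-9):
    """True if all entries are non-negative and every row and column sums to 1."""
    nonneg = all(v >= -tol for row in P for v in row)
    rows = all(abs(sum(row) - 1.0) <= tol for row in P)
    cols = all(abs(sum(col) - 1.0) <= tol for col in zip(*P))
    return nonneg and rows and cols


# Compose three soft swaps on a 4-wire network (weights chosen arbitrarily).
P = swap_matrix(4, 0, 1, 0.9)
P = matmul(swap_matrix(4, 2, 3, 0.3), P)
P = matmul(swap_matrix(4, 1, 2, 0.6), P)
print(is_doubly_stochastic(P))
```

Because each block has rows and columns that sum to one, and products of doubly-stochastic matrices stay doubly-stochastic, the composed matrix passes the check for any weights in $[0, 1]$.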

Figures (8)

  • Figure 1: A sorting network with 5 wire sets and their permutation matrices.
  • Figure 2: Comparisons of diverse DSFs where a swap function is applied once. After a single operation, the two input values $x$ and $y$ are softened, while our error-free DSF leaves both values unchanged. The smaller $|x - y|$ is, the more significant the softening.
  • Figure 3: Illustration of our neural sorting network with error-free DSFs. Given high-dimensional inputs $\mathbf{X}$, a permutation-equivariant network produces a vector of ordinal variables $\mathbf{s}$, whose elements are then swapped by a soft or hard sorting network.
  • Figure 4: An optimal monotonic sigmoid function, which is presented in \ref{eqn:optimal}.
  • Figure 5: Comparisons of diverse DSFs in terms of the numbers of swap functions applied. Our error-free DSF does not change the original $x$ and $y$, unlike other DSFs. We initially set $x = 4, y = 0$ for the left panel or $x = 8, y = 0$ for the right panel, where $k = \#\textrm{Swaps}$.
  • ...and 3 more figures
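
The softening described in the captions of Figures 2 and 5 can be illustrated with a minimal sketch. It assumes a logistic-sigmoid soft swap with a hypothetical steepness parameter `beta`; the function names are illustrative and are not taken from the paper. Starting from $x = 4, y = 0$, as in the left panel of Figure 5, repeated soft swaps pull the two values toward each other, whereas a hard swap returns the original values in sorted order.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_swap(x, y, beta=1.0):
    """Sigmoid-based soft (min, max) of two scalars.

    beta is an assumed steepness parameter; larger values approach the
    hard swap. The pair sum x + y is preserved.
    """
    a = sigmoid(beta * (y - x))  # weight of x in the soft minimum
    soft_min = a * x + (1.0 - a) * y
    soft_max = (1.0 - a) * x + a * y
    return soft_min, soft_max


def hard_swap(x, y):
    """Exact (min, max): the values are reordered but never altered."""
    return min(x, y), max(x, y)


def repeat_swap(swap, x, y, k):
    """Apply a swap function k times in a row."""
    for _ in range(k):
        x, y = swap(x, y)
    return x, y


for k in (1, 2, 4, 8):
    lo, hi = repeat_swap(soft_swap, 4.0, 0.0, k)
    print(k, round(lo, 4), round(hi, 4), repeat_swap(hard_swap, 4.0, 0.0, k))
```

In this sketch, each soft swap shrinks the gap between the two outputs by a factor of $2a - 1 < 1$, so the values drift further from the originals as more swaps are applied. This drift is the kind of error that an error-free DSF is meant to avoid.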

Theorems & Definitions (12)

  • Proposition 1: Modification of Lemma 3 in the work PetersenF2022iclr
  • proof
  • Definition 1
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • proof
  • proof
  • proof
  • ...and 2 more