Permutation Learning with Only N Parameters: From SoftSort to Self-Organizing Gaussians
Kai Uwe Barthel, Florian Barthel, Peter Eisert
TL;DR
This work addresses the memory bottleneck of permutation learning by introducing ShuffleSoftSort, a differentiable method that learns permutations with only $N$ parameters, whereas Gumbel-Sinkhorn requires $O(N^2)$ memory. In each iteration, the method shuffles the element indices and applies SoftSort for $R$ steps under a temperature schedule $\tau$, which preserves the previous ordering while allowing more flexible, multidimensional sorting. It uses a row-wise computation and a loss that combines neighborhood, stochastic, and standard-deviation terms to converge to a valid permutation, achieving high-quality results with substantially less memory. This makes permutation learning scalable to large tasks such as grid-based image sorting and self-organizing Gaussian representations in 3D scene reconstruction, with practical storage reductions and end-to-end differentiability.
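The shuffle-then-SoftSort idea can be illustrated with a minimal NumPy sketch. It shows only the forward data flow of one round (shuffle, soft permutation built from $N$ weights, unshuffle) and omits the loss and the gradient updates over the $R$ steps. The function names `softsort` and `shuffle_round`, the ascending sort direction, and the initialization of the weights to `0..N-1` are assumptions made for illustration, not details confirmed by the summary above.

```python
import numpy as np


def softsort(w, tau):
    """SoftSort relaxation: row-wise softmax of -|sort(w)_i - w_j| / tau.

    Returns an (N, N) row-stochastic matrix derived from only N weights.
    Ascending order is an assumption of this sketch.
    """
    w = np.asarray(w, dtype=float)
    w_sorted = np.sort(w)
    logits = -np.abs(w_sorted[:, None] - w[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def shuffle_round(x, w, tau, rng):
    """Forward pass of one hypothetical round: shuffle -> SoftSort -> unshuffle.

    No loss and no weight update are performed here; in the actual method
    the weights would be optimized between such passes.
    """
    n = len(x)
    shuffle = rng.permutation(n)       # random index shuffle
    inverse = np.argsort(shuffle)      # indices that undo the shuffle
    p_soft = softsort(w, tau)          # soft permutation from N weights
    y = p_soft @ x[shuffle]            # softly reorder the shuffled elements
    return y[inverse], p_soft          # map back to the original layout


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)           # toy 1D data
w = np.arange(8, dtype=float)          # assumed initialization: 0..N-1
out, p_soft = shuffle_round(x, w, tau=0.01, rng=rng)
print(np.allclose(out, x))             # checks that the order was preserved
```

With the weights left at `0..N-1` and a small `tau`, the soft permutation is close to the identity, so the round returns the elements in their previous order. In this sketch, exchanging two adjacent weights swaps two elements that the shuffle has paired at random, which is one way to read the claim that shuffling makes the 1D sort more flexible.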
Abstract
Sorting and permutation learning are key concepts in optimization and machine learning, especially when organizing high-dimensional data into meaningful spatial layouts. The Gumbel-Sinkhorn method, while effective, requires $N^2$ parameters to determine a full permutation matrix, making it computationally expensive for large datasets. Low-rank matrix factorization approximations reduce memory requirements to $2NM$ (with $M \ll N$), but they still struggle with very large problems. SoftSort provides a continuous relaxation of the argsort operator and thus allows differentiable 1D sorting, but it faces challenges with multidimensional data and complex permutations. In this paper, we present a novel method for learning permutations using only $N$ parameters, which dramatically reduces storage costs. Our method extends SoftSort by iteratively shuffling the $N$ indices of the elements and applying a few SoftSort optimization steps per iteration. This modification significantly improves sorting quality over pure SoftSort, especially for multidimensional data and complex optimization criteria. Compared to existing approaches, our method offers improved memory efficiency and scalability while maintaining high-quality permutation learning. Its low memory requirements make it particularly well suited for large-scale optimization tasks such as "Self-Organizing Gaussians", where efficient and scalable permutation learning is critical.
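To make the parameter counts in the abstract concrete, the following short sketch compares the three regimes ($N^2$, $2NM$, and $N$) for a given problem size. The function name `parameter_counts`, the example values of `n` and `m`, and the assumption of 4 bytes per parameter (float32) are illustrative choices, not figures from the paper.

```python
def parameter_counts(n, m):
    """Number of learnable parameters for each approach named in the abstract."""
    return {
        "gumbel_sinkhorn": n * n,   # full N x N matrix
        "low_rank": 2 * n * m,      # two N x M factors, with M << N
        "n_parameters": n,          # one weight per element
    }


def memory_gigabytes(count, bytes_per_param=4):
    """Storage for the parameters alone, assuming float32 by default."""
    return count * bytes_per_param / 1e9


# Hypothetical problem size, chosen only for illustration.
counts = parameter_counts(n=1_000_000, m=32)
for name, count in counts.items():
    print(f"{name}: {count:,} parameters, {memory_gigabytes(count):,.3f} GB")
```

The counts cover the learnable parameters only; intermediate buffers used during optimization are not included in this estimate.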
