Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms
Baran Hashemi, Kurt Pasque, Chris Teska, Ruriko Yoshida
TL;DR
The paper addresses the mismatch between Softmax attention and the polyhedral, dynamic-programming-style reasoning central to combinatorial optimization. It introduces Tropical Attention, which maps queries, keys, and values into tropical projective space and uses the tropical Hilbert metric to perform max-plus aggregation, yielding a piecewise-linear (polyhedral), 1-Lipschitz attention core. The authors prove that Multi-Head Tropical Attention (MHTA) stacks universally approximate tropical circuits, and hence max-plus dynamic programs, and realize tropical transitive closure with polynomial resources. Empirically, the method achieves stronger out-of-distribution generalization, robustness to perturbations, and faster inference than Softmax-based and recurrent attention baselines, on tasks that include NP-hard and NP-complete problems. The work advances neural algorithmic reasoning toward sharper, more expressive Large Reasoning Models for discrete optimization in domains such as cryptography, phylogenetics, and physics.
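To make the max-plus aggregation concrete, here is a minimal sketch in plain Python. It assumes that the attention score between a query and a key is the negative tropical Hilbert distance, d_H(x, y) = max_i(x_i - y_i) - min_i(x_i - y_i), and that the output takes a coordinate-wise maximum of score plus value. The function names are hypothetical, and the learned maps into tropical projective space used by the authors are not reproduced here.

```python
def hilbert_distance(x, y):
    """Tropical Hilbert projective metric: max_i(x_i - y_i) - min_i(x_i - y_i)."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def tropical_attention(queries, keys, values):
    """Max-plus aggregation (illustrative, not the authors' implementation).

    out[i][d] = max_j ( -d_H(queries[i], keys[j]) + values[j][d] )
    """
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Assumption: score is the negative Hilbert distance to each key.
        scores = [-hilbert_distance(q, k) for k in keys]
        # Tropical "weighted sum": add the score, then take the max over keys.
        outputs.append(
            [max(s + v[d] for s, v in zip(scores, values)) for d in range(dim)]
        )
    return outputs


queries = [[0.0, 1.0]]
keys = [[5.0, 6.0], [0.0, 3.0]]
values = [[1.0, 2.0], [10.0, 0.0]]
result = tropical_attention(queries, keys, values)
```

The first key differs from the query only by a constant shift, so its Hilbert distance is 0; this is the scale invariance that comes from working in tropical projective space. Every operation is a max, a min, or an addition, so the whole map is piecewise-linear.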
Abstract
Can algebraic geometry enhance the sharpness, robustness, and interpretability of modern neural reasoning models by equipping them with a mathematically grounded inductive bias? To answer this, we introduce Tropical Attention, an attention mechanism grounded in tropical geometry that lifts the attention kernel into tropical projective space, where reasoning is piecewise-linear and 1-Lipschitz, thus preserving the polyhedral decision structure inherent to combinatorial reasoning. We prove that Multi-Head Tropical Attention (MHTA) stacks universally approximate tropical circuits and realize tropical transitive closure through composition, achieving polynomial resource bounds without invoking recurrent mechanisms. These guarantees explain why the induced polyhedral decision boundaries remain sharp and scale-invariant, rather than smoothed by Softmax. Empirically, we show that Tropical Attention delivers stronger out-of-distribution generalization in both length and value, with high robustness against perturbative noise, and substantially faster inference with fewer parameters compared to Softmax-based and recurrent attention baselines. For the first time, we extend neural algorithmic reasoning beyond PTIME problems to NP-hard and NP-complete problems, paving the way toward sharper and more expressive Large Reasoning Models (LRMs) capable of tackling complex combinatorial challenges in phylogenetics, cryptography, particle physics, and mathematical discovery.
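For readers unfamiliar with the term, the tropical transitive closure mentioned above can be illustrated with the classical max-plus Kleene star of a weight matrix. The sketch below is the textbook algebraic object, not the MHTA construction from the paper. It assumes a square matrix with no positive-weight cycles, and the function names are hypothetical. With edge weights set to negated costs, the closure entries are negated shortest-path distances.

```python
NEG_INF = float("-inf")


def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max_k (a[i][k] + b[k][j])."""
    n = len(a)
    return [
        [max(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def tropical_closure(weights):
    """Kleene star of a max-plus matrix by repeated squaring.

    Assumes no positive-weight cycles, so paths of at most n - 1 edges suffice.
    """
    n = len(weights)
    # Join with the tropical identity: 0 on the diagonal, -inf elsewhere.
    m = [
        [max(weights[i][j], 0.0 if i == j else NEG_INF) for j in range(n)]
        for i in range(n)
    ]
    power = 1
    while power < n - 1:
        m = maxplus_matmul(m, m)  # doubles the path length covered
        power *= 2
    return m


# Edges as negated costs: 0->1 costs 2, 1->2 costs 3, 0->2 costs 10.
weights = [
    [NEG_INF, -2.0, -10.0],
    [NEG_INF, NEG_INF, -3.0],
    [NEG_INF, NEG_INF, NEG_INF],
]
closure = tropical_closure(weights)
```

Here closure[0][2] is -5.0, because the two-edge route 0->1->2 (cost 5) beats the direct edge (cost 10). Each squaring plays the role of one composition step, which is the sense in which stacked layers can realize the closure without recurrence.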
