Softmax as a Lagrangian-Legendrian Seam
Christopher R. Lee-Jenkins
TL;DR
The paper builds a bridge between ML and differential geometry by showing that softmax serves as a seam between two gradient-generated descriptions, one from each of the dual potentials $\phi$ and $\phi^*$. It embeds this seam into a folded contact-symplectic structure with a quadratic collar $\omega_q=dr\wedge\alpha + r^2 d\alpha$, producing a Legendrian boundary $\Gamma$ where $y=\nabla\phi^*(z)$ and $z=\nabla\phi(y)$. The Fenchel–Young gap $\phi(y)+\phi^*(z)-\langle z,y\rangle$ equals $\mathrm{KL}(y\|\mathrm{softmax}(z))$ and vanishes exactly on the seam, where the equality case $\langle z,y\rangle=\phi(y)+\phi^*(z)$ holds; bias shifts correspond to Reeb flow along the screen. The paper works out concrete instances for $d=2$ and $d=3$, and discusses how this perspective connects to replicator dynamics on the probability simplex and to information-geometric interpretations of ML calibration. It concludes with directions for compactifying logit space (projectively or spherically) to obtain global invariants and long-time dynamical control.
Abstract
This note offers a first bridge from machine learning to modern differential geometry. We show that the logits-to-probabilities step implemented by softmax can be modeled as a geometric interface: two potential-generated, conservative descriptions (from negative entropy and log-sum-exp) meet along a Legendrian "seam" on a contact screen (the probability simplex) inside a simple folded symplectic collar. Bias-shift invariance appears as Reeb flow on the screen, and the Fenchel–Young gap, which is zero exactly when the Fenchel–Young equality holds and otherwise equals a KL divergence, provides a computable measure of distance to the seam. We work out the two- and three-class cases to make the picture concrete and outline next steps for ML: compact logit models (projective or spherical), global invariants, and connections to information geometry where on-screen dynamics manifest as replicator flows.
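As a numerical illustration of the Fenchel–Young/KL statement, the following minimal sketch assumes that $\phi$ is negative entropy on the simplex and $\phi^*$ is log-sum-exp, as the abstract suggests. The function names (`fenchel_young_gap`, `neg_entropy`, and so on) and the sample vectors are hypothetical choices made for this example, not notation from the paper. It checks three things: the gap agrees with $\mathrm{KL}(y\|\mathrm{softmax}(z))$, it vanishes when $y=\mathrm{softmax}(z)$, and it is unchanged by a bias shift $z\mapsto z+c\mathbf{1}$.

```python
import math

def log_sum_exp(z):
    # phi^*(z): log-sum-exp, computed stably by subtracting the max logit.
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def softmax(z):
    # Gradient of log-sum-exp: y_i = exp(z_i - phi^*(z)).
    lse = log_sum_exp(z)
    return [math.exp(v - lse) for v in z]

def neg_entropy(y):
    # phi(y): negative Shannon entropy, with the convention 0 * log 0 = 0.
    return sum(p * math.log(p) for p in y if p > 0.0)

def kl(y, q):
    # KL(y || q) for probability vectors; q must be strictly positive.
    return sum(p * math.log(p / qi) for p, qi in zip(y, q) if p > 0.0)

def fenchel_young_gap(y, z):
    # phi(y) + phi^*(z) - <z, y>; nonnegative by the Fenchel-Young inequality.
    inner = sum(p * v for p, v in zip(y, z))
    return neg_entropy(y) + log_sum_exp(z) - inner

# Illustrative three-class values (assumed for this example only).
z = [2.0, -1.0, 0.5]   # logits
y = [0.2, 0.5, 0.3]    # a point on the probability simplex
c = 3.7                # an arbitrary bias shift
z_shifted = [v + c for v in z]

print("gap off the seam:   ", fenchel_young_gap(y, z))
print("KL(y || softmax(z)):", kl(y, softmax(z)))
print("gap on the seam:    ", fenchel_young_gap(softmax(z), z))
print("gap after bias shift:", fenchel_young_gap(y, z_shifted))
```

The bias-shift check works because adding $c$ to every logit raises both $\phi^*(z)$ and $\langle z,y\rangle$ by $c$ when $y$ sums to one, so the two changes cancel in the gap.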
