Softmax as a Lagrangian-Legendrian Seam

Christopher R. Lee-Jenkins

TL;DR

The paper builds a bridge between ML and differential geometry by showing that softmax acts as a seam between two gradient-generated descriptions, one for each of the dual potentials $\phi$ (negative entropy) and $\phi^*$ (log-sum-exp). It embeds this seam into a folded contact-symplectic structure with a quadratic collar $\omega_q=dr\wedge\alpha + r^2 d\alpha$, producing a Legendrian boundary $\Gamma$ on which $y=\nabla\phi^*(z)$ and $z=\nabla\phi(y)$. The Fenchel–Young gap $\phi(y)+\phi^*(z)-\langle z,y\rangle$ equals $\mathrm{KL}(y\|\mathrm{softmax}(z))$ and vanishes exactly on the seam, where the equality case $\langle z,y\rangle=\phi(y)+\phi^*(z)$ holds; bias shifts correspond to Reeb flow along the screen. The paper works out the two- and three-class cases ($d=2$ and $d=3$), and discusses how this perspective connects to replicator dynamics on the probability simplex and to information-geometric interpretations of ML calibration. It concludes with directions for compactifying logit space (projectively or spherically) to obtain global invariants and long-time dynamical control.

Abstract

This note offers a first bridge from machine learning to modern differential geometry. We show that the logits-to-probabilities step implemented by softmax can be modeled as a geometric interface: two potential-generated, conservative descriptions (from negative entropy and log-sum-exp) meet along a Legendrian "seam" on a contact screen (the probability simplex) inside a simple folded symplectic collar. Bias-shift invariance appears as Reeb flow on the screen, and the Fenchel-Young equality/KL gap provides a computable distance to the seam. We work out the two- and three-class cases to make the picture concrete and outline next steps for ML: compact logit models (projective or spherical), global invariants, and connections to information geometry where on-screen dynamics manifest as replicator flows.
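
The identification of the Fenchel–Young gap with a KL divergence can be checked in a few lines. The sketch below assumes the standard conventions $\phi(y)=\sum_i y_i\log y_i$ on the probability simplex and $\phi^*(z)=\log\sum_j e^{z_j}$; the paper's own normalization may differ by constants.

$$
\begin{aligned}
\phi(y)+\phi^*(z)-\langle z,y\rangle
&= \sum_i y_i\log y_i + \log\sum_j e^{z_j} - \sum_i y_i z_i \\
&= \sum_i y_i\Bigl(\log y_i - z_i + \log\sum_j e^{z_j}\Bigr) \qquad \text{(using } \textstyle\sum_i y_i = 1\text{)} \\
&= \sum_i y_i\log\frac{y_i}{\mathrm{softmax}(z)_i}
= \mathrm{KL}\bigl(y\,\|\,\mathrm{softmax}(z)\bigr).
\end{aligned}
$$

The last step uses $\log\mathrm{softmax}(z)_i = z_i-\log\sum_j e^{z_j}$. The gap is therefore nonnegative and is zero exactly when $y=\mathrm{softmax}(z)$, that is, on the seam.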

Paper Structure

This paper contains 4 sections, 26 equations, 2 figures.

Figures (2)

  • Figure 1: Two-class seam $p=\sigma(\Delta)$ (thick curve) on the $(\Delta,p)$-plane. The background shows the Fenchel–Young gap $\phi(y)+\phi^*(z)-\langle z,y\rangle=\mathrm{KL}\bigl(y\,\|\,\mathrm{softmax}(z)\bigr)$, which vanishes exactly on the seam.
  • Figure 2: Three-class softmax: image in the probability simplex $\Delta^2$ of a rectangular grid in centered-logits coordinates $(z_1\!-\!z_3,\;z_2\!-\!z_3)$. Only logit differences are visible on the screen, reflecting bias-shift (Reeb) invariance.
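
The quantities named in the two captions can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes $\phi$ is negative entropy and $\phi^*$ is log-sum-exp, and all function and variable names are illustrative. It compares the Fenchel–Young gap with the KL divergence, evaluates the gap on the seam, tests invariance under a bias shift, and recovers the two-class seam $p=\sigma(\Delta)$.

```python
import numpy as np

def log_sum_exp(z):
    """phi*(z) = log sum_i exp(z_i), computed stably."""
    m = np.max(z)
    return float(m + np.log(np.sum(np.exp(z - m))))

def softmax(z):
    """Gradient of log-sum-exp: maps logits to a point of the simplex."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def neg_entropy(y):
    """phi(y) = sum_i y_i log y_i, for y in the interior of the simplex."""
    return float(np.sum(y * np.log(y)))

def fenchel_young_gap(y, z):
    """phi(y) + phi*(z) - <z, y>."""
    return neg_entropy(y) + log_sum_exp(z) - float(np.dot(z, y))

def kl(y, p):
    """KL(y || p) for strictly positive probability vectors."""
    return float(np.sum(y * np.log(y / p)))

rng = np.random.default_rng(0)
z = rng.normal(size=3)             # arbitrary logits (three classes)
y = softmax(rng.normal(size=3))    # arbitrary interior point of the simplex

gap = fenchel_young_gap(y, z)
kl_value = kl(y, softmax(z))                    # should match the gap
gap_on_seam = fenchel_young_gap(softmax(z), z)  # should be ~0
gap_shifted = fenchel_young_gap(y, z + 5.0)     # bias shift z -> z + c*1

# Two-class case: with logits (Delta, 0), the first probability is sigma(Delta).
delta = 0.7
p_two_class = softmax(np.array([delta, 0.0]))[0]
sigmoid = 1.0 / (1.0 + np.exp(-delta))

print(gap, kl_value, gap_on_seam, gap_shifted, p_two_class, sigmoid)
```

Because $\sum_i y_i = 1$, adding a constant to every logit raises $\phi^*(z)$ and $\langle z,y\rangle$ by the same amount, so both the gap and the softmax output are unchanged. This is the bias-shift invariance that Figure 2 refers to.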