Unifying Sign and Magnitude for Optimizing Deep Vision Networks via ThermoLion
Ahmed Nebli
TL;DR
The paper tackles the challenge of optimizing deep vision models under stochastic gradient noise by bridging magnitude- and sign-based methods. It introduces ThermoLion, a per-parameter, SNR-gated optimizer that switches between gas-like sign-based updates and solid-like magnitude-based updates, augmented by Momentum Alignment to accelerate confident descent. Across 12 vision datasets, ThermoLion delivers faster convergence and higher final accuracy than state-of-the-art baselines such as AdamW and Lion, with only minor additional computational cost. The work frames optimization as a thermodynamic process governed by local gradient reliability (SNR), offering a unifying axis that reframes Adam, Lion, and hybrids as different points along a continuum of update bitrate controlled by geometry.
Abstract
The training of deep vision models is fundamentally a signal recovery problem amidst high-dimensional stochastic noise. Current optimization paradigms impose a static compromise on information channel capacity. For instance, magnitude-based methods, such as AdamW, operate on the assumption that gradient norms are high-fidelity curvature signals. While this allows for precision in smooth regimes, it leads to catastrophic noise amplification when applied to rugged, non-convex landscapes. Conversely, sign-based methods (e.g., Lion) perform a radical 1-bit quantization of the gradient, which aims to provide robust regularization at the cost of discarding fine-grained descent information. We propose that optimal convergence requires neither static prior, but rather a dynamic modulation of the update bitrate. We introduce ThermoLion, a vision-centric framework that utilizes local Signal-to-Noise Ratio (SNR) gating to autonomously transition parameters between a "low-bit" exploration phase and a "high-precision" exploitation phase. Furthermore, we introduce a Momentum Alignment mechanism that detects constructive interference between historical drift and instantaneous gradients to accelerate convergence during stable trajectories. Empirical benchmarks across 12 diverse vision datasets (including CIFAR, SVHN, and GTSRB) demonstrate that ThermoLion surpasses state-of-the-art optimizers, such as AdamW and Lion, in convergence speed and terminal accuracy.
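To make the gating idea above concrete, here is a minimal NumPy sketch of a per-parameter, SNR-gated update with a momentum-alignment boost. The abstract does not give the update equations, so everything here is an assumption made for illustration and not the paper's actual rule: the function name `snr_gated_step`, the SNR estimate (absolute mean over standard deviation, taken from running moments of the gradient), the hard threshold `snr_threshold`, and the multiplicative `align_boost`.

```python
import numpy as np

def snr_gated_step(param, grad, m, v, lr=1e-3, beta=0.9,
                   snr_threshold=1.0, align_boost=0.5, eps=1e-8):
    """One illustrative update step; every hyperparameter here is hypothetical."""
    # Momentum alignment (assumed form): the previous momentum and the
    # instantaneous gradient agree in sign for this parameter.
    aligned = (m * grad) > 0

    # Running first and second moments of the gradient.
    m = beta * m + (1 - beta) * grad
    v = beta * v + (1 - beta) * grad ** 2

    # Local signal-to-noise ratio: |mean| / std, clamped for numerical safety.
    var = np.maximum(v - m ** 2, 0.0)
    snr = np.abs(m) / (np.sqrt(var) + eps)
    high_snr = snr > snr_threshold

    # Low SNR: 1-bit sign update (Lion-like).
    # High SNR: magnitude-aware normalized update (Adam-like).
    sign_update = np.sign(m)
    magnitude_update = m / (np.sqrt(v) + eps)
    update = np.where(high_snr, magnitude_update, sign_update)

    # Take a larger step where momentum and gradient are aligned.
    update = update * np.where(aligned, 1.0 + align_boost, 1.0)

    return param - lr * update, m, v, high_snr

# Toy demo: parameter 0 sees a consistent gradient, parameter 1 an alternating one.
param, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(50):
    grad = np.array([1.0, 1.0 if t % 2 == 0 else -1.0])
    param, m, v, high_snr = snr_gated_step(param, grad, m, v)
```

In this toy run the consistently signed gradient ends up in the high-SNR, magnitude-based branch, while the alternating gradient stays in the sign-based branch. A smooth gate (for example a sigmoid of the SNR) would be an equally plausible reading of "dynamic modulation of the update bitrate"; the hard threshold is used here only to keep the sketch short.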
