
Unifying Sign and Magnitude for Optimizing Deep Vision Networks via ThermoLion

Ahmed Nebli

TL;DR

The paper tackles the challenge of optimizing deep vision models under stochastic gradient noise by bridging magnitude- and sign-based methods. It introduces ThermoLion, a per-parameter, SNR-gated optimizer that switches between gas-like sign-based updates and solid-like magnitude-based updates, augmented by Momentum Alignment to accelerate confident descent. Across 12 vision datasets, ThermoLion delivers faster convergence and higher final accuracy than state-of-the-art baselines such as AdamW and Lion, with only minor additional computational cost. The work frames optimization as a thermodynamic process governed by local gradient reliability (SNR), offering a unifying axis that reframes Adam, Lion, and hybrids as different points along a continuum of update bitrate controlled by geometry.

Abstract

The training of deep vision models is fundamentally a signal recovery problem amidst high-dimensional stochastic noise. Current optimization paradigms impose a static compromise on information channel capacity. For instance, magnitude-based methods, such as AdamW, operate on the assumption that gradient norms are high-fidelity curvature signals. While this allows for precision in smooth regimes, it leads to catastrophic noise amplification when applied to rugged, non-convex landscapes. Conversely, sign-based methods (e.g., Lion) perform a radical 1-bit quantization of the gradient, which aims to provide robust regularization at the cost of discarding fine-grained descent information. We propose that optimal convergence requires neither static prior, but rather a dynamic modulation of the update bitrate. We introduce ThermoLion, a vision-centric framework that utilizes local Signal-to-Noise Ratio (SNR) gating to autonomously transition parameters between a "low-bit" exploration phase and a "high-precision" exploitation phase. Furthermore, we introduce a Momentum Alignment mechanism that detects constructive interference between historical drift and instantaneous gradients to accelerate convergence during stable trajectories. Empirical benchmarks across 12 diverse vision datasets (including CIFAR, SVHN, and GTSRB) demonstrate that ThermoLion surpasses state-of-the-art optimizers, such as AdamW and Lion, in convergence speed and terminal accuracy.
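The abstract describes the mechanism only in words and does not state the update rule, so the following is a minimal illustrative sketch of the general idea rather than the paper's algorithm. It assumes a scalar parameter, a single decay rate `beta` for both moving averages, an SNR estimate of the form |mean| / std, a soft gate `snr / (snr + snr_ref)`, and a fixed multiplicative boost when momentum and gradient agree in sign. The function name `snr_gated_step` and the constants `snr_ref` and `align_boost` are hypothetical, chosen for illustration.

```python
import math


def snr_gated_step(p, g, m, v, lr=0.05, beta=0.9,
                   snr_ref=1.0, align_boost=1.5, eps=1e-8):
    """One hypothetical SNR-gated update for a single scalar parameter.

    p: parameter, g: current gradient, m: running mean of gradients,
    v: running mean of squared gradients. Returns (new_p, new_m, new_v).
    """
    # Exponential moving averages of the gradient and its square.
    m = beta * m + (1.0 - beta) * g
    v = beta * v + (1.0 - beta) * g * g

    # Crude local SNR estimate: |mean| / std, with variance = E[g^2] - E[g]^2.
    var = max(v - m * m, 0.0)
    snr = abs(m) / (math.sqrt(var) + eps)

    # Soft gate in [0, 1): near 0 for noisy gradients, near 1 for reliable ones.
    gate = snr / (snr + snr_ref)

    # Low-bit (sign) update and magnitude-aware (Adam-like) update.
    sign_update = math.copysign(1.0, m) if m != 0.0 else 0.0
    mag_update = m / (math.sqrt(v) + eps)

    # Blend the two regimes according to the gate.
    update = gate * mag_update + (1.0 - gate) * sign_update

    # Assumed alignment rule: boost the step when momentum and gradient agree.
    if m * g > 0.0:
        update *= align_boost

    return p - lr * update, m, v


# Usage: minimize f(x) = 0.5 * x**2, whose gradient is x.
x, m, v = 3.0, 0.0, 0.0
for _ in range(200):
    x, m, v = snr_gated_step(x, x, m, v)
print(abs(x))
```

In this sketch a low SNR pushes the gate toward the sign update and a high SNR pushes it toward the magnitude update, which mirrors the "low-bit" versus "high-precision" distinction drawn above. How the paper actually estimates SNR, shapes the gate, or applies Momentum Alignment may differ.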

Paper Structure

This paper contains 21 sections, 6 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Training Loss Convergence Dynamics. Evolution of the Cross-Entropy loss (log scale) over the 12-epoch budget across all benchmarks. The brown curve represents ThermoLion. In low-entropy regimes (e.g., MNIST, USPS), the optimizer matches the rapid descent of curvature-aware baselines. In high-entropy regimes (e.g., GTSRB, CIFAR-100, SVHN), ThermoLion maintains a steeper descent trajectory where purely magnitude-based methods (Adam) and purely sign-based methods (Lion) plateau. All runs use the common ConvNet architecture, batch size, and hyperparameter settings.