AlphaGrad: Non-Linear Gradient Normalization Optimizer

Soham Sane

TL;DR

AlphaGrad is a memory-efficient optimizer that applies layer-wise L2 gradient normalization, $\tilde{g}_t^L = g_t^L / (\|g_t^L\|_2 + \epsilon)$, followed by a smooth non-linear transformation, $g_t^{\prime L} = \tanh(\alpha^L \cdot \tilde{g}_t^L)$, yielding bounded updates and reduced per-parameter state. The authors provide both convex and non-convex convergence analyses, showing dependence on the problem dimension $n$ and the alignment factor $\gamma_{\min}$ through $\gamma_t = \tanh(\alpha) \frac{\|g_t\|_2}{\|g_t\|_2 + \epsilon}$, with rates of $O\left(\frac{\sqrt{n}}{\gamma_{\min}\sqrt{T}}\right)$ in the convex case and $O\left( \sqrt{\frac{L n (f(x_1) - f_{\inf})}{\tanh^2(\alpha) T}} \right)$ in the non-convex setting. Empirically, AlphaGrad's performance is highly context-dependent across RL tasks: it is unstable in off-policy DQN, improves stability in TD3 when $\alpha$ is tuned, and substantially outperforms Adam in on-policy PPO, underscoring the critical role of empirical $\alpha$ selection. The work positions AlphaGrad as an alternative for memory-constrained training, particularly in on-policy regimes, and identifies further validation, adaptive scheduling, and broader benchmarking as open directions.

Abstract

We introduce AlphaGrad, a memory-efficient, conditionally stateless optimizer addressing the memory overhead and hyperparameter complexity of adaptive methods like Adam. AlphaGrad enforces scale invariance via tensor-wise L2 gradient normalization followed by a smooth hyperbolic tangent transformation, $g' = \tanh(\alpha \cdot \tilde{g})$, controlled by a single steepness parameter $\alpha$. Our contributions include: (1) the AlphaGrad algorithm formulation; (2) a formal non-convex convergence analysis guaranteeing stationarity; (3) extensive empirical evaluation on diverse RL benchmarks (DQN, TD3, PPO). Compared to Adam, AlphaGrad demonstrates a highly context-dependent performance profile. While exhibiting instability in off-policy DQN, it provides enhanced training stability with competitive results in TD3 (requiring careful $\alpha$ tuning) and achieves substantially superior performance in on-policy PPO. These results underscore the critical importance of empirical $\alpha$ selection, revealing strong interactions between the optimizer's dynamics and the underlying RL algorithm. AlphaGrad presents a compelling alternative optimizer for memory-constrained scenarios and shows significant promise for on-policy learning regimes where its stability and efficiency advantages can be particularly impactful.
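
The update described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a plain stateless step $x_{t+1} = x_t - \eta\, g'_t$ with a learning rate `lr`; the function names, the `eps` default, and the absence of any momentum term are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def alphagrad_direction(grad, alpha, eps=1e-8):
    """Tensor-wise direction tanh(alpha * g / (||g||_2 + eps)).

    Hypothetical helper: the name and the eps default are illustrative.
    """
    # With no ord/axis given, np.linalg.norm returns the L2 norm of the
    # flattened tensor, which is the tensor-wise norm used here.
    norm = np.linalg.norm(grad)
    normalized = grad / (norm + eps)
    return np.tanh(alpha * normalized)

def alphagrad_step(params, grads, lr, alpha, eps=1e-8):
    """One stateless step over a list of parameter tensors (assumed form)."""
    return [p - lr * alphagrad_direction(g, alpha, eps)
            for p, g in zip(params, grads)]
```

Because each tensor is divided by its own norm before the $\tanh$, rescaling a gradient by a positive constant leaves the direction essentially unchanged (up to the effect of `eps`), and every coordinate of the result has magnitude at most 1.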

Paper Structure

This paper contains 38 sections, 4 theorems, 27 equations, 3 figures, 1 table.

Key Result

Lemma 1

The squared L2 norm of the AlphaGrad update direction $g'_t$ is bounded by the dimension $n$: $\|g'_t\|_2^2 \le n$. Consequently, $\|g'_t\|_2 \le \sqrt{n}$.
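
The bound follows coordinate-wise from $|\tanh(z)| < 1$. A short derivation, writing $\tilde{g}_{t,i}$ for the $i$-th coordinate of the normalized gradient (index notation assumed here for illustration):

$$\|g'_t\|_2^2 = \sum_{i=1}^{n} \tanh^2\!\left(\alpha\,\tilde{g}_{t,i}\right) \le \sum_{i=1}^{n} 1 = n \quad\Longrightarrow\quad \|g'_t\|_2 \le \sqrt{n}.$$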

Figures (3)

  • Figure 1: Performance on CartPole-v1 (DQN): Episodic Return, Length, TD Loss, and Q-Values. AlphaGrad variants ($\alpha=93, 186, 279$) vs. Adam.
  • Figure 2: Performance on Hopper-v4 (TD3): Training Rewards, Episode Length, Q-Values, and Actor Loss. AlphaGrad variants (tuned $\alpha$) vs. Adam.
  • Figure 3: Performance on HalfCheetah-v5 (PPO): Training Rewards, Explained Variance, Entropy Loss, and Approx KL Divergence. AlphaGrad variants ($\alpha=98, 196, 294$) vs. Adam.

Theorems & Definitions (8)

  • Lemma 1: Update Norm Bound
  • proof
  • Lemma 2: Inner Product Alignment with $\epsilon > 0$
  • proof
  • Theorem 1: Average Iterate Convergence - Convex
  • proof
  • Theorem 2: Convergence to Stationarity
  • proof
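
For intuition about Lemma 2, the following minimal sketch numerically checks one plausible form of the alignment inequality, $\langle g_t, g'_t \rangle \ge \gamma_t \|g_t\|_2$ with $\gamma_t = \tanh(\alpha)\,\|g_t\|_2 / (\|g_t\|_2 + \epsilon)$. The exact statement of the lemma is an assumption here, since this page lists only its title, and `alignment_gap` is a hypothetical helper name.

```python
import numpy as np

def alignment_gap(grad, alpha, eps=1e-8):
    """Return <g, g'> - gamma_t * ||g||_2 for a single gradient vector.

    Assumed inequality form: a non-negative gap means the AlphaGrad
    direction g' stays aligned with the raw gradient g.
    """
    norm = np.linalg.norm(grad)
    direction = np.tanh(alpha * grad / (norm + eps))   # g'_t
    gamma = np.tanh(alpha) * norm / (norm + eps)       # gamma_t
    inner = float(np.sum(grad * direction))            # <g_t, g'_t>
    return inner - gamma * norm
```

The gap is non-negative because every normalized coordinate satisfies $|\tilde{g}_{t,i}| \le 1$, and on that range $\tanh(\alpha x)/x \ge \tanh(\alpha)$ for $\alpha > 0$.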