Muon with Spectral Guidance: Efficient Optimization for Scientific Machine Learning

Binghang Lu, Jiahao Zhang, Guang Lin

TL;DR

SpecMuon is a spectral-aware optimizer that integrates Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism, adaptively regulating step sizes according to the global loss energy while preserving Muon's scale-balancing properties.

Abstract

Physics-informed neural networks and neural operators often suffer from severe optimization difficulties caused by ill-conditioned gradients, multi-scale spectral behavior, and stiffness induced by physical constraints. Recently, the Muon optimizer has shown promise by performing orthogonalized updates in the singular-vector basis of the gradient, thereby improving geometric conditioning. However, its unit-singular-value updates may lead to overly aggressive steps and lack explicit stability guarantees when applied to physics-informed learning. In this work, we propose SpecMuon, a spectral-aware optimizer that integrates Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism. By decomposing matrix-valued gradients into singular modes and applying RSAV updates individually along dominant spectral directions, SpecMuon adaptively regulates step sizes according to the global loss energy while preserving Muon's scale-balancing properties. This formulation interprets optimization as a multi-mode gradient flow and enables principled control of stiff spectral components. We establish rigorous theoretical properties of SpecMuon, including a modified energy dissipation law, positivity and boundedness of auxiliary variables, and global convergence with a linear rate under the Polyak-Lojasiewicz condition. Numerical experiments on physics-informed neural networks, DeepONets, and fractional PINN-DeepONets demonstrate that SpecMuon achieves faster convergence and improved stability compared with Adam, AdamW, and the original Muon optimizer on benchmark problems such as the one-dimensional Burgers equation and fractional partial differential equations.
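The "unit-singular-value updates" and "singular modes" mentioned in the abstract can be made concrete with a small NumPy sketch. It is an illustration only, not the authors' implementation: the function names and the per-mode `scales` vector are hypothetical, and an explicit SVD is used for clarity, whereas practical Muon implementations typically approximate the orthogonalization with a Newton-Schulz iteration. How SpecMuon actually chooses the per-mode factors (via RSAV) is not reproduced here.

```python
import numpy as np

def orthogonalized_update(grad):
    """Muon-style direction: keep the singular vectors of grad, set every singular value to 1."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    return U @ Vt, S

def modewise_update(grad, scales):
    """Hypothetical mode-wise direction: sum_i scales[i] * u_i v_i^T."""
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return (U * scales) @ Vt  # broadcasting scales column i of U by scales[i]

rng = np.random.default_rng(0)
# An ill-conditioned "gradient": column scales span three orders of magnitude.
G = rng.standard_normal((6, 4)) @ np.diag([100.0, 10.0, 1.0, 0.1])

O, sv_grad = orthogonalized_update(G)
sv_update = np.linalg.svd(O, compute_uv=False)

# Illustrative per-mode factors (assumed values, not taken from the paper).
scales = np.array([0.5, 0.5, 1.0, 1.0])
M = modewise_update(G, scales)
sv_mode = np.linalg.svd(M, compute_uv=False)

print("gradient singular values:", sv_grad)
print("orthogonalized update singular values:", sv_update)
print("mode-wise update singular values:", sv_mode)
```

The orthogonalized direction has the same singular vectors as the gradient but a flat spectrum, which is the scale-balancing effect; the mode-wise variant keeps those vectors while letting each direction carry its own step factor.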

Paper Structure

This paper contains 14 sections, 6 theorems, 72 equations, 6 figures, 6 tables, 1 algorithm.

Key Result

Theorem 2.1

Let $(\Theta(t), r(t))$ satisfy eq:cts-sav. Then the modified energy $r(t)^2$ dissipates, that is, it is nonincreasing in $t$.
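The system behind eq:cts-sav is not shown in this summary, so the following is only a sketch of a generic scalar auxiliary variable (SAV) gradient-descent step, assumed here for illustration and not the paper's RSAV scheme. It introduces $r \approx \sqrt{L(\Theta) + C}$ with an assumed constant `C`, and updates $r$ implicitly so that it can only shrink. The function name, step size, and quadratic test loss are all hypothetical.

```python
import numpy as np

def sav_gradient_descent(theta, loss, grad, dt=0.5, C=1.0, steps=50):
    """Plain SAV descent (assumed textbook form, not the paper's algorithm).

    r_{k+1}     = r_k / (1 + dt * ||g||^2 / (2 * (L + C)))
    theta_{k+1} = theta_k - dt * r_{k+1} / sqrt(L + C) * g
    """
    r = np.sqrt(loss(theta) + C)
    history = [r]
    for _ in range(steps):
        g = grad(theta)
        energy = loss(theta) + C  # shifted loss; positive when loss >= 0 and C > 0
        r = r / (1.0 + dt * np.dot(g, g) / (2.0 * energy))
        theta = theta - dt * (r / np.sqrt(energy)) * g
        history.append(r)
    return theta, np.array(history)

# Stiff quadratic: curvature 100 in one direction, 1 in the other.
A = np.diag([100.0, 1.0])
loss = lambda th: 0.5 * th @ A @ th
grad = lambda th: A @ th

theta0 = np.array([1.0, 1.0])
theta_final, r_hist = sav_gradient_descent(theta0, loss, grad)
print("r^2 at the first five steps:", r_hist[:5] ** 2)
```

Because each step divides $r$ by a factor of at least 1, the sequence $r_k^2$ is nonincreasing and stays positive for any step size, even though `dt = 0.5` is far above the stability limit of plain gradient descent on this quadratic. This guarantee concerns the modified energy $r^2$ only, not the original loss.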

Figures (6)

  • Figure 1: Training loss (log scale) for the linear regression problem. SpecMuon (red) is compared with Muon, Adam, and AdamW. SpecMuon achieves faster convergence by combining spectral orthogonalization with mode-wise energy regulation.
  • Figure 2: PINN training on the one-dimensional Burgers’ equation.
  • Figure 3: Spatiotemporal solution $u(x, t)$ of the Burgers’ equation obtained by PINNs trained with Adam (left), Muon (middle), and SpecMuon (right). SpecMuon produces a more accurate and smoother approximation of the reference solution.
  • Figure 4: Spatial distribution of the $L_1$ error for the Burgers’ PINN trained with Adam (left), Muon (middle), and SpecMuon (right). SpecMuon exhibits reduced error magnitude.
  • Figure 5: DeepONet training on the Burgers’ equation.
  • ...and 1 more figure

Theorems & Definitions (15)

  • Theorem 2.1: Modified energy law
  • proof
  • Theorem 2.2: Discrete energy dissipation for RSAV predictor and RSAV
  • proof
  • Theorem 3.1: Modified-energy dissipation (mode-wise and global)
  • proof
  • Theorem 3.2: Positivity and uniform lower bound for $r_i^{\,k}$
  • proof
  • Remark 3.3
  • Theorem 3.4: One-mode dissipation of the original energy for a sufficiently small step
  • ...and 5 more