Muon with Spectral Guidance: Efficient Optimization for Scientific Machine Learning

Binghang Lu; Jiahao Zhang; Guang Lin

TL;DR

SpecMuon is a spectral-aware optimizer that combines Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism. It adaptively regulates step sizes according to the global loss energy while preserving Muon's scale-balancing properties.

Abstract

Physics-informed neural networks and neural operators often suffer from severe optimization difficulties caused by ill-conditioned gradients, multi-scale spectral behavior, and stiffness induced by physical constraints. Recently, the Muon optimizer has shown promise by performing orthogonalized updates in the singular-vector basis of the gradient, thereby improving geometric conditioning. However, its unit-singular-value updates may lead to overly aggressive steps and lack explicit stability guarantees when applied to physics-informed learning. In this work, we propose SpecMuon, a spectral-aware optimizer that integrates Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism. By decomposing matrix-valued gradients into singular modes and applying RSAV updates individually along dominant spectral directions, SpecMuon adaptively regulates step sizes according to the global loss energy while preserving Muon's scale-balancing properties. This formulation interprets optimization as a multi-mode gradient flow and enables principled control of stiff spectral components. We establish rigorous theoretical properties of SpecMuon, including a modified energy dissipation law, positivity and boundedness of auxiliary variables, and global convergence with a linear rate under the Polyak-Lojasiewicz condition. Numerical experiments on physics-informed neural networks, DeepONets, and fractional PINN-DeepONets demonstrate that SpecMuon achieves faster convergence and improved stability compared with Adam, AdamW, and the original Muon optimizer on benchmark problems such as the one-dimensional Burgers equation and fractional partial differential equations.
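
To make the abstract's "singular modes" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes a Muon-style orthogonalized direction by keeping the gradient's singular vectors and setting every singular value to one, and then a mode-wise variant in which each rank-one mode is weighted by its own scalar. The function names and the `scales` argument are hypothetical; `scales` only stands in for whatever per-mode factors an RSAV-type rule would produce, and the paper's actual formula is not reproduced here. Practical Muon implementations typically approximate the orthogonalization with a Newton-Schulz iteration; an exact SVD is used below only for clarity.

```python
import numpy as np


def muon_direction(grad):
    """Orthogonalized direction: same singular vectors, unit singular values."""
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return U @ Vt


def modewise_direction(grad, scales):
    """Weighted sum of rank-one modes u_i v_i^T.

    `scales` is a hypothetical placeholder for per-mode step factors
    (for example, from an RSAV-type rule); it is an assumption of this sketch.
    """
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    scales = np.asarray(scales, dtype=float)
    # Multiplying column i of U by scales[i] weights mode i.
    return (U * scales) @ Vt


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))  # stand-in for a matrix-valued gradient

D = muon_direction(G)
singular_values = np.linalg.svd(D, compute_uv=False)

# With all per-mode factors equal to one, the mode-wise direction
# reduces to the plain orthogonalized direction.
same_as_muon = np.allclose(modewise_direction(G, np.ones(3)), D)
```

Damping a single entry of `scales` shrinks the step only along that spectral direction, which is the kind of per-mode control the abstract describes for stiff components.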

Paper Structure (14 sections, 6 theorems, 72 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Background and Related Work
Muon
Scalar auxiliary variable (SAV) and the relaxed variant (RSAV)
Methodology
SpecMuon Algorithm
Computational efficiency
Analysis of the Algorithm
Numerical Results
Toy example: linear regression
Physics-Informed Neural Network
DeepONet
fPINN-DeepONet
Conclusion and Future Work

Key Result

Theorem 2.1

Let $(\Theta(t), r(t))$ satisfy the continuous-time SAV system (eq:cts-sav). Then the modified energy $r(t)^2$ dissipates, that is, $\frac{d}{dt} r(t)^2 \le 0$.
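
As a hedged illustration of why such a law holds, consider the standard SAV reformulation of gradient flow. The system referenced as eq:cts-sav is not shown on this page, so the form below is an assumption and may differ from the paper's. Take a loss $\mathcal{L}(\Theta)$ and a constant $C$ with $\mathcal{L}(\Theta) + C > 0$, and let $r$ evolve alongside $\Theta$ with $r(0) = \sqrt{\mathcal{L}(\Theta(0)) + C}$:

$$
\dot{\Theta} = -\frac{r}{\sqrt{\mathcal{L}(\Theta) + C}}\,\nabla \mathcal{L}(\Theta),
\qquad
\dot{r} = \frac{1}{2\sqrt{\mathcal{L}(\Theta) + C}}\,\big\langle \nabla \mathcal{L}(\Theta),\, \dot{\Theta} \big\rangle .
$$

Multiplying the second equation by $2r$ and substituting the first gives

$$
\frac{d}{dt}\, r^2 = 2 r \dot{r}
= \frac{r}{\sqrt{\mathcal{L}(\Theta) + C}}\,\big\langle \nabla \mathcal{L}(\Theta),\, \dot{\Theta} \big\rangle
= -\,\|\dot{\Theta}\|^2 \le 0 .
$$

Under this assumed form, $r(t)^2$ is non-increasing regardless of how ill-conditioned $\nabla \mathcal{L}$ is.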

Figures (6)

Figure 1: Training loss (log scale) for the linear regression problem. SpecMuon (red) is compared with Muon, Adam, and AdamW. SpecMuon achieves faster convergence by combining spectral orthogonalization with mode-wise energy regulation.
Figure 2: PINN training on the one-dimensional Burgers’ equation.
Figure 3: Spatiotemporal solution $u(x, t)$ of the Burgers’ equation obtained by PINNs trained with Adam (left), Muon (middle), and SpecMuon (right). SpecMuon produces a more accurate and smoother approximation of the reference solution.
Figure 4: Spatial distribution of the $L_1$ error for the Burgers’ PINN trained with Adam (left), Muon (middle), and SpecMuon (right). SpecMuon exhibits reduced error magnitude.
Figure 5: DeepONet training on the Burgers’ equation.
...and 1 more figure

Theorems & Definitions (15)

Theorem 2.1: Modified energy law
proof
Theorem 2.2: Discrete energy dissipation for RSAV predictor and RSAV
proof
Theorem 3.1: Modified-energy dissipation (mode-wise and global)
proof
Theorem 3.2: Positivity and uniform lower bound for $r_i^{\,k}$
proof
Remark 3.3
Theorem 3.4: One-mode dissipation of the original energy for a sufficiently small step
...and 5 more
