LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters

Vladimir Bogachev; Vladimir Aletov; Alexander Molozhavenko; Denis Bobkov; Vera Soboleva; Aibek Alanov; Maxim Rakhuba

TL;DR

This work tackles the reparameterization sensitivity of LoRA by formulating LoRA optimization on the fixed-rank manifold $\mathcal{M}_r$ and introducing a Muon-inspired optimizer on that manifold, named Riemannion. It combines Riemannian gradient-based optimization, a Locally Optimal Initialization (LOI) that places the initial point advantageously on $\mathcal{M}_r$, and an efficient, autodiff-friendly implementation (including OrthoLR/ProjectLR) that keeps overhead minimal. Key contributions include the generalization of Muon to $\mathcal{M}_r$, a principled LOI with a closed-form-like initialization (e.g., $\Delta W^{(0)}_* = \alpha U_{1,r} V_{r,2r}^\top$ under suitable choices), and a gradient trick that requires only a single backward pass, enabling scalable computation. Empirically, Riemannion delivers faster convergence and better final task performance than standard LoRA and recent geometry-aware methods on both large language models and diffusion-based generation tasks, with reduced variance and competitive overhead.
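
To make the reparameterization sensitivity concrete, here is a small NumPy sketch. It is our own illustration, not code from the paper. It assumes a LoRA update $\Delta W = BA$ and uses a random matrix `G` as a stand-in for $\partial L/\partial W$. The helper names `lora_factor_step` and `riemannian_gradient` are hypothetical. Rescaling $B \to cB$, $A \to A/c$ leaves $\Delta W$ unchanged, yet one Euclidean gradient step on the factors moves $\Delta W$ differently for the two factorizations. By contrast, the projection of `G` onto the tangent space of $\mathcal{M}_r$ depends only on $\Delta W$.

```python
import numpy as np

def lora_factor_step(B, A, G, lr):
    """One Euclidean gradient step on the LoRA factors for a loss L(B @ A), where G = dL/dW."""
    grad_B = G @ A.T  # chain rule: dL/dB = G A^T
    grad_A = B.T @ G  # chain rule: dL/dA = B^T G
    return B - lr * grad_B, A - lr * grad_A

def riemannian_gradient(W, G, r):
    """Project G onto the tangent space of the rank-r manifold at W.
    The result depends only on W, not on how W is split into factors."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    U, V = U[:, :r], Vt[:r].T
    UtG, GV = U.T @ G, G @ V  # only these thin products of G are needed
    return U @ UtG + GV @ V.T - U @ (UtG @ V) @ V.T

rng = np.random.default_rng(0)
m, n, r, lr = 8, 6, 2, 1e-2
B, A = rng.standard_normal((m, r)), rng.standard_normal((r, n))
G = rng.standard_normal((m, n))  # stand-in for dL/dW at W = B @ A
c = 10.0
B2, A2 = c * B, A / c            # same adapter B @ A, different factors

# Euclidean steps on the two factorizations move the adapter differently.
Bn, An = lora_factor_step(B, A, G, lr)
B2n, A2n = lora_factor_step(B2, A2, G, lr)
dW_1 = Bn @ An - B @ A
dW_2 = B2n @ A2n - B2 @ A2
print("Euclidean update gap:", np.linalg.norm(dW_1 - dW_2))

# The Riemannian gradient sees only the matrix W, so both factorizations agree.
rg_1 = riemannian_gradient(B @ A, G, r)
rg_2 = riemannian_gradient(B2 @ A2, G, r)
print("Riemannian gradient gap:", np.linalg.norm(rg_1 - rg_2))
```

The projection uses `G` only through the thin products $U^\top G$ and $GV$. This is why tangent-space quantities on $\mathcal{M}_r$ can, in principle, be obtained without materializing a dense $m \times n$ matrix. This toy sketch still forms `G` explicitly for clarity.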

Abstract

This work presents a novel, fully Riemannian framework for Low-Rank Adaptation (LoRA) that treats low-rank adapters geometrically by optimizing them directly on the fixed-rank manifold. This formulation eliminates the parametrization ambiguity present in standard Euclidean optimizers. Our framework integrates three key components: (1) we derive Riemannion, a new Riemannian optimizer on the fixed-rank matrix manifold that generalizes the recently proposed Muon optimizer; (2) we develop a Riemannian gradient-informed LoRA initialization; and (3) we provide an efficient implementation without noticeable overhead that uses automatic differentiation to compute the arising geometric operations while adhering to best practices in numerical linear algebra. Comprehensive experimental results on both LLM and diffusion model architectures demonstrate that our approach yields consistent and noticeable improvements in convergence speed and final task performance over both standard LoRA and its state-of-the-art modifications.
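
The generic geometric ingredients named above can be combined into a naive dense sketch of one Muon-style step on the fixed-rank manifold. The step proceeds in four stages:

- project the Euclidean gradient onto the tangent space;
- replace it by its orthogonal polar factor (the map Muon approximates with Newton-Schulz iterations; computed exactly by SVD here);
- project back onto the tangent space;
- retract to rank $r$ with a truncated SVD.

This composition is our hypothetical illustration under those assumptions. It is not the paper's Riemannion update, and it is not its efficient autodiff-based implementation. The function `riemannian_muon_step` and the quadratic toy loss are assumptions introduced only for this example.

```python
import numpy as np

def polar_factor(X, tol=1e-10):
    """Orthogonal polar factor U_X V_X^T of X over its nonzero singular values.
    Muon approximates this map with Newton-Schulz iterations; here we use an exact SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))  # numerical rank of X
    return U[:, :k] @ Vt[:k]

def project_tangent(U, V, Z):
    """Orthogonal projection of Z onto the tangent space of M_r at U diag(s) V^T."""
    UtZ, ZV = U.T @ Z, Z @ V
    return U @ UtZ + ZV @ V.T - U @ (UtZ @ V) @ V.T

def retract(Y, r):
    """Truncated-SVD retraction: the best rank-r approximation of Y, in factored form."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r].T

def riemannian_muon_step(U, s, V, G, lr):
    """Hypothetical Muon-style step on M_r (illustration only, not the paper's Riemannion)."""
    W = U @ np.diag(s) @ V.T
    xi = project_tangent(U, V, G)                # Riemannian gradient
    d = project_tangent(U, V, polar_factor(xi))  # orthogonalized, back in the tangent space
    return retract(W - lr * d, len(s))

# Usage on a toy quadratic loss L(W) = 0.5 * ||W - T||_F^2, whose gradient is W - T.
rng = np.random.default_rng(1)
m, n, r = 10, 7, 3
U0, _ = np.linalg.qr(rng.standard_normal((m, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n, r)))
s0 = np.array([3.0, 2.0, 1.0])
T = rng.standard_normal((m, n))
loss = lambda W: 0.5 * np.linalg.norm(W - T) ** 2
W0 = U0 @ np.diag(s0) @ V0.T
U1, s1, V1 = riemannian_muon_step(U0, s0, V0, W0 - T, lr=1e-3)
W1 = U1 @ np.diag(s1) @ V1.T
```

Because the projection is self-adjoint, the inner product of the gradient with the step direction equals the nuclear norm of the Riemannian gradient. The direction is therefore always a descent direction, and a small enough step decreases the loss while the iterate stays exactly rank $r$.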
