PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He
TL;DR
PoLAR addresses the underutilization of the allocated subspace in low-rank adapters used to fine-tune large language models. It introduces a parameterization inspired by the polar decomposition that enforces orthogonality of the update directions: the update is written as $\Delta W = X \Theta Y^\top$, where $X$ and $Y$ lie on Stiefel manifolds and $\Theta$ is unconstrained, and it is optimized with a landing-field approach. PoLAR achieves exponentially faster convergence on a canonical low-rank problem and yields consistent gains across models from 350M to 27B parameters on tasks spanning commonsense reasoning, mathematical problem solving, and natural language understanding. Empirically, PoLAR increases the stable rank of the updates, mitigates the directional-diversity collapse observed in LoRA, and improves accuracy over LoRA and DoRA across benchmarks while offering practical runtime benefits on GPUs. Co-designing the architecture and the optimizer, together with infeasible manifold optimization, enables scalable, parameter-efficient fine-tuning that improves both performance and efficiency. Future work includes deeper spectral analysis of PoLAR dynamics and broader application beyond NLP.
Abstract
We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stiefel manifolds and an unconstrained scale matrix. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem. Pairing the parameterization with Riemannian optimization leads to consistent gains on three benchmarks testing general language understanding, commonsense reasoning, and mathematical problem solving, with base model sizes ranging from 350M to 27B parameters.
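To make the two central notions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard definition of stable rank, $\|A\|_F^2 / \|A\|_2^2$, and builds an update of the form $X \Theta Y^\top$ whose direction factors have orthonormal columns. The helper names (`stable_rank`, `random_stiefel`), the matrix sizes, and the "collapsed" example are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def stable_rank(a):
    """Stable rank ||A||_F^2 / ||A||_2^2; lies between 1 and rank(A)."""
    s = np.linalg.svd(a, compute_uv=False)  # singular values, descending
    return float(np.sum(s**2) / s[0] ** 2)


def random_stiefel(rows, r, rng):
    """Random rows x r matrix with orthonormal columns (a Stiefel point)."""
    q, _ = np.linalg.qr(rng.standard_normal((rows, r)))
    return q


rng = np.random.default_rng(0)
m, n, r = 64, 48, 8  # illustrative sizes (assumption, not from the paper)

# Direction factors on Stiefel manifolds and an unconstrained scale matrix.
X = random_stiefel(m, r, rng)
Y = random_stiefel(n, r, rng)
Theta = rng.standard_normal((r, r))
delta_w = X @ Theta @ Y.T

# Orthonormal X and Y leave singular values unchanged, so the spectrum
# (and hence the stable rank) of delta_w is that of the r x r matrix Theta.
print("stable rank of X Theta Y^T:", stable_rank(delta_w))
print("stable rank of Theta:      ", stable_rank(Theta))

# A rank-r update dominated by a single direction: the algebraic rank is
# still r, but the stable rank is close to 1.
collapsed = X @ np.diag([1.0] + [0.01] * (r - 1)) @ Y.T
print("algebraic rank (collapsed):", np.linalg.matrix_rank(collapsed))
print("stable rank (collapsed):   ", stable_rank(collapsed))
```

The collapsed example illustrates the gap the abstract describes: an update can have full algebraic rank $r$ while nearly all of its energy sits in one direction, which the stable rank exposes and the algebraic rank does not.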
