HyperAdapt: Simple High-Rank Adaptation
Abel Gurung, Joseph Campbell
TL;DR
HyperAdapt is a simple, parameter-efficient fine-tuning method that achieves high-rank updates by scaling the rows and columns of a pre-trained weight matrix with diagonal matrices. It requires only $n+m$ trainable parameters for an $n \times m$ matrix. A theoretical upper bound characterizes the rank of the update, and empirical analysis shows that the update is high-rank in practice. Results across GLUE, arithmetic and commonsense reasoning, and long-context reasoning show performance close to full fine-tuning with orders of magnitude fewer trainable parameters and no additional inference latency. The approach matches or outperforms strong PEFT baselines such as LoRA, DoRA, and VeRA across multiple model sizes while substantially reducing memory and compute requirements. This makes high-rank adaptation practical in compute- and memory-constrained settings and scalable to large foundation models, with potential extensions to broader architectures and domains.
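To make the high-rank claim concrete, the derivation below gives an elementary rank bound that follows directly from the scaling form. It assumes the adapted weight is $W' = \operatorname{diag}(a)\,W\,\operatorname{diag}(b)$ with $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, and that the update is $\Delta W = W' - W$. It is a sketch under that assumption and not necessarily the bound proved in the paper.

$$
(\Delta W)_{ij} = (a_i b_j - 1)\,W_{ij}
\quad\Longrightarrow\quad
\Delta W = \bigl(a b^\top - \mathbf{1}_n \mathbf{1}_m^\top\bigr) \circ W,
$$

$$
\operatorname{rank}(\Delta W) \le \operatorname{rank}\bigl(a b^\top - \mathbf{1}_n \mathbf{1}_m^\top\bigr)\cdot\operatorname{rank}(W) \le \min\bigl(n,\ m,\ 2\operatorname{rank}(W)\bigr),
$$

using $\operatorname{rank}(X \circ Y) \le \operatorname{rank}(X)\operatorname{rank}(Y)$ for the Hadamard product $\circ$ and the fact that the difference of two rank-one matrices has rank at most 2. When $W$ has rank at least $\min(n,m)/2$, the bound reduces to $\min(n, m)$, so the scaling form does not cap the update's rank, whereas a rank-$r$ LoRA update $BA$ always has rank at most $r$.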
Abstract
Foundation models excel across diverse tasks, but adapting them to specialized applications often requires fine-tuning, an approach that is memory- and compute-intensive. Parameter-efficient fine-tuning (PEFT) methods mitigate this by updating only a small subset of weights. In this paper, we introduce HyperAdapt, a parameter-efficient fine-tuning method that significantly reduces the number of trainable parameters compared to state-of-the-art methods such as LoRA. Specifically, HyperAdapt adapts a pre-trained weight matrix by applying row- and column-wise scaling through diagonal matrices, thereby inducing a high-rank update while requiring only $n+m$ trainable parameters for an $n \times m$ matrix. Theoretically, we establish an upper bound on the rank of HyperAdapt's updates, and empirically, we confirm that it consistently induces high-rank transformations across model layers. Experiments on GLUE, arithmetic reasoning, and commonsense reasoning benchmarks with models of up to 14B parameters demonstrate that HyperAdapt matches or nearly matches the performance of full fine-tuning and state-of-the-art PEFT methods while using orders of magnitude fewer trainable parameters.
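The row- and column-wise scaling described above can be illustrated with a minimal NumPy sketch. It assumes the adapted weight is $W' = \operatorname{diag}(a)\,W\,\operatorname{diag}(b)$ for an $n \times m$ weight $W$, with $a$ and $b$ as the only trainable vectors. The function names `hyperadapt_merge` and `hyperadapt_forward`, the identity initialization, and the perturbed scale values are hypothetical choices for illustration, not the authors' code. The sketch counts trainable parameters, measures the rank of the update, and checks that merging the scales into $W$ reproduces the unmerged forward pass.

```python
import numpy as np


def hyperadapt_merge(W: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return diag(a) @ W @ diag(b) without building diagonal matrices.

    W: frozen weight of shape (n, m); a: row scales of shape (n,); b: column scales of shape (m,).
    """
    # Broadcasting scales row i by a[i] and column j by b[j].
    return a[:, None] * W * b[None, :]


def hyperadapt_forward(W: np.ndarray, a: np.ndarray, b: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unmerged forward pass y = diag(a) W diag(b) x for a batch x of shape (batch, m)."""
    # Scale input features by b, apply the frozen W, then scale output features by a.
    return ((x * b) @ W.T) * a


rng = np.random.default_rng(0)
n, m = 64, 48
W = rng.standard_normal((n, m))  # stands in for a frozen pre-trained weight

# Assumed identity initialization (a = b = 1) leaves W unchanged at the start of training.
assert np.allclose(hyperadapt_merge(W, np.ones(n), np.ones(m)), W)

# Illustrative perturbed scales, mimicking what training might produce (not values from the paper).
a = 1.0 + 0.1 * rng.standard_normal(n)
b = 1.0 + 0.1 * rng.standard_normal(m)

W_adapted = hyperadapt_merge(W, a, b)
delta = W_adapted - W

print("trainable params:", a.size + b.size)  # n + m = 112
print("full fine-tuning params:", W.size)    # n * m = 3072
print("rank of update:", np.linalg.matrix_rank(delta))

# Merging the scales into W once yields the same outputs as the unmerged pass,
# so the adapted layer costs no more at inference than the original layer.
x = rng.standard_normal((5, m))
assert np.allclose(x @ W_adapted.T, hyperadapt_forward(W, a, b, x))
```

With a generic (for example, random Gaussian) $W$ and non-trivial scales, the printed rank is typically $\min(n, m)$. For comparison, a rank-1 LoRA update on the same matrix also uses $n + m$ parameters, but its update has rank at most 1.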
