Hybrid and Unitary PEFT for Resource-Efficient Large Language Models
Haomin Qi, Zihan Dai, Chengbo Huang
TL;DR
The paper tackles the cost of fine-tuning large language models by evaluating PEFT techniques and introducing a hybrid per-layer strategy that dynamically fuses gradient-aligned LoRA-GA updates with orthogonal BOFT updates. It further adapts unitary RNN (uRNN) concepts to Transformers to improve gradient stability during fine-tuning. On GLUE, GSM8K, MT-Bench, and HumanEval, with models from 7B to 405B parameters, the hybrid approach achieves near-full fine-tuning performance with roughly half the peak memory usage and about a 2.1x reduction in training time, with additional gains in multilingual and low-resource settings. The work demonstrates a practical, scalable path for resource-constrained LLM fine-tuning and provides a foundation for further integration with quantization and adaptive control strategies.
Abstract
Fine-tuning large language models (LLMs) remains a computational bottleneck because of their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT's orthogonal stability with LoRA-GA's gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval, using models ranging from 7B to 405B parameters, the hybrid approach yields consistent gains over three independent runs per task and model. It approaches the quality of full fine-tuning while reducing training time by a factor of approximately 2.1 and peak memory usage by nearly 50 percent. A compact multilingual and low-resource study on XNLI and FLORES, using 32 examples per language, further demonstrates consistent gains under the same budget with a small and stable footprint. These results indicate a practical and scalable path toward accessible LLM fine-tuning under resource constraints.
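
The per-layer, gradient-norm-guided fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the normalization below (each branch weighted by its share of the total gradient norm) is an assumption, and the names fuse_layer_update, delta_lora_ga, delta_boft, grad_lora_ga, and grad_boft are hypothetical. Updates and gradients are represented as flat lists of floats for a single layer.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def fuse_layer_update(delta_lora_ga, delta_boft, grad_lora_ga, grad_boft, eps=1e-12):
    """Blend two candidate updates for one layer using gradient-norm weights.

    Assumed rule (not taken from the paper): each branch is weighted by its
    share of the summed gradient norms, so the weights are non-negative and
    sum to 1. If both gradient norms are zero, the BOFT branch gets weight 1.
    """
    n_ga = l2_norm(grad_lora_ga)
    n_boft = l2_norm(grad_boft)
    w_ga = n_ga / (n_ga + n_boft + eps)  # eps avoids division by zero
    w_boft = 1.0 - w_ga
    fused = [w_ga * a + w_boft * b for a, b in zip(delta_lora_ga, delta_boft)]
    return fused, w_ga, w_boft


# Toy example: both branches have gradient norm 5, so the weights are
# roughly equal and the fused update lies midway between the candidates.
fused, w_ga, w_boft = fuse_layer_update(
    delta_lora_ga=[1.0, 1.0],
    delta_boft=[3.0, 3.0],
    grad_lora_ga=[3.0, 4.0],
    grad_boft=[0.0, 5.0],
)
```

In a real training loop this blend would be computed independently for every adapted layer at each step, so layers whose LoRA-GA gradients dominate lean toward the gradient-aligned update while the others lean toward the orthogonal one. That is one plausible reading of "dynamic" fusion, not a confirmed description of the authors' implementation.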
