GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning
Jie Ou, Shuaihong Jiang, Yingjun Du, Cees G. M. Snoek
TL;DR
GateRA introduces token-aware modulation to parameter-efficient fine-tuning by gating per-token low-rank updates, enabling selective, input-dependent adaptation. A lightweight gating module predicts g(x) to scale the HiRA-style updates, while an entropy-based regularizer encourages near-binary gating for interpretability and sparsity. Theoretical analysis reveals a soft gradient masking effect, preserving pre-trained knowledge on easy tokens and focusing updates on harder ones. Empirical results across commonsense reasoning, open-domain dialogue, and mathematical reasoning show GateRA consistently matches or surpasses prior PEFT methods with minimal inference overhead, highlighting improved efficiency and interpretability in autoregressive generation. These findings suggest token-level modulation can significantly enhance the plasticity-stability balance in large-model fine-tuning while offering clearer insights into when updates occur.
Abstract
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, DoRA, and HiRA, enable lightweight adaptation of large pre-trained models via low-rank updates. However, existing PEFT approaches apply static, input-agnostic updates to all tokens, disregarding the varying importance and difficulty of different inputs. This uniform treatment can lead to overfitting on trivial content or under-adaptation on more informative regions, especially in autoregressive settings with distinct prefill and decoding dynamics. In this paper, we propose GateRA, a unified framework that introduces token-aware modulation to dynamically adjust the strength of PEFT updates. By incorporating adaptive gating into standard PEFT branches, GateRA enables selective, token-level adaptation, preserving pre-trained knowledge for well-modeled inputs while focusing capacity on challenging cases. Empirical visualizations reveal phase-sensitive behaviors, where GateRA automatically suppresses updates for redundant prefill tokens while emphasizing adaptation during decoding. To promote confident and efficient modulation, we further introduce an entropy-based regularization that encourages near-binary gating decisions. This regularization prevents diffuse update patterns and leads to interpretable, sparse adaptation without hard thresholding. Finally, we present a theoretical analysis showing that GateRA induces a soft gradient-masking effect over the PEFT path, enabling continuous and differentiable control over adaptation. Experiments on multiple commonsense reasoning benchmarks demonstrate that GateRA consistently outperforms or matches prior PEFT methods.
