RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models

Dayan Pan, Jingyuan Wang, Yilong Zhou, Jiawei Cheng, Pengyue Jia, Xiangyu Zhao

TL;DR

RoSA tackles the inefficiency of traditional PEFT by exploiting RoPE-induced low-frequency attention components and layer-wise heterogeneity. It introduces RoAE to selectively enhance RoPE low-frequency dimensions and DLS to dynamically update the most impactful layers, guided by LayerNorm gradient norms. Across fifteen benchmarks and multiple backbones, RoSA consistently outperforms mainstream PEFT methods under comparable trainable parameter budgets and scales effectively with model size. The framework is modular and broadly applicable to PEFT, offering a principled approach to frequency- and layer-aware fine-tuning with practical improvements in contextual understanding and efficiency.

Abstract

Fine-tuning large language models is essential for task-specific adaptation, yet it remains computationally prohibitive. Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a solution, but current approaches typically ignore the distinct roles of model components and the heterogeneous importance across layers, thereby limiting adaptation efficiency. Motivated by the observation that Rotary Position Embeddings (RoPE) induce critical activations in the low-frequency dimensions of attention states, we propose RoPE-aware Selective Adaptation (RoSA), a novel PEFT framework that allocates trainable parameters in a more targeted and effective manner. RoSA comprises a RoPE-aware Attention Enhancement (RoAE) module, which selectively enhances the low-frequency components of RoPE-influenced attention states, and a Dynamic Layer Selection (DLS) strategy that adaptively identifies and updates the most critical layers based on LayerNorm gradient norms. By combining dimension-wise enhancement with layer-wise adaptation, RoSA achieves more targeted and efficient fine-tuning. Extensive experiments on fifteen commonsense and arithmetic benchmarks demonstrate that RoSA outperforms existing mainstream PEFT methods under comparable trainable parameters. The code is available to ease reproducibility at https://github.com/Applied-Machine-Learning-Lab/RoSA.
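The layer-selection idea in the abstract can be illustrated with a minimal sketch. It assumes a per-layer LayerNorm gradient norm has already been measured, and it simply keeps the top fraction of layers by that score. The function name `select_layers`, the `k_ratio` argument, the ceiling rounding, and the example norms are illustrative assumptions, not taken from the RoSA code base; the authors' actual scoring and update schedule may differ.

```python
import math


def select_layers(layernorm_grad_norms, k_ratio):
    """Return sorted indices of the layers with the largest gradient norms.

    layernorm_grad_norms: one non-negative score per layer (hypothetical input).
    k_ratio: fraction of layers to keep trainable, in (0, 1].
    """
    if not 0.0 < k_ratio <= 1.0:
        raise ValueError("k_ratio must be in (0, 1]")
    num_layers = len(layernorm_grad_norms)
    # Assumption: round up so that at least one layer is always selected.
    k = max(1, math.ceil(k_ratio * num_layers))
    ranked = sorted(
        range(num_layers),
        key=lambda i: layernorm_grad_norms[i],
        reverse=True,
    )
    return sorted(ranked[:k])


# Made-up gradient norms for an 8-layer toy model.
grad_norms = [0.12, 0.80, 0.05, 0.47, 0.33, 0.91, 0.02, 0.26]
selected = select_layers(grad_norms, k_ratio=0.25)
print(selected)  # [1, 5]
```

In a dynamic scheme, the selection would be recomputed periodically during training, so the set of updated layers can change as the gradient norms shift.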

Paper Structure

This paper contains 29 sections, 6 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Q-state activation strength visualizations in LLaMA-2-7B. We compute the average L2 norm per attention head to quantify activation strength. Stronger activations are concentrated in high-indexed (i.e., low-RoPE-frequency) dimensions and vary across layers, highlighting both dimension-wise and layer-wise heterogeneity.
  • Figure 2: The architecture of RoSA. RoSA consists of two key modules: RoPE-aware Attention Enhancement (RoAE), which selectively enhances low-frequency components of RoPE-influenced Q/K states, and Dynamic Layer Selection (DLS), which dynamically selects important layers for update. Together, they enable targeted, efficient adaptation both frequency-wise and layer-wise.
  • Figure 3: Impact of layer selection ratio $k_{\text{ratio}}$.
  • Figure 4: Ablation results of RoSA on Commonsense task using Qwen2.5-7B.
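
The Figure 1 caption equates high-indexed dimensions with low RoPE frequencies. A short sketch shows why, using the standard RoPE rotation frequencies, base^(-2i/d) for pair index i. The head dimension of 128 and base of 10000 match LLaMA-2-7B; the helper names and the 25% fraction are illustrative assumptions, and how a pair index maps to actual tensor columns depends on the implementation's layout.

```python
def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE rotation frequency for each of the head_dim // 2 pairs."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def low_frequency_pairs(head_dim, fraction, base=10000.0):
    """Indices of the slowest-rotating pairs (hypothetical selection rule)."""
    freqs = rope_frequencies(head_dim, base)
    n = max(1, int(len(freqs) * fraction))
    # Sort pair indices from lowest to highest frequency, keep the first n.
    order = sorted(range(len(freqs)), key=lambda i: freqs[i])
    return sorted(order[:n])


freqs = rope_frequencies(128)
# Frequencies shrink as the pair index grows, so the last pairs rotate slowest.
print(freqs[0], freqs[-1])
print(low_frequency_pairs(128, fraction=0.25))
```

Because the frequency decreases monotonically with the pair index, the lowest-frequency pairs are always the highest-indexed ones, which here are pairs 48 through 63.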