Joint Localization and Activation Editing for Low-Resource Fine-Tuning

Wen Lai, Alexander Fraser, Ivan Titov

TL;DR

This work tackles the challenge of adapting large language models under low-resource data regimes. It introduces JoLA, a joint localization and activation editing method that dynamically identifies which attention heads to edit and whether to apply additive, multiplicative, or hybrid interventions, using HardConcrete gates to induce sparsity. Through extensive experiments on commonsense reasoning, natural language understanding, and natural language generation across LLaMA-3-8B and Qwen-2.5-7B, JoLA consistently outperforms strong baselines while updating far fewer parameters, demonstrating robust data efficiency and scalability. The key insight is that attention heads are the most impactful editing targets, and end-to-end gating yields stable performance across tasks and data regimes, with practical implications for efficient fine-tuning of large models. JoLA is released with code at the provided repository, enabling broader adoption of dynamic, data-efficient activation editing.

Abstract

Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing (or steering) techniques, which modify the activations of specific model components. Due to their extremely small parameter counts, these methods show promise for small datasets. However, their performance is highly dependent on identifying the correct modules to edit and often lacks stability across different datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves, that is, the vectors applied as additive offsets or multiplicative scalings to the head output. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods. The code for the method is released at https://github.com/wenlai-lavine/jola.
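
The three learned components named in the abstract can be illustrated with a minimal sketch. It is not the released implementation: the update rule z' = (1 + g_m * m) * z + g_a * a, the stretch limits (-0.1, 1.1), the temperature 2/3, and the function names `hard_concrete_gate` and `edit_head` are all assumptions made for illustration, with the gate following the standard HardConcrete formulation. A gate that collapses to 0 switches off the corresponding intervention for that head, which is how sparsity would localize the edit.

```python
import math
import random

# Assumed stretch limits of the HardConcrete distribution (common defaults,
# not taken from this page).
LEFT, RIGHT = -0.1, 1.1


def hard_concrete_gate(log_alpha, beta=2.0 / 3.0, rng=None):
    """Return a gate value in [0, 1].

    With `rng`, draw a stochastic sample (training-style behaviour);
    without it, return the deterministic value (inference-style behaviour).
    """
    if rng is None:
        s = 1.0 / (1.0 + math.exp(-log_alpha))
    else:
        # Keep u away from 0 and 1 so the logs stay finite.
        u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
        logit = (math.log(u) - math.log(1.0 - u) + log_alpha) / beta
        s = 1.0 / (1.0 + math.exp(-logit))
    stretched = s * (RIGHT - LEFT) + LEFT
    # Clipping the stretched value is what allows exact zeros and ones.
    return min(1.0, max(0.0, stretched))


def edit_head(z, m, a, g_m, g_a):
    """Apply the assumed edit z' = (1 + g_m * m) * z + g_a * a elementwise.

    z: head output, m: multiplicative vector, a: additive vector,
    g_m / g_a: gates for the multiplicative and additive interventions.
    """
    return [(1.0 + g_m * mi) * zi + g_a * ai for zi, mi, ai in zip(z, m, a)]


if __name__ == "__main__":
    z = [1.0, 2.0]   # toy head output
    m = [0.5, -0.5]  # toy scaling vector
    a = [0.1, 0.2]   # toy offset vector
    closed = hard_concrete_gate(-10.0)
    opened = hard_concrete_gate(10.0)
    print(edit_head(z, m, a, closed, closed))  # head left unedited
    print(edit_head(z, m, a, opened, opened))  # both interventions active
```

In this sketch a head whose two gates are both closed passes its output through unchanged, so only heads with at least one open gate contribute intervention parameters.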

Paper Structure

This paper contains 46 sections, 7 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Comparison of previous representative activation editing methods with the proposed JoLA. (a) Previous methods: BitFIT [ben-zaken-etal-2022-bitfit], which fine-tunes only the bias term; RED [wu-etal-2024-advancing], which introduces scaling and bias vectors in the MLP layer; ReFT [wu2024reft], which fine-tunes the hidden layer representations; and LoFIT [yin2024lofit], which intervenes on attention heads in two steps. (b) JoLA introduces a gating mechanism that dynamically selects and locates attention heads to modify the activation outputs. We compare activation changes ($z^{(l,i)^{\prime}}$) across modules under two interventions (additive $a^{(l,i)}$ and multiplicative $m^{(l,i)}$), relative to the initial activation value ($z^{(l,i)}$).
  • Figure 2: Performance comparison of activation editing across different Transformer modules: bias terms, MLP layers, hidden states, and attention heads.
  • Figure 3: Comparison of the performance impact of scaling factors versus bias offsets in activation editing.
  • Figure 4: Performance comparison of JoLA and baseline methods across commonsense reasoning, natural language understanding, and natural language generation tasks for LLaMA-3 [dubey2024llama] and Qwen-2.5 [yang2024qwen2].
  • Figure 5: Ablation 2: Performance comparison of models with separate gating units for scaling and offset vectors versus a shared gating unit.
  • ...and 6 more figures