Joint Localization and Activation Editing for Low-Resource Fine-Tuning
Wen Lai, Alexander Fraser, Ivan Titov
TL;DR
This work addresses the adaptation of large language models when only a small amount of training data is available. It introduces JoLA, a joint localization and activation editing method that learns which attention heads to edit and whether each edit should be additive, multiplicative, or both, using HardConcrete gates to induce sparsity. In experiments on commonsense reasoning, natural language understanding, and natural language generation with LLaMA-3-8B and Qwen-2.5-7B, JoLA consistently outperforms strong baselines while updating far fewer parameters. The key insight is that attention heads are the most impactful editing targets, and that learning the gates end-to-end yields stable performance across tasks and data regimes. The code is available at https://github.com/wenlai-lavine/jola.
Abstract
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired activation editing (or steering) techniques, which modify the activations of specific model components. Because they have extremely small parameter counts, these methods show promise for small datasets. However, their performance depends heavily on identifying the correct modules to edit, and it is often unstable across datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit; (2) whether the intervention should be additive, multiplicative, or both; and (3) the intervention parameters themselves, that is, the vectors applied as additive offsets or multiplicative scalings to the head output. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods. The code for the method is released at https://github.com/wenlai-lavine/jola.
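To make the three learned components concrete, the following is a minimal sketch of a gated edit applied to a single head output. It is an illustration under stated assumptions, not the released implementation: the abstract does not give the exact formula, so the combination rule `(1 + g_m * m) * z + g_a * a`, the function names `hard_concrete_gate` and `edit_head_output`, and the stretch and temperature constants (commonly used HardConcrete defaults) are all assumed here.

```python
import math
import random

# Assumed constants: commonly used HardConcrete defaults, not taken from the text.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_concrete_gate(log_alpha, training=False, rng=random):
    """Return a gate in [0, 1] that can be exactly 0 (closed) or exactly 1 (open).

    `log_alpha` is the learnable gate parameter (hypothetical name).
    """
    if training:
        # Sample with logistic noise so the gate stays differentiable in log_alpha.
        u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
        s = _sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    else:
        # Deterministic gate at inference time.
        s = _sigmoid(log_alpha)
    # Stretch to (GAMMA, ZETA), then clip to [0, 1]; the clipping is what
    # lets the gate reach exact zeros and ones.
    stretched = s * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))


def edit_head_output(z, a, m, gate_add, gate_mul):
    """Apply a gated additive offset `a` and multiplicative scaling `m` to head output `z`.

    Assumed form: z' = (1 + gate_mul * m) * z + gate_add * a, element-wise.
    With both gates at 0 the head is left unchanged.
    """
    return [
        (1.0 + gate_mul * m_i) * z_i + gate_add * a_i
        for z_i, a_i, m_i in zip(z, a, m)
    ]


# Toy head output and intervention vectors (illustrative values only).
z = [1.0, -2.0, 0.5]
a = [0.1, 0.1, 0.1]
m = [0.5, 0.0, -1.0]

closed = hard_concrete_gate(-10.0)  # strongly negative log_alpha closes the gate
opened = hard_concrete_gate(10.0)   # strongly positive log_alpha opens it

unchanged = edit_head_output(z, a, m, closed, closed)
edited = edit_head_output(z, a, m, opened, opened)
```

In this sketch, a head whose two gates are both closed contributes no edit at all, which is how the gates can express the "which heads to edit" decision, while opening only one of the two gates selects a purely additive or purely multiplicative intervention.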
