ReFT: Representation Finetuning for Language Models

Zhengxuan Wu; Aryaman Arora; Zheng Wang; Atticus Geiger; Dan Jurafsky; Christopher D. Manning; Christopher Potts

TL;DR

This work introduces Representation Finetuning (ReFT), a framework that optimizes task-specific interventions on frozen model representations rather than updating weights. The authors instantiate LoReFT, a low-rank subspace edit, and an efficiency-focused DiReFT ablation, demonstrating superior parameter efficiency (15x–65x fewer trainable parameters than LoRA) with competitive or state-of-the-art performance across commonsense, arithmetic, instruction-following, and GLUE benchmarks. By grounding ReFT in interventional interpretability and causal-abstraction ideas, the paper shows how learned representation edits can guide model behavior with improved efficiency and interpretability. The release of a reusable ReFT training library further enables researchers to explore this paradigm across large language models and tasks.

Abstract

Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency. Both are drop-in replacements for existing PEFTs and learn interventions that are 15x–65x more parameter-efficient than LoRA. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, instruction-tuning, and GLUE. In all these evaluations, our ReFTs deliver the best balance of efficiency and performance, and almost always outperform state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft.

Paper Structure (73 sections, 14 equations, 11 figures, 17 tables)

Introduction
Related work
Parameter-efficient finetuning methods (PEFTs).
Representation editing.
Interventional interpretability.
ReFT
Motivation
Two low-rank ReFT instantiations
LoReFT.
DiReFT.
Training objective.
The ReFT family of methods
Experiments
Hyperparameter configuration
Commonsense reasoning
...and 58 more sections

Figures (11)

Figure 1: Parameter count vs. performance for LoReFT and other PEFTs across four benchmarks when applied to LLaMA, Llama-2, Llama-3, and RoBERTa models. Despite training far fewer parameters than existing PEFTs, LoReFT achieves competitive or even state-of-the-art performance on all tasks. Its value is most apparent for the largest models in our evaluations. Note: FT is full-parameter finetuning, which is not a PEFT or ReFT method. Additional results are in the Experiments section.
Figure 2: Illustration of ReFT. (1) The left panel depicts an intervention $I$: the intervention function $\Phi$ is applied to hidden representations at positions $P$ in layer $l$. (2) The right panel depicts the intervention function used in LoReFT, which finds an edit vector that only modifies the representation in the linear subspace spanned by the rows of $\mathbf{R}$. Specifically, we show how a rank-2 LoReFT operates on 3-dimensional hidden representations.
Figure 3: Memorisation test results for the LLaMA-1 7B model on recovering the first n tokens of Alice's Adventures in Wonderland with a rank-1 LoReFT intervention at various layers of the last token's residual stream. Rec. % is the percentage of prefix matches.
Figure 4: Memorisation test results for the LLaMA-1 13B model on recovering the first n tokens of Alice's Adventures in Wonderland with a rank-1 LoReFT intervention at various layers of the last token's residual stream. Rec. % is the percentage of prefix matches.
Figure 5: Memorisation test results for the LLaMA-1 7B model on recovering the first n tokens of a randomly scrambled version of the book Alice's Adventures in Wonderland.
...and 6 more figures
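
The subspace edit described in the Figure 2 caption can be made concrete with a small NumPy sketch. It assumes the LoReFT intervention takes the form $\Phi(\mathbf{h}) = \mathbf{h} + \mathbf{R}^\top(\mathbf{W}\mathbf{h} + \mathbf{b} - \mathbf{R}\mathbf{h})$, with $\mathbf{R}$ having orthonormal rows; this formula is not spelled out in the summary above, so treat it as an assumption. The names `make_orthonormal_rows` and `loreft` are illustrative only and are not the pyreft API. The example uses rank 2 and hidden size 3, matching the caption, with random values standing in for learned parameters.

```python
import numpy as np

def make_orthonormal_rows(r, d, rng):
    """Return an (r, d) matrix whose rows are orthonormal (illustrative helper)."""
    # Reduced QR of a random (d, r) matrix yields orthonormal columns;
    # transposing gives orthonormal rows.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q.T

def loreft(h, R, W, b):
    """Assumed LoReFT edit: h + R^T (W h + b - R h).

    h: (d,) hidden representation
    R: (r, d) low-rank projection with orthonormal rows
    W: (r, d) linear map, b: (r,) bias
    """
    return h + R.T @ (W @ h + b - R @ h)

rng = np.random.default_rng(0)
d, r = 3, 2                       # 3-dimensional hidden state, rank-2 edit
R = make_orthonormal_rows(r, d, rng)
W = rng.standard_normal((r, d))   # random stand-ins for learned parameters
b = rng.standard_normal(r)
h = rng.standard_normal(d)

h_new = loreft(h, R, W, b)

# Projector onto the directions orthogonal to the rows of R.
P_perp = np.eye(d) - R.T @ R

# The edit only moves h inside the subspace spanned by the rows of R.
print("component outside subspace unchanged:",
      np.allclose(P_perp @ h_new, P_perp @ h))
# Inside the subspace, the edited coordinates equal W h + b.
print("subspace coordinates equal W h + b:",
      np.allclose(R @ h_new, W @ h + b))
# Trainable parameters for one such intervention: R, W, and b.
n_params = R.size + W.size + b.size
print("trainable parameters:", n_params)
```

Under these assumptions, the two checks show why the edit is confined to the subspace: because $\mathbf{R}\mathbf{R}^\top = \mathbf{I}$, projecting the edited vector with $\mathbf{R}$ returns $\mathbf{W}\mathbf{h} + \mathbf{b}$ exactly, while the component orthogonal to the rows of $\mathbf{R}$ is left untouched. The parameter count is $2rd + r$ per intervention, here 14.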

Theorems & Definitions (2)

Definition 3.1
Definition 3.2
