Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models

Andy Zhou

TL;DR

The paper tackles cross-task interference when adapting large language models by introducing CS-ReFT, a representation-based fine-tuning framework that allocates a separate orthonormal subspace to each task and uses a lightweight router to compose them for each input. Because the base model stays frozen and the method edits hidden states rather than weights, CS-ReFT achieves strong multi-task instruction following with minimal parameter overhead. On AlpacaEval with Llama-2-7B, CS-ReFT reaches a 93.94% win rate while training only 0.0098% of the model's parameters, outperforming GPT-3.5 Turbo (86.30%) and several parameter-efficient baselines. The work suggests that modular hidden-state subspaces offer a scalable and interpretable path toward efficient multi-skill adaptation of large language models. Future directions include sharing or merging subspaces and deeper interpretability of the routing decisions.

Abstract

Adapting large language models to multiple tasks can cause cross-skill interference, where improvements for one skill degrade another. While methods such as LoRA impose orthogonality constraints at the weight level, they do not fully address interference in hidden-state representations. We propose Compositional Subspace Representation Fine-tuning (CS-ReFT), a novel representation-based approach that learns multiple orthonormal subspace transformations, each specializing in a distinct skill, and composes them via a lightweight router. By isolating these subspace edits in the hidden state, rather than weight matrices, CS-ReFT prevents cross-task conflicts more effectively. On the AlpacaEval benchmark, applying CS-ReFT to Llama-2-7B achieves a 93.94% win rate, surpassing GPT-3.5 Turbo (86.30%) while requiring only 0.0098% of model parameters. These findings show that specialized representation edits, composed via a simple router, significantly enhance multi-task instruction following with minimal overhead.
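
For a sense of scale, the reported 0.0098% can be converted to a rough absolute count. This assumes the percentage is taken relative to the roughly 6.74 billion parameters of Llama-2-7B; the abstract does not state the base, so the figure below is only an order-of-magnitude estimate:

$$
0.0098\% \times 6.74 \times 10^{9} = 9.8 \times 10^{-5} \times 6.74 \times 10^{9} \approx 6.6 \times 10^{5}
$$

Under this assumption, the trained parameters would number roughly 0.66 million.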

Paper Structure

This paper contains 11 sections, 4 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Illustration of CS-ReFT. (1) The left panel shows how Compositional Subspace Representation Fine-Tuning (CS-ReFT) applies specialized subspace transformations ($\Phi_1, \Phi_2, \Phi_3$) at specific positions in different layers to adapt a frozen model for multiple tasks. Each subspace edit is task-specific, reducing interference while allowing composition when needed. (2) The right panel details the routing mechanism: a lightweight router determines which subspaces to activate based on the input, ensuring efficient and targeted modifications.
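
The mechanism described in the caption can be illustrated with a small NumPy sketch. It is not the paper's implementation. It assumes each subspace edit takes the LoReFT-style form $h + R^\top (W h + b - R h)$ with orthonormal rows in $R$, and that the router is a single linear layer followed by a sigmoid that emits one gate per subspace. All function names, shapes, and toy sizes below are hypothetical.

```python
import numpy as np


def orthonormal_rows(rng, r, d):
    """Return an (r, d) matrix with orthonormal rows, built with a QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))  # q has orthonormal columns, shape (d, r)
    return q.T


def subspace_delta(h, R, W, b):
    """Assumed LoReFT-style edit direction: R^T (W h + b - R h)."""
    return R.T @ (W @ h + b - R @ h)


def router_gates(h, W_router, b_router):
    """Hypothetical router: linear layer plus sigmoid, one gate in (0, 1) per subspace."""
    return 1.0 / (1.0 + np.exp(-(W_router @ h + b_router)))


def cs_reft_edit(h, subspaces, W_router, b_router):
    """Compose the gated subspace edits: h + sum_i alpha_i * delta_i(h)."""
    alpha = router_gates(h, W_router, b_router)
    out = h.copy()
    for a, (R, W, b) in zip(alpha, subspaces):
        out = out + a * subspace_delta(h, R, W, b)
    return out, alpha


rng = np.random.default_rng(0)
d, r, k = 16, 2, 3  # toy hidden size, subspace rank, number of subspaces

# Each subspace has its own orthonormal projection R_i and a small linear map (W_i, b_i).
# The subspaces are sampled independently here, so they are not mutually orthogonal.
subspaces = [
    (orthonormal_rows(rng, r, d), 0.1 * rng.standard_normal((r, d)), np.zeros(r))
    for _ in range(k)
]
W_router = 0.1 * rng.standard_normal((k, d))
b_router = np.zeros(k)

h = rng.standard_normal(d)  # stand-in for one hidden state of the frozen model
h_edited, alpha = cs_reft_edit(h, subspaces, W_router, b_router)
```

Under these assumptions, a gate near zero leaves the hidden state untouched, while a fully open gate on a single subspace moves the projection $R h$ to $W h + b$ and leaves the component of $h$ outside that subspace unchanged. The frozen model itself is never modified; only the small per-subspace matrices and the router would be trained.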