Linear Chain Transformation: Expanding Optimization Dynamics for Fine-Tuning Large Language Models

Yulong Wang, Chang Zuo, Yin Xuan, Hong Li, Ni Wei

TL;DR

This paper proposes Linear Chain Transformation (LinChain), a novel approach that introduces a sequence of linear transformations during fine-tuning to enrich optimization dynamics and significantly improves the performance of LLM fine-tuning over state-of-the-art methods.

Abstract

Fine-tuning large language models (LLMs) has become essential for adapting pretrained models to specific downstream tasks. In this paper, we propose Linear Chain Transformation (LinChain), a novel approach that introduces a sequence of linear transformations during fine-tuning to enrich optimization dynamics. By incorporating multiple linear transformations into the parameter update process, LinChain expands the effective rank of updates and enhances the model's ability to learn complex task-specific representations. We demonstrate that this method significantly improves the performance of LLM fine-tuning over state-of-the-art methods by providing more flexible optimization paths during training, while maintaining the inference efficiency of the resulting model. Our experiments on various benchmark tasks show that LinChain leads to better generalization, fewer learnable parameters, and improved task adaptation, making it a compelling strategy for LLM fine-tuning.

Paper Structure

This paper contains 25 sections, 12 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Reparametrization in LinChain: $W_i$ ($i=1,2,3$) are trained alongside matrices $A$ and $B$.
  • Figure 2: Training loss curves for LinChain, LoRA, and MoSLoRA on the Commonsense170K dataset. LinChain demonstrates faster convergence and achieves a lower final loss. All methods employ matrices $A$ and $B$ with a rank of 16. In LinChain, three additional $16 \times 16$ matrices are inserted between matrices $A$ and $B$.
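The reparametrization described in the two captions can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the LoRA convention in which the update is $BA$, so the chained update becomes $B W_3 W_2 W_1 A$; the ordering of the $W_i$, the layer sizes, the initialization, and the names `linchain_delta`, `W0`, and `chain` are all hypothetical choices made for illustration. Only the rank of 16 and the three $16 \times 16$ matrices come from the Figure 2 caption.

```python
import numpy as np

def linchain_delta(A, B, chain):
    """Collapse the chain into one update matrix: B @ W_n @ ... @ W_1 @ A."""
    M = A
    for W in chain:          # apply W_1 first, then W_2, then W_3
        M = W @ M
    return B @ M

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 48, 16  # layer sizes are illustrative; r = 16 as in Figure 2

W0 = rng.standard_normal((d_out, d_in))        # frozen pretrained weight (assumed)
A = 0.01 * rng.standard_normal((r, d_in))      # low-rank factor A
B = 0.01 * rng.standard_normal((d_out, r))     # low-rank factor B
# Three r x r matrices between A and B; near-identity init is an assumption.
chain = [np.eye(r) + 0.01 * rng.standard_normal((r, r)) for _ in range(3)]

x = rng.standard_normal(d_in)

# Training-time view: the input passes through every factor in turn.
y_chain = W0 @ x + B @ (chain[2] @ (chain[1] @ (chain[0] @ (A @ x))))

# Inference-time view: the chain is folded into a single weight matrix.
delta = linchain_delta(A, B, chain)
W_merged = W0 + delta
y_merged = W_merged @ x

# Extra trainable parameters contributed by the chain alone.
extra_params = sum(W.size for W in chain)
```

In this sketch the merged weight has the same shape as `W0`, which is consistent with the abstract's statement that inference efficiency is maintained: once training ends, the chain can be multiplied out and discarded. With three $16 \times 16$ matrices, the chain adds $3 \times 16 \times 16 = 768$ parameters per adapted weight on top of $A$ and $B$.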