
Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation

Shen Yuan, Haotian Liu, Hongteng Xu

TL;DR

This study proposes a simple but effective adaptation method based on Householder reflections, which achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators.

Abstract

Although they follow different technical routes, low-rank and orthogonal adaptation techniques can both efficiently adapt large-scale pre-trained models to specific tasks or domains using a small number of trainable parameters. In this study, we bridge the gap between these two techniques by proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix by an orthogonal matrix constructed from a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs affects the model capacity and regularity. This analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code of the experiments is available at https://github.com/DaShenZi721/HRA, and the method has been merged into the PEFT package (https://github.com/huggingface/peft).
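To make the link between the orthogonal and low-rank views concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds an orthogonal matrix from a chain of r Householder reflections and checks that the resulting weight update has rank at most r. The matrix sizes, the helper name `householder_chain`, and the choice to multiply the frozen weight on the right are assumptions made for illustration only.

```python
import numpy as np


def householder_chain(U):
    """Product H_1 H_2 ... H_r, where H_i = I - 2 u_i u_i^T and u_i is the
    normalized i-th column of U. (Hypothetical helper for illustration.)"""
    d, r = U.shape
    Q = np.eye(d)
    for i in range(r):
        u = U[:, i] / np.linalg.norm(U[:, i])
        # Right-multiply Q by (I - 2 u u^T) without forming the reflection.
        Q = Q - 2.0 * np.outer(Q @ u, u)
    return Q


rng = np.random.default_rng(0)
d_out, d, r = 12, 16, 4                # illustrative sizes, not from the paper
W = rng.standard_normal((d_out, d))    # stands in for a frozen weight matrix
U = rng.standard_normal((d, r))        # stands in for the learnable HR vectors

Q = householder_chain(U)
W_adapted = W @ Q                      # assumed convention: multiply on the right
delta = W_adapted - W                  # the implied additive update

orth_error = np.linalg.norm(Q.T @ Q - np.eye(d))
delta_rank = np.linalg.matrix_rank(delta)

print("orthogonality error:", orth_error)
print("rank of the update:", delta_rank, "(bounded by r =", r, ")")
```

Each reflection changes the identity by a rank-one term, so a chain of r reflections differs from the identity by a matrix of rank at most r. This is why the multiplicative orthogonal update can also be read as an additive low-rank update that depends on the frozen weight.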

Paper Structure (28 sections, 9 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: (a) An illustration of our HRA method. (b) Comparison of various methods on the GLUE benchmark (Wang et al., 2019). The x-axis shows the number of trainable parameters (M), and the y-axis shows the average score (%). (c) Comparison of various methods in terms of the ratio of trainable parameters and accuracy (%) when adapting LLaMA2-7B (Touvron et al., 2023) in mathematical reasoning tasks.
  • Figure 2: A 2D illustration indicating that when the reflection planes $\mathcal{H}_1$ and $\mathcal{H}_2$ are orthogonal, the distance $\|\bm{H}_2\bm{H}_1\bm{w}-\bm{w}\|_2$ is maximized.
  • Figure 3: The robustness of HRA ($r=8$) to $\lambda$ on MRPC.
  • Figure 4: The robustness of HRA ($r=8$) to $\lambda$ in mathematical reasoning tasks.
  • Figure 5: Qualitative results on subject-driven generation.
  • ...and 6 more figures
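
The claim in the Figure 2 caption can be checked numerically in two dimensions. The sketch below is an illustrative check under assumed values, not code from the paper: it fixes one reflection, sweeps the angle between the two reflection normals, and records the distance $\|\bm{H}_2\bm{H}_1\bm{w}-\bm{w}\|_2$. The test vector `w`, the 1-degree grid, and the helper name `reflection` are all assumptions.

```python
import numpy as np


def reflection(u):
    """Householder reflection I - 2 u u^T for the normalized vector u."""
    u = u / np.linalg.norm(u)
    return np.eye(len(u)) - 2.0 * np.outer(u, u)


w = np.array([1.0, 0.5])                 # arbitrary test vector (assumption)
H1 = reflection(np.array([1.0, 0.0]))    # first reflection, normal along x

angles_deg = np.arange(0, 181)           # angle between the two normals
distances = []
for a in angles_deg:
    t = np.deg2rad(a)
    H2 = reflection(np.array([np.cos(t), np.sin(t)]))
    distances.append(np.linalg.norm(H2 @ H1 @ w - w))
distances = np.array(distances)

best_angle = int(angles_deg[np.argmax(distances)])
print("angle with the largest distance (degrees):", best_angle)
print("largest distance:", distances.max(), "vs 2*||w|| =", 2 * np.linalg.norm(w))
```

In 2D, composing two reflections gives a rotation by twice the angle between their reflection planes. When the planes are orthogonal the composition is a rotation by 180 degrees, which maps w to -w and gives the largest possible distance, 2‖w‖.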