Optimal Brain Decomposition for Accurate LLM Low-Rank Approximation

Yuhang Li, Donghyun Lee, Ruokai Yin, Priyadarshini Panda

Abstract

Low-rank decomposition has emerged as an important problem in Large Language Model (LLM) fine-tuning and inference. Through Singular Value Decomposition (SVD), a weight matrix can be optimally factorized into low-rank subspaces. A common prior practice is to decompose the weight in an activation-whitened space, which yields satisfactory results. In this work, we propose Optimal Brain Decomposition LLM (OBD-LLM), which studies the decomposition problem in the model space using second-order Hessian information. Through a rigorous Kronecker factorization of the Hessian, we show that the decomposition must consider both the input and output information of a layer, and that doing so achieves much better decomposition results than input-only methods. Our loss-aware decomposition involves a bi-directional whitening of the weight matrix. As a result, OBD-LLM provides a closed-form solution for the optimal decomposition of weights in the language model. Remarkably, we achieve ~20-40% better results than the previous state-of-the-art decomposition method, SVD-LLM.
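To make the bi-directional whitening concrete, the following is a minimal NumPy sketch. It assumes the Kronecker-factored Hessian is built from an input-activation covariance S_in ≈ E[x xᵀ] and an output-gradient covariance S_out ≈ E[g gᵀ], so that the layer-wise loss increase of a perturbation ΔW is approximated by tr(S_out ΔW S_in ΔWᵀ). The function name, damping term, and factor layout are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def obd_lowrank_sketch(W, S_in, S_out, rank, eps=1e-6):
    """Hedged sketch of a bi-directional (two-sided) whitened low-rank
    approximation of a weight matrix W (shape: d_out x d_in).

    Assumed inputs (illustrative, not the paper's exact pseudocode):
      * S_in  ~ E[x x^T]: input-activation covariance   (d_in  x d_in)
      * S_out ~ E[g g^T]: output-gradient covariance    (d_out x d_out)
    The loss of a weight perturbation dW is modeled as tr(S_out dW S_in dW^T),
    i.e. a Kronecker-factored Hessian S_out (x) S_in.
    """
    # Cholesky factors of the two covariances (damped for numerical stability).
    L_in = np.linalg.cholesky(S_in + eps * np.eye(S_in.shape[0]))
    L_out = np.linalg.cholesky(S_out + eps * np.eye(S_out.shape[0]))

    # Bi-directional whitening: in this space the Hessian metric reduces to
    # the plain Frobenius norm, so ordinary truncated SVD is optimal.
    W_white = L_out.T @ W @ L_in
    U, s, Vt = np.linalg.svd(W_white, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]

    # "Color" the factors back with the inverse Cholesky factors, giving the
    # two low-rank matrices A (d_out x r) and B (r x d_in) with W ~ A @ B.
    A = np.linalg.solve(L_out.T, U_r * s_r)   # (L_out^T)^{-1} U_r Sigma_r
    B = np.linalg.solve(L_in.T, Vt_r.T).T     # V_r^T L_in^{-1}
    return A, B
```

Under this reading, setting S_out to the identity recovers an input-only (activation-whitened) decomposition in the style of SVD-LLM, which is the baseline the abstract contrasts against.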

Figures (5)

  • Figure 1: Comparison between our and previous post-training low-rank decomposition methods.
  • Figure 2: Overview of our compression pipeline. To decompose a layer, we collect both the input-activation and output-gradient covariance matrices, and then use them to whiten the weight matrix for SVD. After decomposition, we use the inverse Cholesky factors to "color" the weight back. (A hook-based sketch of the covariance-collection step follows this list.)
  • Figure 3: Visualization of eigenvalues of covariance matrices.
  • Figure 4: Visualization of correlation factor across all projection layers in LLaMA-3-8B.
  • Figure 5: Latency comparison of decomposition runtime.
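The collection step described in the Figure 2 caption (gathering the input-activation and output-gradient covariances from calibration data) could look roughly like the hook-based PyTorch sketch below. The loader format, loss_fn, normalization by token count, and all names here are assumptions for illustration; the paper's actual calibration procedure may differ.

```python
import torch

def collect_layer_covariances(layer, calib_loader, model, loss_fn):
    """Hedged sketch: accumulate input-activation and output-gradient
    covariances for one nn.Linear layer from calibration data."""
    d_in, d_out = layer.in_features, layer.out_features
    device = layer.weight.device
    S_in = torch.zeros(d_in, d_in, device=device)    # ~ E[x x^T]
    S_out = torch.zeros(d_out, d_out, device=device)  # ~ E[g g^T]
    n_tokens = 0

    def fwd_hook(module, inputs, output):
        nonlocal n_tokens
        x = inputs[0].detach().float().reshape(-1, d_in)   # (tokens, d_in)
        S_in.add_(x.T @ x)
        n_tokens += x.shape[0]

    def bwd_hook(module, grad_input, grad_output):
        g = grad_output[0].detach().float().reshape(-1, d_out)  # (tokens, d_out)
        S_out.add_(g.T @ g)

    h_fwd = layer.register_forward_hook(fwd_hook)
    h_bwd = layer.register_full_backward_hook(bwd_hook)
    for batch, target in calib_loader:           # assumed (input, target) pairs
        model.zero_grad()
        loss = loss_fn(model(batch), target)
        loss.backward()                           # triggers bwd_hook
    h_fwd.remove()
    h_bwd.remove()
    return S_in / n_tokens, S_out / n_tokens
```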