Low-Rank Approximation, Adaptation, and Other Tales

Jun Lu

TL;DR

This survey analyzes how low-rank representations can be learned and exploited in data analysis and in tuning large language models. It presents alternating least squares (ALS) as a core method for obtaining low-rank approximations and extends it to Hadamard, Kronecker, and Khatri-Rao decompositions, including settings with missing data. It then connects these decompositions to practical adaptation techniques for transformers, presenting LoRA, LoHA, LoKr, and LoKH as efficient fine-tuning strategies and discussing their theoretical and empirical implications for memory, latency, and flexibility. The work exploits the algebraic structure of special matrix products to enable closed-form updates or gradient-based optimization, and demonstrates how these ideas can compress and adapt large neural networks while maintaining performance. Overall, the paper provides both foundational algorithms and forward-looking adaptation strategies for scalable, structure-preserving model customization.
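The ALS idea with missing data can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's algorithm verbatim: the function name `masked_als`, the ridge weight `lam`, and the random initialization are illustrative choices. Each half-step solves a small regularized least-squares problem that uses only the observed entries of one column (when updating Z) or one row (when updating W).

```python
import numpy as np

def masked_als(A, mask, K, lam=0.1, iters=50, seed=0):
    """Hypothetical sketch: fit A ~ W @ Z using only entries where mask is True.

    A    : (M, N) data matrix; values at unobserved positions are ignored
    mask : (M, N) boolean array, True where A is observed
    K    : target rank
    lam  : ridge weight (an assumption of this sketch; keeps each solve well posed)
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    W = rng.standard_normal((M, K))
    Z = rng.standard_normal((K, N))
    I = np.eye(K)
    for _ in range(iters):
        # Given W: each column z_n solves (W_o^T W_o + lam I) z_n = W_o^T a_o
        for n in range(N):
            obs = mask[:, n]
            Wo = W[obs]
            Z[:, n] = np.linalg.solve(Wo.T @ Wo + lam * I, Wo.T @ A[obs, n])
        # Given Z: each row w_m solves (Z_o Z_o^T + lam I) w_m = Z_o a_o
        for m in range(M):
            obs = mask[m, :]
            Zo = Z[:, obs]
            W[m] = np.linalg.solve(Zo @ Zo.T + lam * I, Zo @ A[m, obs])
    return W, Z
```

Because each column (or row) only sees its own observed entries, every subproblem is a K×K system, so the cost per sweep stays small even when many entries are missing.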

Abstract

Low-rank approximation is a fundamental technique in modern data analysis, widely used in fields such as signal processing, machine learning, and natural language processing. Despite its ubiquity, the mechanics of low-rank approximation and its application to model adaptation can be obscure, leaving practitioners and researchers with questions about its true capabilities and limitations. This paper seeks to clarify low-rank approximation and adaptation by offering a comprehensive guide that reveals their inner workings and explains their utility in a clear and accessible way. Our focus is on developing a solid intuition for how low-rank approximation and adaptation operate and why they are so effective. We begin with basic concepts and gradually build up to the mathematical underpinnings, so that readers of all backgrounds can gain a deeper understanding of low-rank approximation and adaptation. We strive to balance informal explanations with rigorous mathematics, so that both newcomers and experienced researchers can benefit from this survey. Additionally, we introduce new low-rank decomposition and adaptation algorithms that have not yet been explored in the field, in the hope that future researchers will investigate their applicability.

Paper Structure

This paper contains 30 sections, 12 theorems, 97 equations, 4 figures, and 6 algorithms.

Introduction
Low-Rank Decomposition via Alternating Least Squares
Key observation.
Regularization: Extension to General Matrices
Missing Entries and Rank-One Update
Given $\bm{W}$.
Given $\bm{Z}$.
Special Matrix Products and Properties
Kronecker Product
Khatri-Rao Product
More Properties of Special Matrix Products
$(\bm{A}\odot \bm{B} )^\top (\bm{A}\odot \bm{B} )= (\bm{A}^\top\bm{A}) \circ (\bm{B}^\top\bm{B})$.
$(\bm{A}\odot \bm{B} )^\top (\bm{C}\odot \bm{D} )= (\bm{A}^\top\bm{C}) \circ (\bm{B}^\top\bm{D})$.
$(\bm{A}\otimes\bm{B})(\bm{C}\odot \bm{D})=(\bm{A}\bm{C})\odot(\bm{B}\bm{D})$.
Low-Rank Hadamard Decomposition
...and 15 more sections
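The three Khatri-Rao identities listed in the outline can be checked numerically. The sketch below is an illustrative check, not code from the paper: `khatri_rao` is a hypothetical helper implementing the column-wise Kronecker product, ⊙ denotes the Khatri-Rao product, ∘ the Hadamard product (`*` in NumPy), and the matrix sizes are arbitrary. In the third identity the Kronecker factors are renamed P and Q only so their shapes can differ from the A and B used above.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column k equals kron(A[:, k], B[:, k])."""
    I, K = A.shape
    J, K2 = B.shape
    assert K == K2, "A and B must have the same number of columns"
    # entry [i, j, k] = A[i, k] * B[j, k]; flattening (i, j) row-major gives Kronecker order
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, K)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((5, 4))
C, D = rng.standard_normal((3, 2)), rng.standard_normal((5, 2))

# (A ⊙ B)^T (A ⊙ B) = (A^T A) ∘ (B^T B)
id1 = np.allclose(khatri_rao(A, B).T @ khatri_rao(A, B), (A.T @ A) * (B.T @ B))

# (A ⊙ B)^T (C ⊙ D) = (A^T C) ∘ (B^T D)
id2 = np.allclose(khatri_rao(A, B).T @ khatri_rao(C, D), (A.T @ C) * (B.T @ D))

# (P ⊗ Q)(C ⊙ D) = (P C) ⊙ (Q D)
P, Q = rng.standard_normal((6, 3)), rng.standard_normal((7, 5))
id3 = np.allclose(np.kron(P, Q) @ khatri_rao(C, D), khatri_rao(P @ C, Q @ D))

print(id1, id2, id3)  # prints: True True True
```

The first identity is the one that makes ALS for Khatri-Rao structured factors cheap: the Gram matrix of a tall (IJ × K) product reduces to a Hadamard product of two small K × K Gram matrices.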

Key Result

Lemma 1

Suppose $\bm{A}\in \mathbb{R}^{M\times N}$ has full rank with $M\leq N$ and $\bm{W}\in \mathbb{R}^{M\times K}$ has full rank with $K<M$ (i.e., $K<M\leq N$). Then the update $\bm{Z}=(\bm{W}^\top\bm{W})^{-1} \bm{W}^\top \bm{A} \in \mathbb{R}^{K\times N}$ in the ALS update equation for $\bm{Z}$ has full rank.
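A minimal sketch of the unregularized ALS iteration behind Lemma 1, assuming NumPy; the helper names `update_Z`, `update_W`, and `als` and the test sizes are hypothetical. The Z-step is the update stated in the lemma, computed with a linear solve instead of an explicit inverse, and the W-step is its symmetric counterpart $\bm{W}^\top=(\bm{Z}\bm{Z}^\top)^{-1}\bm{Z}\bm{A}^\top$. The script also checks the lemma's conclusion numerically and compares the ALS error with the truncated-SVD optimum.

```python
import numpy as np

def update_Z(A, W):
    # Z = (W^T W)^{-1} W^T A, solved as a K x K linear system
    return np.linalg.solve(W.T @ W, W.T @ A)

def update_W(A, Z):
    # Symmetric step: W^T = (Z Z^T)^{-1} Z A^T
    return np.linalg.solve(Z @ Z.T, Z @ A.T).T

def als(A, K, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((A.shape[0], K))  # random (almost surely full-rank) start
    for _ in range(iters):
        Z = update_Z(A, W)
        W = update_W(A, Z)
    return W, Z

rng = np.random.default_rng(1)
M, N, K = 6, 10, 3  # K < M <= N, as in Lemma 1
# nearly rank-K matrix plus small noise, so A itself has full rank
A = rng.standard_normal((M, K)) @ rng.standard_normal((K, N)) + 1e-3 * rng.standard_normal((M, N))

W0 = rng.standard_normal((M, K))
print("rank of Z after one update:", np.linalg.matrix_rank(update_Z(A, W0)))

W, Z = als(A, K)
s = np.linalg.svd(A, compute_uv=False)
print("ALS error:", np.linalg.norm(A - W @ Z))
print("SVD optimum:", np.sqrt(np.sum(s[K:] ** 2)))
```

Solving the normal equations rather than forming $(\bm{W}^\top\bm{W})^{-1}$ is a standard numerical choice; it gives the same update as the formula in the lemma.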

Figures (4)

Figure 1: Diagram illustrating the rank in a Hadamard product.
Figure 2: Diagram illustrating LoRA, LoHA, and LoKr (KronA).
Figure 3: Diagram illustrating LoRA and LoKH.
Figure 4: Diagram illustrating low-rank approximation for transformer architectures. The bottom-right figure illustrates the cascading of Khatri-Rao products such that $n=n_1n_2\ldots n_k$.
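The rank behavior pictured in Figures 1 and 2 can be illustrated with random factors. This is a sketch under assumptions: the layer size 64 × 64, rank r = 4, and the Kronecker factor shapes are illustrative, scaling factors are omitted, and the LoKr variant shown (a full small factor Kronecker-multiplied by a low-rank factor) is one plausible parameterization, not necessarily the paper's exact one. LoKH is not sketched because its construction is not given in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4  # illustrative sizes, not taken from the paper

# LoRA: dW = B A with B (d_out x r), A (r x d_in); rank(dW) <= r
B, A = rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in))
dW_lora = B @ A
params_lora = B.size + A.size

# LoHA: dW = (B1 A1) ∘ (B2 A2); a Hadamard product of two rank-r factors has rank <= r * r
B1, A1 = rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in))
B2, A2 = rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in))
dW_loha = (B1 @ A1) * (B2 @ A2)
params_loha = B1.size + A1.size + B2.size + A2.size

# LoKr (assumed variant): dW = C ⊗ (B3 A3); rank(X ⊗ Y) = rank(X) * rank(Y)
C = rng.standard_normal((8, 8))                                   # small full factor
B3, A3 = rng.standard_normal((8, 2)), rng.standard_normal((2, 8))  # rank-2 factor
dW_lokr = np.kron(C, B3 @ A3)  # (8*8) x (8*8) = 64 x 64
params_lokr = C.size + B3.size + A3.size

for name, dW, p in [("LoRA", dW_lora, params_lora),
                    ("LoHA", dW_loha, params_loha),
                    ("LoKr", dW_lokr, params_lokr)]:
    print(f"{name}: shape={dW.shape}, rank={np.linalg.matrix_rank(dW)}, params={p}")
```

With these sizes LoRA uses 512 parameters for a rank-4 update, LoHA uses 1024 parameters for an update of rank at most 16, and the Kronecker variant uses 96 parameters for a rank-16 update (when C is nonsingular), which is the memory/expressiveness trade-off the adapters are designed around.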

Theorems & Definitions (19)

Remark 1: Convexity and Global Minimum
Remark 2: Positive Definite Hessian if $\bm{W}$ Has Full Rank
Lemma 1: Rank of $\bm{Z}$ after Updating
Lemma 2: Rank of $\bm{W}$ after Updating
Definition 1: Matrix Kronecker Product
Lemma 3: Kronecker of Orthogonal, Triangular, Diagonal, (Semi)definite, Nonsingular
Lemma 4: Eigenvalue of Kronecker Product (Horn and Johnson, 1994)
Remark 3: Properties of Kronecker Products
Definition 2: Khatri-Rao Product
Definition 3: Partition-wise Khatri-Rao Product
...and 9 more
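Two of the Kronecker results listed above can be spot-checked numerically. The sketch below assumes NumPy and uses symmetric matrices so that the eigenvalues are real and can be sorted and compared; it checks that the eigenvalues of $\bm{A}\otimes\bm{B}$ are the pairwise products $\lambda_i\mu_j$ (the content of Lemma 4), that a Kronecker product of orthogonal matrices is orthogonal (one case of Lemma 3), and the standard mixed-product property $(\bm{A}\otimes\bm{B})(\bm{C}\otimes\bm{D})=(\bm{A}\bm{C})\otimes(\bm{B}\bm{D})$. Whether Remark 3 lists exactly this property is not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric(n):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

A = random_symmetric(3)
B = random_symmetric(4)

# Eigenvalues of A ⊗ B are the products lambda_i * mu_j
lam = np.linalg.eigvalsh(A)
mu = np.linalg.eigvalsh(B)
expected = np.sort(np.outer(lam, mu).ravel())
actual = np.linalg.eigvalsh(np.kron(A, B))  # eigvalsh returns ascending order
eig_ok = np.allclose(actual, expected)

# Kronecker product of orthogonal matrices is orthogonal
Q1, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q = np.kron(Q1, Q2)
orth_ok = np.allclose(Q.T @ Q, np.eye(12))

# Mixed-product property
C = rng.standard_normal((3, 2))
D = rng.standard_normal((4, 5))
mixed_ok = np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

print(eig_ok, orth_ok, mixed_ok)  # prints: True True True
```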
