Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models

Mingxue Xu; Sadia Sharmin; Danilo P. Mandic

TL;DR

The paper proposes a unified taxonomy that bridges matrix/tensor compression approaches and model compression concepts in ML and NLP research. It adopts the subspace, an elementary concept in linear algebra and the core concept in geometric algebra, to reformulate matrix/tensor and ML/NLP concepts under one umbrella.

Abstract

Matrix- and tensor-guided parametrization of Natural Language Processing (NLP) models is fundamentally useful for improving the models' systematic efficiency. However, the internal links between these two algebraic structures and language model parametrization are poorly understood. In addition, existing matrix and tensor research is math-heavy and far removed from machine learning (ML) and NLP research concepts. As a result, recent progress on matrices and tensors for model parametrization resembles a loose collection of separate components from matrix/tensor and NLP studies rather than a well-structured unified approach, which further hinders algorithm design. To this end, we propose a unified taxonomy that bridges matrix/tensor compression approaches and model compression concepts in ML and NLP research. Namely, we adopt an elementary concept in linear algebra, that of a subspace, which is also the core concept in geometric algebra, to reformulate the matrix/tensor and ML/NLP concepts (e.g. the attention mechanism) under one umbrella. Based on this subspace formalization, typical matrix and tensor decomposition algorithms can be interpreted as geometric transformations. Finally, we revisit recent literature on matrix- or tensor-guided language model compression, rephrase and compare the core ideas, and point out the current research gaps and potential solutions.
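To make the "decomposition as geometric transformation" reading concrete, the following is a minimal NumPy sketch, not the paper's own algorithm. It assumes a hypothetical dense weight matrix `W` and a hypothetical helper `truncated_svd_compress`. It shows that the rank-8 truncated SVD of `W` coincides with the orthogonal projection of `W` onto the subspace spanned by its top 8 left singular vectors.

```python
import numpy as np

def truncated_svd_compress(W, rank):
    """Hypothetical helper: return (A, B) such that A @ B is the rank-`rank` truncated SVD of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # (m, rank): retained left singular vectors, scaled by singular values
    B = Vt[:rank, :]             # (rank, n): orthonormal basis of the retained row subspace
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))        # illustrative weight matrix, not taken from any real model
rank = 8
A, B = truncated_svd_compress(W, rank)
W_hat = A @ B                            # low-rank reconstruction

# Geometric view: W_hat is W projected onto the span of the top-`rank` left singular vectors.
U, _, _ = np.linalg.svd(W, full_matrices=False)
P = U[:, :rank] @ U[:, :rank].T          # orthogonal projector onto that subspace
projection_matches = np.allclose(P @ W, W_hat)

params_before = W.size                   # 64 * 48 entries
params_after = A.size + B.size           # 64 * 8 + 8 * 48 entries
rel_error = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

Storing `A` and `B` instead of `W` reduces the parameter count from `m * n` to `rank * (m + n)`. The reconstruction error depends on how quickly the singular values of the actual weight matrix decay, so the random matrix used here says nothing about real language models.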

Paper Structure (32 sections, 2 theorems, 18 equations, 6 figures, 2 tables)

Introduction
Geometric Algebra for Model Compression
Parameter Space and Model Compression
Generative Language Modelling with Parameter Vector
Generative Language Model Compression
Weight Matrices/Tensors Composed by Subspaces
Layerwise Parameterization
Matrix and Tensor Composition
Interpretation of Transformer Modules with Subspaces
Linear Layer (the only parametric part of the feed-forward layer)
Concatenation
Attention Layer
Matrix and Tensor Factorization from a Subspace Composition View
Common Operations on Subspaces
Cartesian Product and Tensor Product on $1$-order Subspaces
...and 17 more sections

Key Result

Theorem 1

Suppose an invertible linear transformation $T: {\mathbb{P}_1} \rightarrow {\mathbb{P}_2}$, and ${\bf v}=v_1 {\bf v}_1 + v_2 {\bf v}_2 + \cdots + v_m {\bf v}_m$, ${\bf v} \in \mathbb{P}_1$, ${\bf w}=w_1 {\bf w}_1 + w_2 {\bf w}_2 + \cdots + w_m {\bf w}_m$, ${\bf w} \in \mathbb{P}_2$. Then, we have for every ${\bf v}\in \mathbb{P}_1$ and ${\bf w}\in \mathbb{P}_2$.
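The theorem list on this page names Theorem 1 as the singular value decomposition. As a hedged illustration, the sketch below numerically checks the standard textbook SVD expansion $T{\bf v} = \sum_i s_i \langle {\bf v}, {\bf e}_i \rangle {\bf f}_i$ and its inverse counterpart for an invertible $T$. The names `E`, `F` and `s` are illustrative, and treating the orthonormal singular vectors as the bases of $\mathbb{P}_1$ and $\mathbb{P}_2$ is an assumption, not the paper's exact statement.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
T = rng.standard_normal((m, m))   # random square matrix; invertible with probability 1

# SVD: T = F @ diag(s) @ E.T, with orthonormal columns in E (domain) and F (codomain).
F, s, Et = np.linalg.svd(T)
E = Et.T

# Forward expansion: T v = sum_i s_i <v, e_i> f_i
v = rng.standard_normal(m)
expansion = sum(s[i] * np.dot(v, E[:, i]) * F[:, i] for i in range(m))
expansion_matches = np.allclose(T @ v, expansion)

# Inverse expansion (requires all s_i > 0): T^{-1} w = sum_i (<w, f_i> / s_i) e_i
w = rng.standard_normal(m)
inverse_expansion = sum(np.dot(w, F[:, i]) / s[i] * E[:, i] for i in range(m))
inverse_matches = np.allclose(np.linalg.solve(T, w), inverse_expansion)
```

Read geometrically, $T$ resolves ${\bf v}$ along one orthonormal basis, rescales each coordinate by a singular value, and re-expresses the result in a second orthonormal basis.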

Figures (6)

Figure 1: Addressed issues, solutions and paper structure overview.
Figure 2: Parametrization of generative neural networks in a vector space, and the process of model compression based on it, taking the forward pass of a typical transformer (Vaswani et al., 2017) as a case study.
Figure 3: Typical geometric transformations in transformer models.
Figure 4: Matrix (Cayley) multiplication and Kronecker product.
Figure 5: Geometric interpretation of typical tensor decomposition formats, taking a $3$-order tensor as an example. Their vector forms are given in \ref{eq:cp-vec} and \ref{eq:tucker-vec}. The golden cubes are the currently constructed subspaces; the light green vectors are the subspace expansions represented by the shared order (e.g. $\mathcal{A}^{(i)} \cap {\bf M}_i$ in Tucker format, $\mathcal{A}^{(i)} \cap {\bf G}_i$ and $\mathcal{A}^{(i)} \cap \mathcal{G}_i$ in Tensor-Train format); the dark green vectors are the current resulting vectors, which lie inside the currently constructed subspace. The red cube is the final resulting tensor, and the pink planes represent resized dimensions.
...and 1 more figures

Theorems & Definitions (11)

Definition 1: Parameter Vector and Parameter Space
Definition 2: Low-rank Neural Network Compression
Definition 3: Generative Language Modelling
Definition 4: Accuracy
Definition 5: Fidelity
Definition 6: $(\delta_1, \delta_2, \eta)$ - Language Model Compression
Definition 7: Matrix Composition in Euclidean Space
Remark 1
Definition 8: Tensor Composition in Euclidean Space
Theorem 1: Singular Value Decomposition (Axler, 2024)
...and 1 more
