Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models

Mingxue Xu, Sadia Sharmin, Danilo P. Mandic

TL;DR

We propose a unified taxonomy that bridges matrix/tensor compression approaches and model compression concepts in ML and NLP research. It adopts the subspace, an elementary concept in linear algebra and the core concept in geometric algebra, to reformulate matrix/tensor and ML/NLP concepts under one umbrella.

Abstract

Matrix- and tensor-guided parametrization of Natural Language Processing (NLP) models is fundamentally useful for improving the models' systematic efficiency. However, the internal links between these two algebraic structures and language model parametrization are poorly understood. Moreover, existing matrix and tensor research is math-heavy and far removed from machine learning (ML) and NLP research concepts. As a result, recent progress on matrices and tensors for model parametrization resembles a loose collection of separate components from matrix/tensor and NLP studies rather than a well-structured unified approach, which further hinders algorithm design. To this end, we propose a unified taxonomy that bridges matrix/tensor compression approaches and model compression concepts in ML and NLP research. Namely, we adopt an elementary concept in linear algebra, that of a subspace, which is also the core concept in geometric algebra, to reformulate the matrix/tensor and ML/NLP concepts (e.g. the attention mechanism) under one umbrella. Based on this subspace formalization, typical matrix and tensor decomposition algorithms can be interpreted as geometric transformations. Finally, we revisit recent literature on matrix- or tensor-guided language model compression, rephrase and compare its core ideas, and point out the current research gap and potential solutions.

Paper Structure

This paper contains 32 sections, 2 theorems, 18 equations, 6 figures, and 2 tables.

Key Result

Theorem 1

Suppose an invertible linear transformation $T: {\mathbb{P}_1} \rightarrow {\mathbb{P}_2}$, and ${\bf v}=v_1 {\bf v}_1 + v_2 {\bf v}_2 + \cdots + v_m {\bf v}_m$, ${\bf v} \in \mathbb{P}_1$, ${\bf w}=w_1 {\bf w}_1 + w_2 {\bf w}_2 + \cdots + w_m {\bf w}_m$, ${\bf w} \in \mathbb{P}_2$. Then, we have for every ${\bf v}\in \mathbb{P}_1$ and ${\bf w}\in \mathbb{P}_2$.

Figures (6)

  • Figure 1: Addressed issues, solutions and paper structure overview.
  • Figure 2: Parametrization of generative neural networks in a vector space, and the model compression process based on it, taking a typical transformer (vaswani2017attention) forward pass as a case study.
  • Figure 3: Typical geometric transformations in transformer models.
  • Figure 4: Matrix (Cayley) multiplication and Kronecker product.
  • Figure 5: Geometric interpretation of typical tensor decomposition formats, taking a third-order tensor as an example. Their vector forms are given in \ref{eq:cp-vec} and \ref{eq:tucker-vec}. The golden cubes are the currently constructed subspaces; the light green vectors are the subspace expansions represented by the shared order (e.g. $\mathcal{A}^{(i)} \cap {\bf M}_i$ in Tucker format, $\mathcal{A}^{(i)} \cap {\bf G}_i$ and $\mathcal{A}^{(i)} \cap \mathcal{G}_i$ in Tensor-Train format); the dark green vectors are the current resulting vectors, which lie inside the currently constructed subspace. The red cube is the final resulting tensor, and the pink planes represent resized dimensions.
  • ...and 1 more figures
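
The two products named in Figure 4 can be contrasted with a small sketch. This is an illustrative example only, not code from the paper: the helper names `matmul` and `kron` and the toy matrices are assumptions, and plain nested lists are used so that the snippet needs nothing beyond the Python standard library. It shows that a Kronecker product builds a large matrix from two small factors, and checks the standard mixed-product identity $({\bf A} \otimes {\bf B})({\bf x} \otimes {\bf y}) = ({\bf A}{\bf x}) \otimes ({\bf B}{\bf y})$, which is what lets the large matrix be applied without ever being formed.

```python
def matmul(a, b):
    """Ordinary (Cayley) matrix product of two nested-list matrices."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


def kron(a, b):
    """Kronecker product: every entry a[i][j] scales a full copy of b."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]


# Hypothetical toy factors (not taken from the paper).
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [[1], [2]]  # column vectors stored as 2x1 matrices
y = [[3], [4]]

big = kron(A, B)  # 4x4 matrix described by only 4 + 4 stored entries

# Mixed-product identity: apply the small factors instead of the big matrix.
lhs = matmul(big, kron(x, y))
rhs = kron(matmul(A, x), matmul(B, y))

full_params = len(big) * len(big[0])                          # 16
factor_params = len(A) * len(A[0]) + len(B) * len(B[0])       # 8
print(lhs == rhs, full_params, factor_params)
```

In this toy case the factored form stores 8 numbers instead of 16; the gap widens quickly as the factor sizes grow, which is the basic reason Kronecker-structured weights are attractive for compression.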

Theorems & Definitions (11)

  • Definition 1: Parameter Vector and Parameter Space
  • Definition 2: Low-rank Neural Network Compression
  • Definition 3: Generative Language Modelling
  • Definition 4: Accuracy
  • Definition 5: Fidelity
  • Definition 6: $(\delta_1, \delta_2, \eta)$ - Language Model Compression
  • Definition 7: Matrix Composition in Euclidean Space
  • Remark 1
  • Definition 8: Tensor Composition in Euclidean Space
  • Theorem 1: Singular Value Decomposition (axler2024linear)
  • ...and 1 more
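
Only the titles of these definitions and theorems are listed here, so the following is the textbook form of the truncated singular value decomposition, offered as a reminder of how low-rank compression is usually written rather than as the paper's own formulation. The symbols ${\bf W}$, $m$, $n$ and $r$ are assumptions introduced for illustration.

```latex
% Textbook truncated SVD of a weight matrix W (notation assumed, not the paper's)
\begin{align}
  {\bf W} &= {\bf U} {\bf \Sigma} {\bf V}^{\top},
    \qquad {\bf W} \in \mathbb{R}^{m \times n}, \\
  {\bf W} &\approx {\bf W}_r = {\bf U}_r {\bf \Sigma}_r {\bf V}_r^{\top},
    \qquad {\bf U}_r \in \mathbb{R}^{m \times r},\;
           {\bf \Sigma}_r \in \mathbb{R}^{r \times r},\;
           {\bf V}_r \in \mathbb{R}^{n \times r}, \\
  \underbrace{mn}_{\text{dense parameters}}
    &\;\longrightarrow\;
  \underbrace{r(m+n)}_{\text{after absorbing } {\bf \Sigma}_r \text{ into one factor}} .
\end{align}
```

Storage shrinks whenever $r < \frac{mn}{m+n}$, and by the Eckart-Young theorem ${\bf W}_r$ is the best rank-$r$ approximation of ${\bf W}$ in the Frobenius norm.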