LatentLLM: Attention-Aware Joint Tensor Compression

Toshiaki Koike-Akino; Xiangyu Chen; Jing Liu; Ye Wang; Pu Wang; Matthew Brand

TL;DR

The paper addresses the heavy compute and memory requirements of large language and multi-modal models by proposing LatentLLM, a training-free framework that converts pretrained models into a reduced-dimension latent structure. It does so in two stages: a local, activation-aware SVD with pre-conditioning, followed by a global tensor decomposition that compresses multiple linear modules jointly, including the Q/K/V/O attention projections and the MLP components. Key contributions include an optimal pre-conditioner $P = C^{1/2}$, a junction-matrix trick that minimizes the parameter count, and a high-order SVD (HOSVD/Tucker) approach to the joint QK, VO, and UD decompositions. Empirically, LatentLLM and its multi-modal variant LatentLLaVa improve significantly over local baselines on perplexity benchmarks and multi-modal reasoning tasks, with substantial reductions in FLOPs and parameters, yielding LLMs/LMMs that are more efficient while remaining accurate.
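To make the local, activation-aware step concrete, here is a minimal NumPy sketch of a low-rank approximation pre-conditioned by $P = C^{1/2}$. It assumes that $C$ is the empirical second-moment matrix of calibration activations and that the objective is the output error $\|WX - \hat{W}X\|_F$. The function names, shapes, and the `eps` clipping are illustrative assumptions, not the paper's code, and the joint QK/VO/UD decomposition and the junction-matrix trick are not covered.

```python
import numpy as np

def psd_sqrt_and_inv(C, eps=1e-8):
    """Return C^{1/2} and C^{-1/2} for a symmetric PSD matrix C.

    eps is an assumed numerical floor on the eigenvalues.
    """
    w, V = np.linalg.eigh(C)
    w = np.clip(w, eps, None)
    root = (V * np.sqrt(w)) @ V.T
    inv_root = (V / np.sqrt(w)) @ V.T
    return root, inv_root

def activation_aware_lowrank(W, X, rank):
    """Factor W (d_out x d_in) as B @ A with the given rank.

    X (d_in x n) holds calibration activations. The truncated SVD is
    taken of W @ P with P = C^{1/2}, and P^{-1} is folded back into A,
    so the factors minimize ||(W - B @ A) @ P||_F among rank-`rank`
    matrices (Eckart-Young applied to the pre-conditioned weight).
    """
    C = X @ X.T / X.shape[1]              # empirical second moment
    P, P_inv = psd_sqrt_and_inv(C)
    U, s, Vt = np.linalg.svd(W @ P, full_matrices=False)
    B = U[:, :rank] * s[:rank]            # d_out x rank
    A = Vt[:rank] @ P_inv                 # rank x d_in
    return B, A

def plain_lowrank(W, rank):
    """Baseline: truncated SVD of W that ignores the activations."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

# Toy comparison on activations with unequal per-feature scales.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
X = rng.normal(size=(32, 256)) * np.linspace(0.1, 3.0, 32)[:, None]

B, A = activation_aware_lowrank(W, X, rank=8)
Bp, Ap = plain_lowrank(W, rank=8)
err_aware = np.linalg.norm(W @ X - B @ (A @ X))
err_plain = np.linalg.norm(W @ X - Bp @ (Ap @ X))
```

Because $\|(W - \hat{W})X\|_F^2 = n\,\|(W - \hat{W})C^{1/2}\|_F^2$ when $C = XX^\top/n$, the pre-conditioned factors should give an output error no larger than the plain truncated SVD on the same calibration data, so `err_aware <= err_plain` is expected here up to numerical tolerance.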

Abstract

Modern foundation models such as large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources. We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure. Our method extends a local activation-aware tensor decomposition to a global attention-aware joint tensor decomposition. Our framework can significantly improve model accuracy over existing model compression methods when reducing the latent dimension to realize computationally/memory-efficient LLMs/LMMs. We show the benefit on several benchmarks, including multi-modal reasoning tasks.
