TRAWL: Tensor Reduced and Approximated Weights for Large Language Models
Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis
TL;DR
TRAWL tackles the inefficiency of large language models by denoising weights with tensor decomposition across multiple matrices rather than single-matrix factorization. It stacks Q/K/V/O or fully connected (FC) weights into a 3-mode tensor and applies CP or Tucker decomposition, treating the rank $R$ as a hyperparameter and requiring no additional training. Across two models and three benchmark datasets, layer-by-layer CP decomposition of the final FC layers yields the strongest gains, with accuracy improvements of up to 16%, whereas decomposing all layers globally can hurt performance and decomposing selected segments of layers offers more targeted benefits. The work demonstrates a practical post-training compression approach that reduces noise and improves generalization, and it provides public code to enable further research and real-world application.
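The core operation described above can be sketched in a few lines. The snippet below is a minimal illustration, assuming TensorLy for the CP step; the function name, layer selection, shapes, and rank value are placeholders for illustration, not the paper's actual configuration.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def trawl_cp_denoise(weight_matrices, rank):
    """Stack same-shaped weight matrices into a 3-mode tensor,
    CP-decompose at rank R, and return the low-rank reconstructions.

    weight_matrices: list of 2-D arrays with identical shape (d_out, d_in),
        e.g. the final FC weights taken from several transformer blocks.
    rank: the CP rank R, treated as a hyperparameter.
    """
    # Mode 0 indexes the matrix (layer); modes 1 and 2 are the weight dims.
    tensor = tl.tensor(np.stack(weight_matrices, axis=0))

    # CP (PARAFAC) factorization into R rank-one components.
    cp_tensor = parafac(tensor, rank=rank)

    # Reconstruct the denoised tensor and split it back into matrices.
    approx = tl.cp_to_tensor(cp_tensor)
    return [np.asarray(approx[i]) for i in range(approx.shape[0])]

# Illustrative usage with random stand-in weights (shapes are made up);
# in practice the reconstructions would replace the original weights.
layers = [np.random.randn(256, 1024) for _ in range(4)]
denoised = trawl_cp_denoise(layers, rank=64)
```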
Abstract
Recent research has shown that pruning large-scale language models for inference is an effective way to improve model efficiency, substantially reducing the number of weights with minimal impact on performance. Interestingly, such compression can sometimes even enhance accuracy by removing noise that accumulates during training, particularly when performed via matrix decompositions. However, recent work has primarily focused on decomposing individual matrices or lowering numerical precision, which may fail to fully capture structural patterns that span multiple matrices. To address these limitations, we introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns. Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
