Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks
Shihao Zhang, Rayan Saab
TL;DR
This work addresses the memory and computation challenges of deep neural networks by developing a theoretical framework for data-driven, post-training low-rank compression. It formulates the low-rank recovery problem in terms of activations and learns a compressed representation via rank-constrained or convex reconstructions, proving three recovery theorems under progressively weaker assumptions about the activation structure and noise. The results show that approximately low-rank activations allow accurate recovery of the compressed model with quantifiable error bounds, and extend to nonlinear ReLU activations through convex relaxations with additional logarithmic factors. Overall, the paper provides the first formal theoretical guarantees for data-driven, post-training low-rank compression methods and outlines avenues for extending these guarantees to tensors and gradient-based algorithms, with practical implications for reducing inference costs while preserving performance.
Abstract
Deep neural networks have achieved state-of-the-art performance across numerous applications, but their high memory and computational demands present significant challenges, particularly in resource-constrained environments. Model compression techniques, such as low-rank approximation, offer a promising solution by reducing the size and complexity of these networks while sacrificing only minimal accuracy. In this paper, we develop an analytical framework for data-driven post-training low-rank compression. We prove three recovery theorems under progressively weaker assumptions about the approximate low-rank structure of activations, modeling deviations via noise. Our results represent a step toward explaining why data-driven low-rank compression methods outperform data-agnostic approaches, and toward theoretically grounded compression algorithms that reduce inference costs while maintaining performance.
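To make the distinction between data-driven and data-agnostic compression concrete, the following is a minimal NumPy sketch for a single linear layer. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the synthetic activations `X` (a rank-`k` signal plus small Gaussian noise), the weight matrix `W`, and all dimensions are hypothetical. The data-agnostic baseline truncates the SVD of `W` alone, while the data-driven variant solves the rank-constrained reconstruction min ||XW - XZ||_F over rank(Z) <= r, whose closed form follows from the Eckart-Young theorem applied to `XW`.

```python
import numpy as np

def data_agnostic_lowrank(W, r):
    """Best rank-r approximation of W itself (ignores the activations)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def data_driven_lowrank(X, W, r):
    """Minimizer of ||XW - XZ||_F over rank(Z) <= r.

    Projecting W onto the top-r right singular vectors of XW gives
    XZ equal to the best rank-r approximation of XW (Eckart-Young).
    """
    _, _, Vt = np.linalg.svd(X @ W, full_matrices=False)
    V_r = Vt[:r].T
    return W @ V_r @ V_r.T

def relative_output_error(X, W, Z):
    """Relative Frobenius error of the layer output on the data X."""
    return np.linalg.norm(X @ W - X @ Z) / np.linalg.norm(X @ W)

# Hypothetical synthetic setup: approximately low-rank activations.
rng = np.random.default_rng(0)
m, d_in, d_out, k, r = 512, 64, 48, 8, 8
signal = rng.standard_normal((m, k)) @ rng.standard_normal((k, d_in))
noise = 0.01 * rng.standard_normal((m, d_in))
X = signal + noise                      # rank-k structure plus small noise
W = rng.standard_normal((d_in, d_out))  # weights of one linear layer

Z_agnostic = data_agnostic_lowrank(W, r)
Z_data_driven = data_driven_lowrank(X, W, r)

err_agnostic = relative_output_error(X, W, Z_agnostic)
err_data_driven = relative_output_error(X, W, Z_data_driven)
print(f"data-agnostic relative error: {err_agnostic:.4f}")
print(f"data-driven relative error:   {err_data_driven:.4f}")
```

Because the truncated SVD of `W` is itself a feasible rank-`r` candidate, the data-driven output error can never exceed the data-agnostic one on the same data. How large the gap is depends on how close the activations are to low rank, which is the regime the recovery theorems address.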
