Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks

Shihao Zhang, Rayan Saab

TL;DR

This work addresses the memory and computation challenges of deep neural networks by developing a theoretical framework for data-driven, post-training low-rank compression. It formulates the low-rank recovery problem in terms of activations and learns a compressed representation via rank-constrained or convex reconstructions, proving three recovery theorems under progressively weaker assumptions about the activation structure and noise. The results show that approximately low-rank activations allow accurate recovery of the compressed model with quantifiable error bounds, and extend to nonlinear ReLU activations through convex relaxations with additional logarithmic factors. Overall, the paper provides the first formal theoretical guarantees for data-driven, post-training low-rank compression methods and outlines avenues for extending these guarantees to tensors and gradient-based algorithms, with practical implications for reducing inference costs while preserving performance.

Abstract

Deep neural networks have achieved state-of-the-art performance across numerous applications, but their high memory and computational demands present significant challenges, particularly in resource-constrained environments. Model compression techniques, such as low-rank approximation, offer a promising solution by reducing the size and complexity of these networks while only minimally sacrificing accuracy. In this paper, we develop an analytical framework for data-driven post-training low-rank compression. We prove three recovery theorems under progressively weaker assumptions about the approximate low-rank structure of activations, modeling deviations via noise. Our results represent a step toward explaining why data-driven low-rank compression methods outperform data-agnostic approaches and towards theoretically grounded compression algorithms that reduce inference costs while maintaining performance.

Paper Structure

This paper contains 14 sections, 20 theorems, 83 equations.

Key Result

Theorem 1.1

(Abridged version of thm:recovery one) Let $X, \widecheck{X} \in \mathbb{R}^{d_1 \times d}$ with $d_1\geq d$ be full rank, and let $W\in \mathbb{R}^{d \times d_2}$. Assume there exists a rank-$r$ matrix $M\in \mathbb{R}^{d \times d_2}$ such that $\|XW - (\widecheck{X}M+G)\|_{op}^2\leq \epsilon d_1$, where $G$ [...] with high probability.
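
To make the setting concrete, the following is a minimal NumPy sketch, not the paper's algorithm, that contrasts a data-agnostic compression (truncated SVD of $W$ alone) with a data-driven one (a rank-$r$ matrix $M$ chosen to minimize $\|XW - XM\|_F$). It assumes the simplified noiseless case $\widecheck{X} = X$ and $G = 0$, and it uses the classical reduced-rank regression closed form $M = X^{+}[XW]_r$, where $[\cdot]_r$ denotes the best rank-$r$ approximation. The function names, the synthetic data, and the dimensions are all hypothetical choices made for illustration.

```python
import numpy as np


def truncate_rank(A, r):
    """Best rank-r approximation of A in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def data_agnostic_compress(W, r):
    """Compress W without looking at activations: truncated SVD of W."""
    return truncate_rank(W, r)


def data_driven_compress(X, W, r):
    """Rank-r M minimizing ||XW - XM||_F, assuming X has full column rank.

    The best rank-r approximation of XW lies in the column space of X,
    so it can be written as X @ M with M = pinv(X) @ [XW]_r.
    """
    Y_r = truncate_rank(X @ W, r)
    return np.linalg.pinv(X) @ Y_r


def relative_output_error(X, W, M):
    """Relative Frobenius error of the compressed layer output."""
    return np.linalg.norm(X @ W - X @ M) / np.linalg.norm(X @ W)


def make_example(d1=200, d=40, d2=30, k=5, seed=0):
    """Synthetic data (an assumption): approximately rank-k activations X."""
    rng = np.random.default_rng(seed)
    low_rank = rng.standard_normal((d1, k)) @ rng.standard_normal((k, d))
    X = low_rank + 0.05 * rng.standard_normal((d1, d))  # small full-rank noise
    W = rng.standard_normal((d, d2))
    return X, W


if __name__ == "__main__":
    X, W = make_example()
    r = 5
    err_agnostic = relative_output_error(X, W, data_agnostic_compress(W, r))
    err_driven = relative_output_error(X, W, data_driven_compress(X, W, r))
    print(f"data-agnostic relative error: {err_agnostic:.4f}")
    print(f"data-driven relative error:   {err_driven:.4f}")
```

In this simplified setting the data-driven error can never exceed the data-agnostic one, because the truncated SVD of $W$ is itself a feasible rank-$r$ candidate for the minimization. The gap is largest when the activations $X$ are approximately low rank, which is the regime the theorem addresses.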

Theorems & Definitions (39)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 3.1
  • proof
  • Definition 4.1
  • Remark 4.2
  • Remark 4.3
  • Theorem 4.4
  • proof
  • ...and 29 more