A General Error-Theoretical Analysis Framework for Constructing Compression Strategies

Boyang Zhang, Daning Cheng, Yunquan Zhang, Meiqi Tu, Fangming Liu, Jiake Tian

TL;DR

The paper tackles the challenge of efficient model compression by enabling layer-wise, differentiated quantization through a theory-driven framework. Compression Error Theory (CET) recasts quantization error as a quadratic form and uses total differentiation plus algebraic geometry to identify a long-axis subspace in which parameter perturbations minimally affect performance, enabling near-lossless, retraining-free compression. By leveraging Hessian-based analysis (via Lanczos) and orthogonal decomposition, CET determines per-layer bit-width allocations that maximize parameter reduction while controlling loss. Experiments on ResNet variants and NLP benchmarks demonstrate substantial compression (up to ~11x–13x) with minimal or even improved accuracy, highlighting CET’s practical impact and generality across compression methods.

Abstract

The exponential growth in parameter size and computational complexity of deep models poses significant challenges for efficient deployment. A core problem for existing compression methods is that different layers of a model differ significantly in their tolerance to compression. For instance, the first layer of a model can typically sustain a higher compression level than the last layer without compromising performance. The key challenge is therefore how to allocate compression levels across layers so as to minimize performance loss while maximizing parameter reduction. To address this challenge, we propose the Compression Error Theory (CET) framework, designed to determine the optimal compression level for each layer. Taking quantization as an example, CET leverages differential expansion and algebraic geometry to reconstruct the quadratic form of quantization error as ellipsoids and hyperbolic paraboloids, and uses their geometric structures to define an error subspace. To identify the error subspace with minimal performance loss, CET performs an orthogonal decomposition of the geometric space, which transforms the optimization of the error subspace into a complementary problem. The final theoretical analysis shows that constructing the quantization subspace along the major axis results in minimal performance degradation. Experiments verify the theory: CET largely retains performance under compression. Specifically, on the ResNet-34 model, CET achieves nearly 11$\times$ parameter compression while matching or even surpassing the performance of the original model.
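
The quadratic-form view in the abstract can be illustrated with a standard second-order expansion. The notation below is assumed for illustration ($\mathcal{L}$ for the loss, $w^*$ for the converged weights, $\delta$ for the quantization error vector, $H$ for the Hessian at $w^*$ with eigenpairs $(\lambda_i, q_i)$); the paper's own symbols and equation numbering may differ.

```latex
\begin{align}
\mathcal{L}(w^* + \delta) - \mathcal{L}(w^*)
  &\approx \nabla\mathcal{L}(w^*)^\top \delta
     + \tfrac{1}{2}\,\delta^\top H\,\delta
   \;\approx\; \tfrac{1}{2}\,\delta^\top H\,\delta
   \qquad \text{since } \nabla\mathcal{L}(w^*) \approx 0 \text{ at convergence} \\
H &= Q \Lambda Q^\top
   \quad\Longrightarrow\quad
   \delta^\top H\,\delta = \sum_i \lambda_i \,(q_i^\top \delta)^2 \\
\tfrac{1}{2}\,\delta^\top H\,\delta = c
  &\;\Longleftrightarrow\;
   \sum_i \frac{(q_i^\top \delta)^2}{2c/\lambda_i} = 1
   \qquad (\lambda_i > 0,\; c > 0)
\end{align}
```

When every $\lambda_i > 0$, the level set is an ellipsoid whose semi-axis along $q_i$ has length $\sqrt{2c/\lambda_i}$, so the smallest eigenvalue gives the longest (major) axis, and a perturbation of fixed norm along it raises the loss least. If some $\lambda_i < 0$, the surface is no longer closed, and moving along that eigenvector decreases the quadratic term.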

Paper Structure

This paper contains 15 sections, 13 equations, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The CET framework completes the transition from algebraic to geometric analysis, providing a theoretical foundation for the quantization error vector. Depending on whether the Hessian matrix at the convergence point is positive definite, CET reconstructs the quadratic form of quantization error as an ellipsoid or a hyperbolic paraboloid. For ellipsoids, the direction along the long axis corresponds to the direction with the slowest increase in loss. For hyperbolic paraboloids, analogous to ellipsoids, the eigenvector corresponding to the negative eigenvalue defines the long-axis direction, representing the direction of loss reduction. Theoretical analysis suggests that the quantization error vector should be optimized along these two directions.
  • Figure 2: The gap between theory and practice after adding different perturbations to each layer. This ensures that the theoretical approximation can effectively represent the actual value in a sufficiently small neighborhood.
  • Figure 3: As the number of short-axis eigenvalues increases, the number of equations in the underdetermined system of Eq. (13) also grows, leading to a decrease in model loss. However, this gain is accompanied by a corresponding rise in computational overhead (shown by the increasing area of the circle).
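
The long-axis claim in the Figure 1 caption can be checked on a toy case. The following is a minimal sketch, not the paper's implementation: it assumes a hand-picked 2×2 positive-definite Hessian and uses the closed-form eigendecomposition of a symmetric 2×2 matrix, whereas the paper is described as using Lanczos for real Hessians. All function and variable names are hypothetical.

```python
import math


def sym2x2_eigen(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], smallest eigenvalue first.

    Returns a list of (eigenvalue, unit_eigenvector) tuples.
    """
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0])
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (b, lam - a) solves (H - lam * I) v = 0 whenever b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


def quadratic_increase(a, b, d, delta):
    """Second-order loss increase 0.5 * delta^T H delta for H = [[a, b], [b, d]]."""
    dx, dy = delta
    return 0.5 * (a * dx * dx + 2.0 * b * dx * dy + d * dy * dy)


# Toy Hessian (an assumption for illustration): eigenvalues are 2 and 4.
A, B, D = 3.0, 1.0, 3.0
(lam_long, v_long), (lam_short, v_short) = sym2x2_eigen(A, B, D)

# Perturbations of equal norm along the long axis (smallest eigenvalue)
# and the short axis (largest eigenvalue).
eps = 0.1
inc_long = quadratic_increase(A, B, D, (eps * v_long[0], eps * v_long[1]))
inc_short = quadratic_increase(A, B, D, (eps * v_short[0], eps * v_short[1]))

print(f"long axis:  lambda={lam_long:.1f}, loss increase={inc_long:.4f}")
print(f"short axis: lambda={lam_short:.1f}, loss increase={inc_short:.4f}")
```

For a perturbation of norm `eps` along a unit eigenvector, the quadratic term reduces to `0.5 * eps**2 * lambda`, so the same-size error costs half as much along the long axis as along the short axis in this toy case.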