Lossless Model Compression via Joint Low-Rank Factorization Optimization

Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangming Liu, Jiake Tian

TL;DR

This work shows that traditional post-hoc low-rank factorization is optimized separately from model training, which leads to a loss in performance. It introduces a lossless joint low-rank factorization framework that ties factorization perturbations $\delta$ to model loss $L$ through a total-differential analysis and enforces inequality constraints, yielding a numerical rank-deficiency optimization problem. Two greedy algorithms, lossless optimization and compact optimization, compress models without fine-tuning while preserving or even improving accuracy, as demonstrated across diverse vision and language benchmarks. The approach is architecture- and decomposition-agnostic, scales to large models, and has potential for practical deployment because it enables substantial parameter reductions with minimal or no loss in performance.

Abstract

Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between approximated and original weight matrices. Although models perform close to the original when $\delta$ is optimized, a performance discrepancy remains because low-rank factorization and model performance are optimized separately, resulting in unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between low-rank factorization and model optimization objectives, establishing a precise perturbation range for the effect of matrix factorization errors on model performance. We then reformulate this challenge as a numerical rank deficiency problem with inequality constraints and develop a joint objective that simultaneously addresses factorization error and model performance. Based on this analysis, we propose two optimization algorithms: a lossless optimization algorithm that maximizes model accuracy while ensuring compression, and a compact optimization algorithm that minimizes model size while preserving performance. These algorithms do not require fine-tuning and can directly compress numerous deep models to achieve lossless results. Our methods demonstrate robust efficacy across various vision and language tasks. For example, a ResNext50 model reduced by 70% outperforms the original. Our code will be made public.
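
As a rough illustration of the factorization error $\delta$ mentioned in the abstract, the sketch below factorizes a random matrix with a truncated SVD. This is the standard post-hoc baseline, not the joint method proposed in the paper. The function name `low_rank_factorize`, the matrix shape, and the rank are assumptions chosen only for the example.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Return factors (A, B) such that A @ B is the truncated-SVD
    rank-`rank` approximation of W (hypothetical helper for illustration)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (N, rank); singular values folded into A
    B = Vt[:rank, :]             # shape (rank, M)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))    # stand-in for one layer's weight matrix
A, B = low_rank_factorize(W, rank=8)

delta = W - A @ B                    # factorization error (perturbation of W)
delta_norm = np.linalg.norm(delta)   # Frobenius norm of the error
print(A.shape, B.shape, delta_norm)
```

By the Eckart-Young theorem, this choice minimizes the Frobenius norm of $\delta$ for a given rank, and that norm equals the root of the sum of the squared discarded singular values. As the abstract argues, minimizing $\delta$ alone does not guarantee the lowest model loss.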

Paper Structure

This paper contains 12 sections, 3 theorems, 13 equations, 3 figures, 8 tables, 2 algorithms.

Key Result

Lemma 1

The increment of the loss function at a given point can be estimated by the sum of the products of each partial derivative and a small change in the corresponding weight variable.
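
In symbols, this statement corresponds to the standard first-order total differential. The notation below (weights $w_i$ and per-weight perturbations $\delta_i$) is assumed here for illustration and may differ from the paper's own:

$$
L(w + \delta) - L(w) \approx \sum_{i} \frac{\partial L}{\partial w_i}\,\delta_i
$$

The approximation holds for small $\delta_i$. Its sign depends on how $\delta$ aligns with the gradient, so a perturbation introduced by factorization can, in principle, lower the loss as well as raise it.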

Figures (3)

  • Figure 1: The left subfigure shows the factorization process, where $\delta$ is the noise error introduced by the factorization. The right subfigure compares the loss of our algorithm with that of existing factorization algorithms, where $L$ is the model loss. Our algorithm factorizes models losslessly.
  • Figure 2: Weight error and loss values of different models, where Rank1$>$Rank2$>$Rank3$>$Rank4. The model loss $L$ does not reach its minimum when the weight error $\delta$ is lowest.
  • Figure 3: Loss of our algorithm on the deep DenseNet169 and shallow VGG-19 models. Our algorithms achieve lower loss than the original model whenever $0<\mathrm{Rank}<\frac{NM}{N+M}$ is satisfied.
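
The rank condition in the Figure 3 caption is consistent with the usual parameter-count argument, assuming $N$ and $M$ are the dimensions of a weight matrix (the caption does not define them). Factorizing an $N \times M$ matrix into $N \times r$ and $r \times M$ factors stores $r(N+M)$ values, which is fewer than $NM$ exactly when $r < \frac{NM}{N+M}$. A minimal sketch, with hypothetical helper names:

```python
def factorized_params(n, m, rank):
    """Parameters stored by an (n x rank) and a (rank x m) factor pair."""
    return rank * (n + m)

def max_compressing_rank(n, m):
    """Largest integer rank r with r * (n + m) < n * m (0 if none exists)."""
    return (n * m - 1) // (n + m)

n, m = 512, 512                      # example layer shape (assumed)
r_max = max_compressing_rank(n, m)
print(r_max, factorized_params(n, m, r_max), n * m)
```

Any rank above `r_max` would store at least as many parameters as the original matrix, so it gives no compression.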

Theorems & Definitions (3)

  • Lemma 1
  • Lemma 2
  • Lemma 3