Lossless Model Compression via Joint Low-Rank Factorization Optimization

Boyang Zhang; Daning Cheng; Yunquan Zhang; Fangming Liu; Jiake Tian

TL;DR

This work shows that conventional post-hoc low-rank factorization is optimized separately from model training, and that this decoupling leads to a loss in performance. It introduces a lossless joint low-rank factorization framework that ties the factorization perturbation $\delta$ to the model loss $L$ through a total-differential analysis and imposes inequality constraints, which recasts compression as a numerical rank-deficiency optimization problem. Two greedy algorithms, lossless optimization and compact optimization, compress models without fine-tuning while preserving or even improving accuracy, as demonstrated across diverse vision and language benchmarks. The approach is agnostic to architecture and decomposition method, scales to large models, and has potential for practical deployment because it enables substantial parameter reductions with minimal or no loss in performance.

Abstract

Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between the approximated and original weight matrices. Although models factorized with an optimized $\delta$ perform close to the originals, a performance gap remains because low-rank factorization and model performance are optimized separately, resulting in unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between the low-rank factorization and model optimization objectives, establishing a precise perturbation range for the effect of matrix factorization errors on model performance. We then reformulate this challenge as a numerical rank deficiency problem with inequality constraints and develop a joint objective that simultaneously addresses factorization error and model performance. Based on this analysis, we propose two optimization algorithms: a lossless optimization algorithm that maximizes model accuracy while ensuring compression, and a compact optimization algorithm that minimizes model size while preserving performance. These algorithms do not require fine-tuning and can directly compress numerous deep models to achieve lossless results. Our methods demonstrate robust efficacy across various vision and language tasks. For example, a ResNeXt50 model compressed by 70% outperforms the original. Our code will be made public.
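To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It factorizes a weight matrix with a truncated SVD, forms the factorization error $\delta$, and compares the first-order (total-differential) estimate of the loss change with the actual change. The toy quadratic loss, the random data, the matrix sizes, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def truncated_svd(W, rank):
    """Return factors (A, B) such that A @ B is the best rank-`rank`
    approximation of W in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # scale each kept left singular vector
    B = Vt[:rank, :]
    return A, B

# Hypothetical toy loss standing in for the model loss L.
def loss(W, X, Y):
    return 0.5 * np.sum((X @ W - Y) ** 2)

def loss_grad(W, X, Y):
    return X.T @ (X @ W - Y)

rng = np.random.default_rng(0)
m, n, rank = 64, 48, 8
W = rng.standard_normal((m, n))    # stand-in for a layer's weights
X = rng.standard_normal((32, m))   # stand-in for layer inputs
Y = rng.standard_normal((32, n))   # stand-in for targets

A, B = truncated_svd(W, rank)
delta = A @ B - W                  # factorization perturbation

# First-order estimate of the loss change: <dL/dW, delta>.
first_order = float(np.sum(loss_grad(W, X, Y) * delta))
# Actual loss change caused by replacing W with its low-rank version.
actual = float(loss(W + delta, X, Y) - loss(W, X, Y))

params_before = m * n
params_after = rank * (m + n)

print("parameters:", params_before, "->", params_after)
print("||delta||_F:", np.linalg.norm(delta))
print("first-order loss change:", first_order)
print("actual loss change:", actual)
```

In this sketch, minimizing $\|\delta\|$ alone says nothing about the sign of the loss change; a joint criterion in the spirit of the abstract would additionally look for perturbations whose estimated loss change is non-positive.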

