SWSC: Shared Weight for Similar Channel in LLM
Binrui Zeng, Yongtao Tang, Xiaodong Liu, Xiaopeng Li
TL;DR
The paper tackles the challenge of deploying large language models on resource-constrained platforms by introducing SWSC, a weight compression technique based on Shared Weight for Similar Channel. SWSC performs channel-wise K-Means clustering to replace similar channel vectors with a shared representative, reducing the storage for an $m \times m$ weight matrix from $m^2$ to $(k+1)m$ values, where $k$ is the number of clusters. To mitigate the accuracy loss caused by this approximation, it applies singular value decomposition to the weight error $W_{Err}=W-W'$, where $W'$ is the matrix rebuilt from the shared representatives, and keeps the top $r$ components as an error-correction term that is added during inference. Experiments on Llama-2-7B with WikiText-2 show SWSC often outperforms RTN quantization at low average bits, demonstrating effective, orthogonal compression that enables more efficient deployment of LLMs on constrained hardware.
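The storage count quoted above can be unpacked as follows. This is a sketch under stated assumptions: the weight matrix is square with $m$ channels of length $m$, $k$ is the number of K-Means clusters, and the index that maps each channel to its cluster is counted as $m$ values. The cost of the rank-$r$ correction (two $m \times r$ factors with the singular values folded in) is an illustrative accounting, not a figure from the paper, and the values $m = 4096$ and $k = 512$ are hypothetical, chosen only to make the arithmetic concrete.

$$
\underbrace{km}_{\text{representatives}} + \underbrace{m}_{\text{cluster index}} = (k+1)m, \qquad \frac{(k+1)m}{m^2} = \frac{k+1}{m}
$$

$$
W \approx W' + U_r \Sigma_r V_r^{\top}, \qquad \text{extra storage} \approx 2mr
$$

$$
m = 4096,\ k = 512:\quad (k+1)m = 2{,}101{,}248 \quad \text{vs.} \quad m^2 = 16{,}777{,}216 \quad (\approx 12.5\%)
$$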
Abstract
Large language models (LLMs) have spurred development in multiple industries. However, their growing parameter counts impose substantial storage and computing burdens, making it essential to explore model compression techniques that reduce parameters and ease deployment. We propose SWSC, an LLM compression method based on the concept of Shared Weight for Similar Channel. It uses the K-Means clustering algorithm to cluster model weights channel by channel, so that the vectors within each cluster are highly similar. A representative vector is then selected from each cluster to approximately replace all vectors in that cluster, significantly reducing the number of model weight parameters. However, this approximate restoration inevitably degrades model performance. To address this, we perform singular value decomposition on the error between the weights before and after compression, and retain the larger singular values and their corresponding singular vectors to compensate for the accuracy loss. The experimental results show that our method can effectively preserve the performance of the compressed LLM even under low-precision conditions.
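The two steps described in the abstract, channel clustering followed by SVD-based error compensation, can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation. It makes several assumptions that the text does not confirm: each row of `W` is treated as one channel, the cluster mean serves as the representative vector, and a plain Lloyd's-iteration K-Means with random initialization stands in for whatever K-Means variant the paper uses. The function names `kmeans_channels`, `swsc_compress`, and `swsc_restore` are hypothetical.

```python
import numpy as np


def kmeans_channels(W, k, iters=20, seed=0):
    """Plain Lloyd's K-Means over the rows of W (assumption: one row = one channel)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct rows of W.
    centroids = W[rng.choice(W.shape[0], size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every channel to every centroid.
        dists = ((W[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = W[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    dists = ((W[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)


def swsc_compress(W, k, r):
    """Cluster channels, then keep a rank-r SVD approximation of the error."""
    centroids, labels = kmeans_channels(W, k)
    W_shared = centroids[labels]      # W': every channel replaced by its representative
    W_err = W - W_shared              # W_Err = W - W'
    U, S, Vt = np.linalg.svd(W_err, full_matrices=False)
    A = U[:, :r] * S[:r]              # fold the top-r singular values into the left factor
    B = Vt[:r]
    return centroids, labels, A, B


def swsc_restore(centroids, labels, A, B):
    """Rebuild the approximate weights: shared channels plus the low-rank correction."""
    return centroids[labels] + A @ B
```

A short usage example on random data: by construction, the correction term cannot increase the Frobenius-norm error relative to clustering alone, since a truncated SVD is the best rank-$r$ approximation of $W_{Err}$.

```python
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16))
centroids, labels, A, B = swsc_compress(W, k=4, r=3)
err_cluster_only = np.linalg.norm(W - centroids[labels])
err_corrected = np.linalg.norm(W - swsc_restore(centroids, labels, A, B))
print(err_cluster_only, err_corrected)
```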
