VQ4ALL: Efficient Neural Network Representation via a Universal Codebook

Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang

TL;DR

This paper introduces VQ4ALL, a vector quantization (VQ) method that constructs various neural networks from shared codewords to achieve efficient representations. It extracts a universal codebook with kernel density estimation and then progressively constructs different low-bit networks by updating differentiable assignments.

Abstract

The rapid growth of large neural network models creates new demands for lightweight network representation methods. Traditional model compression methods have achieved great success, especially vector quantization (VQ), which attains high compression ratios by sharing codewords. However, because each layer of the network needs its own codebook, this traditional top-down approach overlooks the underlying commonalities, resulting in limited compression rates and frequent memory access. In this paper, we propose a bottom-up method that shares a universal codebook among multiple neural networks, which not only reduces the number of codebooks but also reduces memory access and chip area by storing the static codebook in built-in ROM. Specifically, we introduce VQ4ALL, a VQ-based method that uses codewords to construct various neural networks and achieve efficient representations. The core idea of our method is to extract a universal codebook with kernel density estimation and then progressively construct different low-bit networks by updating differentiable assignments. Experimental results demonstrate that VQ4ALL achieves compression rates exceeding $16\times$ while preserving high accuracy across multiple network architectures, highlighting its effectiveness and versatility.
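
To make the $16\times$ figure concrete, the following is a minimal sketch of the usual VQ bit-rate arithmetic. It is not taken from the paper: the function name `vq_compression_ratio`, the codebook size, and the sub-vector length are illustrative assumptions, and the sketch assumes that codebook storage is not counted against the model (for instance because a shared codebook sits in ROM).

```python
import math


def vq_compression_ratio(codebook_size, subvector_dim, float_bits=32):
    """Ratio between float storage and index storage per weight.

    Each group of `subvector_dim` weights is replaced by one index into a
    codebook with `codebook_size` entries. Assumption: the codebook itself
    is not counted, e.g. because it is shared and stored once.
    """
    index_bits = math.ceil(math.log2(codebook_size))
    bits_per_weight = index_bits / subvector_dim
    return float_bits / bits_per_weight


# Hypothetical configuration: 2**16 codewords, sub-vectors of 8 weights.
# 16 index bits spread over 8 weights gives 2 bits per weight.
print(vq_compression_ratio(2**16, 8))
```

Under these assumed settings, a 2-bit-per-weight network is 16 times smaller than its 32-bit floating-point counterpart; shorter indices or longer sub-vectors push the ratio higher.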

Paper Structure

This paper contains 21 sections, 14 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: The bottom-up pipeline of VQ4ALL. The Universal Codebook is randomly sampled from the kernel density estimation of the floating-point sub-vectors of several networks. Based on the codebook, differentiable candidate assignments are assigned to each type of network, all initialized with the same ratios. The progressive network construction strategy then calibrates the ratios and sets candidate assignments with high ratios as optimal. Finally, various low-bit networks are constructed with the same universal codebook.
  • Figure 2: Compression results for ResNet-18 and ResNet-50. We compare the trade-off between accuracy and compression ratio, using pre-trained models from the PyTorch zoo as baselines. Overall, our method demonstrates superior accuracy compared to previous approaches.
  • Figure 3: Ablation results for the Progressive Network Construction (PNC) strategy under the same 2-bit ResNet-18 compression configuration. Top: accuracy of VQ4ALL with and without PNC at each epoch. Once the VQ4ALL pipeline is complete, converting the candidate assignments with the largest ratios in VQ4ALL (no PNC) to optimal assignments drops accuracy to 57.77%. Bottom: distribution of the largest ratios, in which 15% are outliers significantly distant from 1.
  • Figure 4: 2-bit ResNet-18/50 compression with varying ratio threshold $\alpha$ values.
  • Figure 5: Optimal assignment distribution of various low-bit networks, with most layers constructed from the universal codebook.
  • ...and 2 more figures
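
The pipeline described in the Figure 1 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, candidates are chosen as the nearest codewords, ratios are modelled as a softmax over learnable logits, and the training step that would update those logits is omitted. Only the overall flow (sample a codebook from a kernel density estimate, give each sub-vector equally weighted candidate assignments, then fix a candidate once its ratio passes a threshold $\alpha$) follows the captions above.

```python
import math
import random


def sample_codebook_from_kde(subvectors, num_codewords, bandwidth, rng):
    """Draw codewords from a Gaussian KDE over the pooled sub-vectors.

    Sampling from a Gaussian KDE amounts to picking a data point at random
    and adding zero-mean Gaussian noise with the kernel bandwidth.
    """
    codebook = []
    for _ in range(num_codewords):
        center = rng.choice(subvectors)
        codebook.append([x + rng.gauss(0.0, bandwidth) for x in center])
    return codebook


def nearest_candidates(subvector, codebook, n):
    """Indices of the n closest codewords (assumed selection rule)."""
    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(subvector, codebook[i]))
    return sorted(range(len(codebook)), key=sqdist)[:n]


def softmax(logits):
    """Turn logits into ratios that sum to 1 (assumed parameterisation)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_reconstruct(candidates, logits, codebook):
    """Ratio-weighted sum of candidate codewords, differentiable in logits."""
    ratios = softmax(logits)
    dim = len(codebook[0])
    return [sum(r * codebook[c][d] for r, c in zip(ratios, candidates))
            for d in range(dim)]


def freeze_if_confident(candidates, logits, alpha):
    """Return the codeword index whose ratio exceeds alpha, else None."""
    ratios = softmax(logits)
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return candidates[best] if ratios[best] > alpha else None


rng = random.Random(0)
# Toy stand-in for floating-point sub-vectors pooled from several networks.
subvectors = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
codebook = sample_codebook_from_kde(subvectors, 16, bandwidth=0.1, rng=rng)

cands = nearest_candidates(subvectors[0], codebook, n=4)
logits = [0.0] * len(cands)          # equal ratios at initialisation
print(softmax(logits))
print(freeze_if_confident(cands, logits, alpha=0.9))           # not yet fixed
print(freeze_if_confident(cands, [8.0, 0.0, 0.0, 0.0], 0.9))   # fixed
```

In this sketch a sub-vector stays "soft" while its ratios are close to uniform and becomes a single stored index once one candidate dominates, which mirrors the caption's description of setting high-ratio candidates as optimal.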