
A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks

Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman

TL;DR

This paper presents a model-, architecture-, and data-agnostic deep neural network compression framework that uses a multi-objective evolutionary algorithm to find an optimal equal-width weight binning (codebook) for weight sharing. It avoids retraining, achieves linear-time quantization via uniform binning, and further improves compression by iteratively merging neighboring bins and Huffman-coding the codebook indices. Across CIFAR-10/100 and ImageNet-1K, the approach reduces memory by up to about 15x on CIFAR-10, about 13x on CIFAR-100, and about 8.6x on ImageNet, with minimal accuracy loss. This demonstrates the potential of multi-objective, codebook-based weight sharing for scalable model compression. The work highlights the practicality of model- and data-agnostic compression and provides a framework for trading off compression ratio against performance via Pareto-front optimization.
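The Huffman stage can be illustrated with a minimal sketch, assuming the codebook index of every weight is available as a flat sequence of integers. The helper names `huffman_code_lengths` and `index_storage_bits` are hypothetical, not taken from the paper. The sketch only compares the bits needed for fixed-length indices ($\lceil \log_2 d \rceil$ bits each) with a Huffman code built from the index frequencies; it ignores the cost of storing the code table and the codebook itself.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct index still gets a 1-bit code here
        return {next(iter(freq)): 1}
    # Heap entries: (subtree weight, unique tie-breaker, {symbol: depth in subtree})
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Joining two subtrees pushes every leaf in them one level deeper
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def index_storage_bits(indices):
    """Compare fixed-length vs. Huffman storage (in bits) for codebook indices."""
    freq = Counter(indices)
    d = len(freq)  # number of distinct shared values actually referenced
    fixed = len(indices) * max(1, math.ceil(math.log2(d)))
    lengths = huffman_code_lengths(indices)
    huffman = sum(count * lengths[sym] for sym, count in freq.items())
    return fixed, huffman


# Usage: a skewed index distribution benefits from variable-length codes
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
print(index_storage_bits(indices))  # -> (32, 28)
```

The saving grows as the index distribution becomes more skewed; for a uniform distribution over a power-of-two number of indices, the Huffman and fixed-length codes cost the same.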

Abstract

Deep neural networks must store millions to billions of weights in memory after training, which makes these memory-intensive models challenging to deploy on embedded devices. Weight sharing is a popular compression approach that represents a network with fewer distinct weight values and shares them across connections. In this paper, we propose a compression framework based on a multi-objective evolutionary algorithm (MOEA) that is independent of neural network architecture, dimension, task, and dataset. We use uniformly sized bins to quantize network weights into a single codebook (lookup table) for efficient weight representation. Using the MOEA, we search for Pareto-optimal numbers of bins $k$ by optimizing two objectives. We then apply an iterative merge technique to the non-dominated Pareto-front solutions, combining neighboring bins without degrading performance to decrease the number of bins and increase the compression ratio. Our approach is model- and layer-independent, meaning that weights from any layer can be mixed in the same cluster, and the uniform quantization used in this work has $O(N)$ complexity, compared with $O(Nkt)$ for non-uniform quantization methods such as k-means, where $N$ is the number of weights, $k$ the number of clusters, and $t$ the number of iterations. In addition, we use the cluster centers as the shared weight values instead of retraining the shared weights, which is computationally expensive. The advantage of evolutionary multi-objective optimization is that it obtains non-dominated Pareto-front solutions with respect to both performance and the number of shared weights. The experimental results show that we can reduce neural network memory by $13.72\times$ to $14.98\times$ on CIFAR-10, $11.61\times$ to $12.99\times$ on CIFAR-100, and $7.44\times$ to $8.58\times$ on ImageNet, showcasing the effectiveness of the proposed deep neural network compression framework.
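The uniform-binning step can be sketched as follows, assuming the weights of all layers are pooled into one flat vector so that a single codebook is shared across layers. The helper `uniform_bin_share` is hypothetical, and taking the midpoint of each bin's interval as its "center" is an assumption; the paper's exact definition of the cluster center may differ (for example, the mean of the member weights). The bin index of each weight is computed directly, so the cost is linear in the number of weights, unlike iterative k-means.

```python
import numpy as np


def uniform_bin_share(weights, k):
    """Equal-width binning of a flat weight vector into at most k shared values.

    Hypothetical helper: returns (codebook, idx), where codebook holds the d <= k
    shared values of the non-empty bins and idx maps every weight to one of them.
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    lo, hi = w.min(), w.max()
    if hi == lo:
        # All weights identical: a single shared value suffices
        return np.array([lo]), np.zeros(w.size, dtype=np.int64)
    width = (hi - lo) / k
    # O(N): each weight's bin is computed directly, with no iterative clustering
    bins = np.floor((w - lo) / width).astype(np.int64)
    bins = np.clip(bins, 0, k - 1)  # the maximum weight belongs to the last bin
    # Drop empty bins (k -> d) and re-index weights into 0..d-1 in O(N + k)
    counts = np.bincount(bins, minlength=k)
    used = np.flatnonzero(counts)
    remap = np.full(k, -1, dtype=np.int64)
    remap[used] = np.arange(used.size)
    idx = remap[bins]
    # Assumed shared value: the midpoint of each non-empty bin's interval
    codebook = lo + (used + 0.5) * width
    return codebook, idx


# Usage: pool weights from every layer into one codebook (layer-independent)
rng = np.random.default_rng(0)
layers = [rng.normal(0.0, 0.05, size=(64, 32)), rng.normal(0.0, 0.02, size=128)]
flat = np.concatenate([layer.ravel() for layer in layers])
codebook, idx = uniform_bin_share(flat, k=1024)
restored = codebook[idx]  # dequantized weights, still in pooled order
print(len(codebook), "shared values for", flat.size, "weights")
```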

Paper Structure (19 sections, 8 equations, 10 figures, 7 tables, 2 algorithms)

Figures (10)

  • Figure 1: Diagram of the proposed compression pipeline for deep neural network parameters. The first stage uses evolutionary multi-objective optimization to find optimal values of $k$ for ordinary equal-width histogram clustering, yielding a Pareto front over various $k$ values with respect to the resulting number of shared weights $d$. The second stage applies an iterative merge to the solutions above the baseline threshold, further compressing the $d$ shared weights to $m$ shared weights. In the last stage, we reduce the size of the codebook indices by replacing fixed-length codes with variable-length Huffman codes, as illustrated by the Huffman tree shown.
  • Figure 2: The transition from many ($k=1024$) uniform intervals to fewer ($d=538$) intervals after removing the intervals with no associated weights, for quantizing the parameters of a ResNet-18 model trained on CIFAR-10.
  • Figure 3: The graphs in the first row show how weights are binned by uniform quantization with $k$ bins and how empty bins are then removed to obtain $d$ clusters. The graphs in the second row show the extra compression of clusters obtained by the proposed iterative merge: the left and right graphs show the merged clusters before and after rearranging block indices by removing the indices of merged neighbors.
  • Figure 4: Pareto-optimal front solutions obtained by MO-UB for ResNet-18 trained on CIFAR-10/100.
  • Figure 5: Pareto-optimal front solutions obtained by MO-UB for ResNet-18 trained on CIFAR-10/100.
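The iterative merge of neighboring bins can be sketched as a greedy left-to-right pass over the sorted shared values, assuming that "without degrading performance" is checked by re-evaluating the model after each tentative merge. The function `iterative_merge`, the `evaluate` callback, the tolerance `tol`, and the choice of the count-weighted mean as the merged value are all assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np


def iterative_merge(codebook, idx, evaluate, baseline, tol=0.0):
    """Greedily merge neighboring shared values while evaluate(...) >= baseline - tol.

    codebook: sorted 1-D array of d shared values
    idx:      per-weight indices into codebook
    evaluate: hypothetical callback(codebook, idx) -> performance of the quantized model
    """
    codebook = np.asarray(codebook, dtype=np.float64).copy()
    idx = np.asarray(idx).copy()
    j = 0
    while j < len(codebook) - 1:
        counts = np.bincount(idx, minlength=len(codebook))
        n = counts[j] + counts[j + 1]
        # Assumed merged value: count-weighted mean of the two neighbors
        if n > 0:
            merged = (counts[j] * codebook[j] + counts[j + 1] * codebook[j + 1]) / n
        else:
            merged = 0.5 * (codebook[j] + codebook[j + 1])
        cand_cb = np.delete(codebook, j + 1)
        cand_cb[j] = merged
        # Re-index: bin j+1 folds into j, and every later index shifts down by one
        cand_idx = np.where(idx > j, idx - 1, idx)
        if evaluate(cand_cb, cand_idx) >= baseline - tol:
            # Accept, then try merging bin j with its new right neighbor
            codebook, idx = cand_cb, cand_idx
        else:
            j += 1  # reject and move on to the next neighboring pair
    return codebook, idx


# Usage with a toy stand-in for model performance: the negative of the
# largest reconstruction error, so a merge is kept while that error <= 0.6
w = np.array([0.0, 0.1, 1.0, 1.1])


def score(cb, ix):
    return -np.max(np.abs(cb[ix] - w))


cb, ix = iterative_merge(w.copy(), np.arange(4), score, baseline=0.0, tol=0.6)
print(cb, ix)
```

Each tentative merge costs one call to `evaluate`, so with a real model this step is dominated by inference time rather than by the merge bookkeeping.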