A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman
TL;DR
This paper presents a model-, architecture-, and data-agnostic deep neural network compression framework that uses a multi-objective evolutionary algorithm to determine an optimal equal-width weight binning (codebook) for weight-sharing. It avoids retraining, achieves linear-time quantization via uniform binning, and further improves compression through iterative merging of neighboring bins and Huffman coding of codebook indices. Across CIFAR-10/100 and ImageNet-1K, the approach yields substantial memory reductions (up to about 15x on CIFAR-10, 13x on CIFAR-100, and 8.6x on ImageNet) with minimal accuracy loss, demonstrating the potential of multi-objective, codebook-based weight sharing for scalable model compression. The work highlights the practicality of model-agnostic, data-agnostic compression and provides a framework for balancing compression ratio and performance via Pareto-front optimization.
Abstract
Deep neural networks must store millions or even billions of weights in memory after training, which makes memory-intensive models challenging to deploy on embedded devices. Weight sharing is a popular compression approach that uses a smaller set of weight values and shares them across specific connections in the network. In this paper, we propose a multi-objective evolutionary algorithm (MOEA) based compression framework that is independent of neural network architecture, dimension, task, and dataset. We use uniformly sized bins to quantize network weights into a single codebook (lookup table) for efficient weight representation. Using the MOEA, we search for Pareto-optimal $k$ bins by optimizing two objectives. We then apply an iterative merge technique to the non-dominated Pareto frontier solutions, combining neighboring bins without degrading performance in order to decrease the number of bins and increase the compression ratio. Our approach is model- and layer-independent, meaning that weights from any layer are mixed in the clusters, and the uniform quantization method used in this work has $O(N)$ complexity, in contrast to non-uniform quantization methods such as k-means with $O(Nkt)$ complexity. In addition, we use the cluster centers as the shared weight values instead of retraining the shared weights, which is computationally expensive. The advantage of evolutionary multi-objective optimization is that it yields non-dominated Pareto frontier solutions with respect to performance and shared weights. The experimental results show that we can reduce neural network memory by $13.72 \sim 14.98\times$ on CIFAR-10, $11.61 \sim 12.99\times$ on CIFAR-100, and $7.44 \sim 8.58\times$ on ImageNet, showcasing the effectiveness of the proposed deep neural network compression framework.
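The uniform-binning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`uniform_weight_sharing`, `dequantize`, `naive_compression_ratio`), the synthetic Gaussian weights, and the choice of the bin midpoint as the "center" are assumptions made for illustration. The ratio estimate assumes fixed-length indices and 32-bit floats, so it ignores the bin merging and Huffman coding used in the paper and does not reproduce the reported figures.

```python
import math
import random


def uniform_weight_sharing(weights, k):
    """Quantize a flat list of weights into k equal-width bins.

    Returns (indices, codebook): indices[i] is the bin of weights[i] and
    codebook[b] is the shared value for bin b (assumed here to be the
    midpoint of the bin interval).
    """
    lo, hi = min(weights), max(weights)
    width = (hi - lo) / k or 1.0  # guard against all-equal weights
    # A single pass over the weights: O(N), with no iterative clustering.
    indices = [min(int((w - lo) / width), k - 1) for w in weights]
    codebook = [lo + (b + 0.5) * width for b in range(k)]
    return indices, codebook


def dequantize(indices, codebook):
    """Rebuild the shared-weight list via codebook lookup."""
    return [codebook[i] for i in indices]


def naive_compression_ratio(n, k, float_bits=32):
    """Rough estimate: fixed-length indices plus a float codebook."""
    index_bits = max(1, math.ceil(math.log2(k)))
    return (n * float_bits) / (n * index_bits + k * float_bits)


# Weights from two hypothetical layers are pooled into one flat list,
# mirroring the layer-independent single codebook described above.
random.seed(0)
layer_a = [random.gauss(0.0, 0.05) for _ in range(1000)]
layer_b = [random.gauss(0.0, 0.10) for _ in range(500)]
weights = layer_a + layer_b

k = 16
indices, codebook = uniform_weight_sharing(weights, k)
shared = dequantize(indices, codebook)

# With bin midpoints, the error per weight is at most half a bin width.
max_err = max(abs(w - s) for w, s in zip(weights, shared))
ratio = naive_compression_ratio(len(weights), k)
print(f"max abs error: {max_err:.5f}, naive ratio: {ratio:.2f}x")
```

In the framework described above, $k$ is not fixed by hand as it is here; it is the quantity searched by the MOEA, with each candidate trading compression against network performance.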
