CompressNAS: A Fast and Efficient Technique for Model Compression using Decomposition
Sudhakar Sah, Nikhil Chabbra, Matthieu Durnerin
TL;DR
CompressNAS reframes Tucker-decomposition-based CNN compression as a global neural architecture search problem, so that ranks are selected consistently across the whole network under memory and accuracy budgets. It combines a MicroNAS-inspired workflow with a lightweight MSE-based accuracy proxy, fast flash estimation, and an ILP-driven search to rapidly generate compression configurations that preserve accuracy while achieving substantial parameter reductions. The approach yields competitive or superior results on ImageNet and COCO compared to prior Tucker-based methods, and introduces STResNet, a family of ultra-compact backbones well suited to MCUs/NPUs and quantization-friendly deployment. Overall, CompressNAS provides a practical, single-search framework that supports budget-aware, fine-grained rank optimization and scalable model generation across multiple compression targets.
Abstract
Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight Neural Processing Units (NPUs) due to their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8x with less than 4% accuracy drop; on COCO, we achieve 2x compression of YOLOv5s without any accuracy drop and 2x compression of YOLOv5n with a 2.5% drop. Finally, we present a new family of compressed models, STResNet, with competitive performance compared to other efficient models.
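
To make the idea of global rank selection under a budget concrete, the following is a minimal sketch and not the CompressNAS implementation. It assumes a Tucker-2 factorization of a k x k convolution into a 1x1 convolution, a k x k core, and a second 1x1 convolution, and counts parameters accordingly. The layer shapes, the candidate rank fractions, and the function `proxy_error` are hypothetical placeholders; in particular, `proxy_error` merely stands in for the MSE-based accuracy proxy, and plain exhaustive enumeration replaces the ILP solver.

```python
from itertools import product


def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weight count after a Tucker-2 split:
    1x1 conv (c_in -> r_in), k x k core (r_in -> r_out), 1x1 conv (r_out -> c_out).
    """
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


def proxy_error(fraction):
    """Hypothetical stand-in for an accuracy proxy: lower rank, higher error.
    This is an assumption for illustration, not the paper's estimator.
    """
    return (1.0 - fraction) ** 2


def layer_cost(layer, fraction):
    """Parameter count of one layer at a given rank fraction.
    A fraction of 1.0 means the layer is left undecomposed.
    """
    c_in, c_out, k = layer
    if fraction == 1.0:
        return conv_params(c_in, c_out, k)
    return tucker2_params(c_in, c_out, k, int(c_in * fraction), int(c_out * fraction))


def search(layers, fractions, budget):
    """Enumerate every per-layer rank assignment and keep the one with the
    lowest summed proxy error whose total parameter count fits the budget.
    Returns (error, cost, choice), or None if no assignment fits.
    """
    best = None
    for choice in product(fractions, repeat=len(layers)):
        cost = sum(layer_cost(l, f) for l, f in zip(layers, choice))
        if cost > budget:
            continue
        err = sum(proxy_error(f) for f in choice)
        if best is None or err < best[0]:
            best = (err, cost, choice)
    return best


# Hypothetical layers as (c_in, c_out, kernel_size) and candidate rank fractions.
LAYERS = [(64, 64, 3), (128, 128, 3), (256, 256, 3)]
FRACTIONS = [0.25, 0.5, 1.0]

full_size = sum(conv_params(*layer) for layer in LAYERS)
result = search(LAYERS, FRACTIONS, budget=full_size // 2)  # target: 2x compression
```

Because the search scores complete assignments rather than one layer at a time, it can leave some layers untouched and compress others more aggressively, which is the trade-off that local rank selection misses. Enumeration grows exponentially with the number of layers, so it is only practical for toy cases like this one.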
