Application-Specific Component-Aware Structured Pruning of Deep Neural Networks in Control via Soft Coefficient Optimization
Ganesh Sundaram, Jonas Ulmen, Amjad Haider, Daniel Görges
TL;DR
This paper addresses application-specific pruning of neural networks used as controllers, where standard compression methods risk degrading task-critical behaviors. It proposes an application-aware, component-aware structured pruning framework that assigns learnable soft pruning coefficients to groups of parameters and optimizes them, via grid search or a constrained gradient-based method, to meet a target sparsity $\rho$ within a tolerance $\varepsilon$. The approach uses a dependency-graph representation to define the pruning groups and task-specific metrics (e.g., PSNR for the MNIST autoencoder, episode reward for TD-MPC) to guide pruning. Experiments on an MNIST autoencoder and a TD-MPC agent show that gradient-based coefficient optimization achieves higher task performance at fixed sparsity than traditional magnitude-based pruning, requires substantially less search time than grid search, and yields more stable latent-space representations. The results point to a practical path for deploying compact, control-oriented DNNs while preserving critical application features.
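The grid-search variant of the coefficient search can be illustrated with a minimal Python sketch. It rests on illustrative assumptions, not on the authors' implementation: each group has a base importance score (e.g., a weight magnitude), its soft coefficient simply multiplies that score, and groups are pruned greedily in ascending order of scaled score until the pruned-parameter fraction lies in $[\rho - \varepsilon, \rho + \varepsilon]$. The function names, the coefficient grid, and the toy data, including the hypothetical per-group `relevance` values standing in for a task metric such as PSNR or episode reward, are all hypothetical.

```python
import itertools
from typing import Callable, List, Optional, Sequence, Tuple


def sparsity(sizes: Sequence[int], keep: Sequence[bool]) -> float:
    """Fraction of parameters removed by a keep/prune mask over groups."""
    pruned = sum(n for n, k in zip(sizes, keep) if not k)
    return pruned / sum(sizes)


def prune_with_coefficients(importance: Sequence[float], sizes: Sequence[int],
                            coeffs: Sequence[float], rho: float, eps: float) -> List[bool]:
    """Greedily prune groups in ascending order of coefficient-scaled importance."""
    scores = [a * imp for a, imp in zip(coeffs, importance)]
    order = sorted(range(len(scores)), key=lambda g: scores[g])
    total = sum(sizes)
    keep = [True] * len(sizes)
    pruned = 0
    for g in order:
        if pruned / total >= rho - eps:
            break  # already inside the tolerance band
        if (pruned + sizes[g]) / total > rho + eps:
            continue  # pruning this group would overshoot the band; try the next one
        keep[g] = False
        pruned += sizes[g]
    return keep


def grid_search(importance, sizes, task_metric: Callable[[List[bool]], float],
                rho: float, eps: float, grid=(0.2, 1.0, 5.0)):
    """Try every coefficient combination and keep the best feasible one."""
    best: Optional[Tuple[float, tuple, List[bool], float]] = None
    for coeffs in itertools.product(grid, repeat=len(importance)):
        keep = prune_with_coefficients(importance, sizes, coeffs, rho, eps)
        s = sparsity(sizes, keep)
        if abs(s - rho) > eps:
            continue  # infeasible: sparsity outside rho +/- eps
        score = task_metric(keep)
        if best is None or score > best[0]:
            best = (score, coeffs, keep, s)
    return best


if __name__ == "__main__":
    sizes = [100, 100, 200, 100]          # parameters per pruning group (toy)
    importance = [0.9, 0.2, 0.5, 0.3]     # e.g. group weight magnitudes (toy)
    relevance = [0.1, 0.8, 0.4, 0.2]      # hypothetical task relevance per group

    def task_metric(keep):
        # Hypothetical stand-in for PSNR or episode reward: relevance retained.
        return sum(r for r, k in zip(relevance, keep) if k)

    baseline = prune_with_coefficients(importance, sizes, [1.0] * 4, 0.4, 0.05)
    score, coeffs, keep, s = grid_search(importance, sizes, task_metric, 0.4, 0.05)
    print("magnitude-only keep:", baseline, "metric:", task_metric(baseline))
    print("grid-search keep:   ", keep, "metric:", score, "sparsity:", s)
```

With all coefficients set to 1.0 the ranking reduces to plain magnitude pruning, which in this toy setup removes group 1 despite its high task relevance; the grid search instead finds coefficients that keep it. The number of candidates grows as `len(grid) ** num_groups`, which is one reason a gradient-based alternative is attractive for larger networks.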
Abstract
Deep neural networks (DNNs) offer great flexibility and robust performance, which makes them well suited not only for system modeling but also for advanced neural network controllers (NNCs). However, their high complexity and computational demands often limit their deployment. Various model compression strategies have been developed over the past few decades to address these issues. These strategies are effective for general DNNs but do not transfer directly to NNCs, which require not only size reduction but also retention of key application-specific performance characteristics. In structured pruning, which removes groups of related elements, standard importance metrics often fail to protect these critical characteristics. In this paper, we introduce a novel framework for computing importance metrics for pruning groups that not only reduces model size but also accounts for application-specific constraints. To determine the best pruning coefficient for each group, we evaluate two approaches: a simple exploration via grid search, and a gradient-descent optimization that balances compression against task performance. We evaluate the method in two use cases: an MNIST autoencoder and a Temporal Difference Model Predictive Control (TD-MPC) agent. Results show that the method effectively maintains application-relevant performance while achieving a significant reduction in model size.
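The gradient-descent variant can be sketched under a different set of assumptions: each group gets an unconstrained parameter `theta`, its soft keep coefficient is `sigmoid(theta)`, the task loss is a hypothetical differentiable surrogate (relevance lost by down-weighting a group), and the sparsity target enters as a squared-hinge penalty, weighted by `lam`, that is zero inside $[\rho - \varepsilon, \rho + \varepsilon]$. The sigmoid relaxation, the penalty form, and all names and values are illustrative assumptions rather than the paper's exact formulation; in the actual use cases the task term would come from the application metric.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def loss_and_grad(theta: Sequence[float], sizes: Sequence[int],
                  relevance: Sequence[float], rho: float, eps: float,
                  lam: float) -> Tuple[float, List[float], float]:
    """Surrogate loss, its gradient w.r.t. theta, and the soft sparsity."""
    m = [sigmoid(t) for t in theta]           # soft keep coefficients in (0, 1)
    total = sum(sizes)
    s = sum(n * (1.0 - mi) for n, mi in zip(sizes, m)) / total  # soft sparsity
    task = sum(r * (1.0 - mi) for r, mi in zip(relevance, m))   # hypothetical task loss
    viol = max(0.0, abs(s - rho) - eps)       # zero inside the tolerance band
    loss = task + lam * viol ** 2
    # Chain rule: dL/ds -> dL/dm_g -> dL/dtheta_g, with dm/dtheta = m (1 - m).
    dloss_ds = 2.0 * lam * viol * (1.0 if s > rho else -1.0)
    grad = [(-r - dloss_ds * n / total) * mi * (1.0 - mi)
            for r, n, mi in zip(relevance, sizes, m)]
    return loss, grad, s


def optimize(sizes, relevance, rho, eps, lam=50.0, lr=0.2, steps=5000) -> List[float]:
    """Plain gradient descent on the per-group parameters, starting at m = 0.5."""
    theta = [0.0] * len(sizes)
    for _ in range(steps):
        _, grad, _ = loss_and_grad(theta, sizes, relevance, rho, eps, lam)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    sizes = [100, 100, 200, 100]       # parameters per pruning group (toy)
    relevance = [0.1, 0.8, 0.4, 0.2]   # hypothetical task relevance per group
    rho, eps = 0.4, 0.05
    theta = optimize(sizes, relevance, rho, eps)
    coeffs = [sigmoid(t) for t in theta]
    _, _, soft_s = loss_and_grad(theta, sizes, relevance, rho, eps, 50.0)
    keep = [c >= 0.5 for c in coeffs]  # harden the soft coefficients
    hard_s = sum(n for n, k in zip(sizes, keep) if not k) / sum(sizes)
    print("coefficients:", [round(c, 3) for c in coeffs])
    print("soft sparsity:", round(soft_s, 3), "hard sparsity:", hard_s)
```

Thresholding the coefficients at 0.5 yields a hard mask whose sparsity can differ from the soft value, so a final check against $\rho \pm \varepsilon$ is still needed. A larger `lam` pulls the soft sparsity closer to the band but requires a smaller step size `lr`.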
