Exchangeability in Neural Network and its Application to Dynamic Pruning
Pu Yi, Tianlang Chen, Yifan Yang, Sara Achour
TL;DR
ExPrune introduces exchangeability as a statistical symmetry between groups of neural network parameters and their intermediate values, enabling dynamic, per-input partial computation without changes to the model architecture or training algorithm. By modeling training as sampling from a distribution over randomly initialized models, the authors prove that certain parameter groups have exchangeable marginals and that derived intermediate values $\xi_i$ are exchangeable for any input. Two instantiations, Early Negative Prediction for ReLU and Dominance Prediction for prediction heads, make pruning decisions from partial sums like $\sum_{i=1}^k \xi_i$ and reduce FLOPs by 10.98--17.33% with negligible accuracy drop on CNNs, GNNs, and LMs. ExPrune also composes with static magnitude pruning, providing an additional 10.24--11.11% reduction in FLOPs with negligible accuracy drop on aggressively pruned models.
Abstract
Modern neural networks (NNs) contain an ever-growing number of parameters, substantially increasing the memory and computational cost of inference. Researchers have explored various ways to reduce the inference cost of NNs, both by reducing the model size before deployment and by dynamically pruning the inference computation at runtime. In this work, we present ExPrune, a general, dynamic pruning optimization that enables multi-granularity partial computation on a per-input basis. ExPrune requires no change to the model architecture or the training algorithm. ExPrune is based on our theoretical result that the relationship between certain model parameters and intermediate values can be described by a statistical property called exchangeability. By identifying exchangeable parameters and values in the model, we can first partially evaluate the network, then analyze the statistics of the partial results, and finally make pruning decisions on the fly. Because ExPrune is theoretically grounded, it generalizes across model architectures in different problem domains. We evaluate ExPrune on one computer vision model, one graph model, and one language model. ExPrune provides a 10.98--17.33% reduction in FLOPs with negligible accuracy drop and a 21.61--27.16% reduction in FLOPs with at most 1% accuracy drop. We also demonstrate that ExPrune composes with static magnitude pruning. On models that have been aggressively statically pruned, ExPrune still provides an additional 10.24--11.11% reduction in FLOPs with negligible accuracy drop and an additional 13.91--14.39% reduction in FLOPs with at most 1% accuracy drop.
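To make the "partially evaluate, analyze, then prune" loop concrete, the following is a minimal sketch of early negative prediction for a single ReLU neuron. It is not the authors' algorithm: the function name `relu_neuron_early_negative`, the prefix length `k`, the safety factor `z`, and the specific decision rule (extrapolate the partial mean and add a standard-deviation margin) are all assumptions chosen for illustration. The idea it relies on is that if the terms $\xi_i = w_i x_i$ are exchangeable, the first $k$ terms are statistically representative of the remaining ones, so their mean and spread give an estimate of the full pre-activation.

```python
import numpy as np

def relu_neuron_early_negative(w, x, bias=0.0, k=8, z=2.0):
    """Evaluate relu(w . x + bias), stopping early if the sum looks negative.

    Returns (output, number_of_terms_evaluated).
    Hypothetical decision rule, not taken from the paper.
    """
    n = w.shape[0]
    k = min(k, n)
    xi = w[:k] * x[:k]                 # first k terms xi_i = w_i * x_i
    partial = xi.sum()                 # partial sum over the prefix

    if 1 < k < n:
        mean = partial / k
        std = xi.std(ddof=1)           # sample standard deviation of the prefix
        remaining = n - k
        # Optimistic (upper) estimate of the full pre-activation, assuming the
        # remaining terms behave like the observed ones.
        upper = bias + partial + remaining * mean + z * std * np.sqrt(remaining)
        if upper < 0.0:
            return 0.0, k              # predicted negative: ReLU outputs 0

    # Otherwise finish the computation exactly.
    total = bias + partial + float(np.dot(w[k:], x[k:]))
    return max(total, 0.0), n

# Every term is -1, so the neuron is pruned after the first k = 8 terms.
print(relu_neuron_early_negative(-np.ones(64), np.ones(64)))
```

A larger `z` makes the rule more conservative: fewer neurons are pruned, and fewer positive pre-activations are wrongly zeroed. This is the kind of knob that would trade FLOPs reduction against accuracy drop.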
