Exchangeability in Neural Network and its Application to Dynamic Pruning

Pu, Yi, Tianlang Chen, Yifan Yang, Sara Achour

TL;DR

ExPrune introduces exchangeability as a statistical symmetry between groups of neural network parameters and their intermediate values, enabling dynamic, per-input partial computation without architectural changes. By modeling training as sampling from a distribution over randomly initialized models, the authors prove that certain parameter groups have exchangeable marginals and that derived intermediate values $\xi_i$ are exchangeable for any input. Two instantiations—Early Negative Prediction for ReLU and Dominance Prediction for prediction heads—achieve substantial FLOPs reductions on CNNs, GNNs, and LMs while preserving accuracy, using decisions based on partial sums like $\sum_{i=1}^k \xi_i$. ExPrune also composes with static magnitude pruning, providing additional FLOPs reductions on aggressively pruned models, demonstrating broad applicability and practical gains for dynamic inference optimization.

Abstract

Modern neural networks (NNs) contain an ever-growing number of parameters, substantially increasing the memory and computational cost of inference. Researchers have explored various ways to reduce the inference cost of NNs, both by reducing the model size before deployment and by dynamically pruning the inference computation at runtime. In this work, we present ExPrune, a general, dynamic pruning optimization that enables multi-granularity partial computation on a per-input basis. ExPrune requires no change to the model architecture or the training algorithm. ExPrune is based on our theoretical results that the relationship between certain model parameters and intermediate values can be described by a statistical property called exchangeability. By identifying exchangeable parameters and values in the model, we are able to first partially evaluate the network, analyze the statistics of the partial results, and make pruning decisions on the fly. Because ExPrune is grounded in theory, it generalizes across model architectures in different problem domains. We evaluate ExPrune on one computer vision model, one graph model, and one language model. ExPrune provides a 10.98--17.33% reduction in FLOPs with negligible accuracy drop and a 21.61--27.16% reduction in FLOPs with at most 1% accuracy drop. We also demonstrate that ExPrune composes with static magnitude pruning. On models that have been aggressively statically pruned, ExPrune still provides an additional 10.24--11.11% reduction in FLOPs with negligible accuracy drop and a 13.91--14.39% reduction in FLOPs with at most 1% accuracy drop.
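
The partial-evaluation idea can be illustrated with a minimal Python sketch of an early negative check for a single ReLU neuron. This is not the authors' implementation: the function name, the single checkpoint `k`, and the fixed `threshold` are assumptions made for illustration. The neuron accumulates the terms $\xi_i = w_i x_i$, inspects the partial sum after the first `k` terms, and, if that sum falls below the threshold, predicts a negative pre-activation and returns 0 without evaluating the remaining terms.

```python
def relu_neuron_early_negative(weights, inputs, k, threshold):
    """Hypothetical sketch: ReLU(sum_i w_i * x_i) with one early-exit check.

    Returns (output, number_of_terms_evaluated).
    `k` and `threshold` are illustrative knobs, not values from the paper.
    """
    n = len(weights)
    partial = 0.0
    for i in range(n):
        partial += weights[i] * inputs[i]  # accumulate xi_i = w_i * x_i
        # Inspect the partial sum exactly once, after the first k terms.
        if i + 1 == k and partial < threshold:
            # Predict that the full sum is negative, so ReLU outputs 0.
            return 0.0, k
    # No early exit: apply ReLU to the exact pre-activation.
    return max(partial, 0.0), n


# The partial sum after 2 terms is -3.0 < -1.0, so the last 2 terms are skipped.
out, used = relu_neuron_early_negative(
    [1.0, -2.0, 0.5, 1.0], [1.0, 2.0, 2.0, 1.0], k=2, threshold=-1.0
)
print(out, used)
```

The check can be wrong: with terms $-2$ and $+5$, `k=1`, and `threshold=-1.0`, the sketch returns 0 although the exact output is 3. A simple sketch like this exposes the trade-off between skipped computation and fidelity that the reported FLOPs and accuracy figures quantify.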

Paper Structure

This paper contains 18 sections, 3 theorems, 2 equations, 5 figures, 1 table.

Key Result

Theorem 1

Let $\zeta = \left(\zeta_1, \ldots, \zeta_n\right) \in \mathcal{X}^n$ be a vector of exchangeable random variables. Fix a transformation $G: \mathcal{X}^n \rightarrow \mathcal{X}^n$. If $G$ is permutation equivariant, i.e., for every permutation matrix $P$ and every $\zeta_0 \in \mathcal{X}^n$, $P G(\zeta_0) \ldots$
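
Permutation equivariance can be checked numerically on small vectors. The sketch below assumes the usual definition, $P G(\zeta_0) = G(P \zeta_0)$ for every permutation $P$; the helper names are hypothetical and the check is an illustration, not part of the paper. An elementwise ReLU passes the check, while a map that scales each entry by its position does not. Passing on one test vector is evidence, not a proof.

```python
from itertools import permutations


def permute(vec, perm):
    """Apply a permutation, given as a tuple of indices, to a list."""
    return [vec[j] for j in perm]


def relu_elementwise(vec):
    """Elementwise map: the same function is applied to every coordinate."""
    return [max(v, 0.0) for v in vec]


def scale_by_position(vec):
    """Position-dependent map: coordinate i is multiplied by i + 1."""
    return [(i + 1) * v for i, v in enumerate(vec)]


def is_permutation_equivariant(G, vec):
    """Check P G(vec) == G(P vec) for every permutation P of the indices."""
    return all(
        permute(G(vec), p) == G(permute(vec, p))
        for p in permutations(range(len(vec)))
    )


z = [-1.0, 2.0, 0.5]
print(is_permutation_equivariant(relu_elementwise, z))   # elementwise map
print(is_permutation_equivariant(scale_by_position, z))  # position-dependent map
```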

Figures (5)

  • Figure 1: Dynamic pruning with ExPrune algorithm, grounded by our theory of exchangeability.
  • Figure 2: MLP without bias. $a,b,c$ are neuron activations. $W'$ and $W$ are weight matrices. Parameters and values with the same color are exchangeable, thus identically distributed.
  • Figure 4: Parameters with the same color have exchangeable distributions in the trained model.
  • Figure 5: ● is StatsTest, $\blacktriangle$ is Threshold, $\blacksquare$ is SnaPEA, $\bigstar$ is the unoptimized baseline. Reference lines show the fidelity and normalized FLOPs of the unoptimized baseline; a further line shows the baseline fidelity minus $1\%$.
  • Figure 6: Fidelity-FLOPs scatter plots for statically pruned VGG11-BN models. FLOPs are normalized to the largest model's unoptimized baseline in (c). Colors and lines have the same meaning as in Figure 5.

Theorems & Definitions (7)

  • Definition 1: Exchangeability
  • Theorem 1: Exchangeability Preservation
  • Definition 2: Parameter Space Symmetry
  • Theorem 2: Exchangeable Parameters
  • proof
  • Theorem 3: Exchangeable Values
  • proof