Budgeted Broadcast: An Activity-Dependent Pruning Rule for Neural Network Efficiency

Yaron Meirovitch, Fuming Yang, Jeff Lichtman, Nir Shavit

TL;DR

Budgeted Broadcast introduces a local traffic budget $t_i=a_i k_i$ to prune neural networks, aiming to maximize information coding under a global resource constraint. From constrained-entropy optimization, the network converges to a selectivity–audience balance described by $\log\frac{1-a_i}{a_i}=\beta k_i$, implemented with SP-in/SP-out masks and EMA-based activity tracking. Across controlled didactic tasks and four real-domain benchmarks (ASR, face identification, change detection, and EM synapse segmentation), BB improves tail/rare-event metrics and decorrelation while matching or exceeding dense baselines at the same sparsity. The approach offers a biologically grounded, easy-to-integrate pruning mechanism with potential to foster more diverse and efficient representations in large-scale models.

Abstract

Most pruning methods remove parameters ranked by impact on loss (e.g., magnitude or gradient). We propose Budgeted Broadcast (BB), which gives each unit a local traffic budget (the product of its long-term on-rate $a_i$ and fan-out $k_i$). A constrained-entropy analysis shows that maximizing coding entropy under a global traffic budget yields a selectivity-audience balance, $\log\frac{1-a_i}{a_i}=\beta k_i$. BB enforces this balance with simple local actuators that prune either fan-in (to lower activity) or fan-out (to reduce broadcast). In practice, BB increases coding entropy and decorrelation and improves accuracy at matched sparsity across Transformers for ASR, ResNets for face identification, and 3D U-Nets for synapse prediction, sometimes exceeding dense baselines. On electron microscopy images, it attains state-of-the-art F1 and PR-AUC under our evaluation protocol. BB is easy to integrate and suggests a path toward learning more diverse and efficient representations.
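
The balance stated in the abstract can be recovered from a short Lagrangian argument. The sketch below is not taken from the paper's own derivation. It assumes that each unit is an independent binary (on/off) variable with on-rate $a_i$, that coding entropy is the sum of per-unit binary entropies $H(a_i)$, that fan-outs $k_i$ are held fixed, and that $T$ (a symbol introduced here) denotes the global traffic budget.

$$
\max_{\{a_i\}} \; \sum_i H(a_i) \quad \text{s.t.} \quad \sum_i a_i k_i \le T,
\qquad H(a) = -a\log a - (1-a)\log(1-a).
$$

$$
\mathcal{L} = \sum_i H(a_i) - \beta\Big(\sum_i a_i k_i - T\Big),
\qquad
\frac{\partial \mathcal{L}}{\partial a_i} = \log\frac{1-a_i}{a_i} - \beta k_i = 0 .
$$

$$
\Rightarrow\quad \log\frac{1-a_i}{a_i} = \beta k_i,
\qquad
a_i = \frac{1}{1+e^{\beta k_i}} .
$$

Under these assumptions the multiplier $\beta \ge 0$ prices traffic, so units with larger fan-out are pushed toward lower on-rates, and every unit satisfies $a_i \le \tfrac12$ when the budget binds.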

Paper Structure

This paper contains 16 sections, 4 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: The conceptual framework of Budgeted Broadcast, from biology to a predictive theory. (Left) Our method models a neuron's metabolic cost as traffic, $t_i = a_i k_i$ (long-term activity $\times$ fan-out). If traffic exceeds a budget $\tau$, connections are pruned. This can be achieved by reducing fan-out (axonal pruning) or reducing fan-in to lower activity (dendritic pruning). (Top Right) This rule is inspired by Henneman's size principle [Henneman 1957; Henneman 1965], where large motor neurons (large size, analogous to fan-out $k_i$) have lower average activity levels ($a_i$). (Bottom Right) Our resource-preservation rule predicts a linear relationship between a unit's fan-out ($k_i$) and its inactivity log-odds ($\log\frac{1-a_i}{a_i}$), which we term the selectivity-audience balance.
  • Figure 2: SP-out (Axonal pruning). Activation-aware fan-out pruning that masks a hidden unit's outgoing connections to the next layer, enforcing the per-unit traffic budget $t=a\,k$ against a metabolic threshold $\tau$. High-activity units (large $a$) shed more outgoing edges; low-activity units keep more. Right: the learned binary mask sparsifies the dense hidden$\to L{+}1$ matrix according to $k=d_0+\tfrac{1}{\beta}\log\!\frac{1-a}{a}$, clipped to $[1,N_{\text{out}}]$. SP-in performs the complementary, opposite operation (fan-in pruning); see Appendix.
  • Figure 3: The selectivity–audience balance emerges under budget pressure on controlled XOR tasks. The balance is a direct consequence of budget-driven structural adaptation, not an artifact of gradient-based training. Left panel: In networks trained with Budgeted Broadcast, a robust linear relationship emerges between unit fan-out ($k_i$) and inactivity log-odds, confirming our theoretical prediction. Middle panel: A one-shot traffic-threshold variant that prunes when $t_i = a_i k_i > \tau$ produces a similar trend but with a wider variability band and mild curvature, consistent with the threshold gate being a local approximation to the KKT stationary law $\log\frac{1-a_i}{a_i} = \beta k_i$. Right panel: In control networks trained with SGD alone, fan-out remains constant at the initialization value (64), eliminating any correlation with activity (see Sec. \ref{subsubsec:xor_balance}).
  • Figure 4: BB's core properties validated on controlled DNF tasks. These experiments confirm the mechanism, safety, and optimization benefits of the BB principle. (a) BB inherently protects rare features (green line), whose traffic remains safely below the budget $\tau$, while actively pruning over-active common features (red line). (b) BB consistently solves a DNF task designed to make standard SGD fail, overcoming a lazy-learning barrier. (c) The number of cycles for BB to solve the DNF task follows a predictable $O(W \log W)$ scaling law. All setup details are in Appendix.
  • Figure 5: ASR on LibriSpeech. (a) Overall Word Error Rate Reduction (WERR) on test_clean; (b) bucketed $\Delta$WER (change in Word Error Rate) on test_clean (Head/Mid/Tail fixed at 20/70/10; buckets are fixed across methods); (c) overall WERR on test_other; (d) bucketed $\Delta$WER on test_other. Shaded bands/bars are mean $\pm$ std over seeds; the dashed line is Dense (WERR / $\Delta$WER$=0$).
  • ...and 3 more figures
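
The quantities named in the captions of Figures 1 and 2 (EMA-tracked activity $a$, traffic $t = a\,k$ compared against $\tau$, and the clipped target fan-out $k=d_0+\tfrac{1}{\beta}\log\frac{1-a}{a}$) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The function names, the EMA decay value, and the choice to keep the largest-magnitude outgoing weights are all assumptions made here for the example; the source does not specify how the surviving edges are selected.

```python
import math


def update_activity(a_prev, is_on, decay=0.99):
    """EMA estimate of a unit's long-term on-rate (decay value is an assumption)."""
    return decay * a_prev + (1.0 - decay) * (1.0 if is_on else 0.0)


def over_budget(a, k, tau):
    """Traffic gate from Figure 1: traffic t = a * k exceeds the budget tau."""
    return a * k > tau


def target_fan_out(a, beta, d0, n_out, eps=1e-6):
    """Target fan-out from Figure 2: k = d0 + (1/beta) * log((1-a)/a), clipped to [1, n_out]."""
    a = min(max(a, eps), 1.0 - eps)  # keep the log-odds finite
    k = d0 + math.log((1.0 - a) / a) / beta
    return int(min(max(round(k), 1), n_out))


def sp_out_mask(weights_row, k_keep):
    """Binary mask over one unit's outgoing weights.

    Keeps the k_keep largest-magnitude edges; this selection criterion is an
    assumption for illustration, not something stated in the source.
    """
    order = sorted(range(len(weights_row)),
                   key=lambda j: abs(weights_row[j]), reverse=True)
    keep = set(order[:k_keep])
    return [1 if j in keep else 0 for j in range(len(weights_row))]


if __name__ == "__main__":
    # Illustrative parameter values only.
    beta, d0, n_out = 0.1, 8, 64
    for a in (0.01, 0.5, 0.9):
        print(a, target_fan_out(a, beta, d0, n_out))
```

With these illustrative parameters, a rarely active unit keeps many outgoing edges while a highly active unit is clipped down to a single edge, which matches the qualitative behaviour described in the Figure 2 caption (high-activity units shed more outgoing edges).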