The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence

Adithya Vasudev

TL;DR

The paper addresses the cost of identifying lottery-ticket subnetworks during training by building on the Early-Bird framework. It introduces WORM, a gradient-truncation strategy that exploits static groups of unimportant neurons, delaying truncation until an elbow in the mask distance is detected. On CNNs this accelerates ticket discovery and improves post-pruning robustness. The authors also extend the approach to transformers, where results are mixed, and discuss limitations and potential extensions. Overall, WORM is a practical augmentation to prune-while-train workflows: promising for scalable pruning of CNNs and a first step toward transformer-specific acceleration techniques.

Abstract

The Lottery Ticket hypothesis proposes that ideal, sparse subnetworks, called lottery tickets, exist in untrained dense neural networks. The Early Bird hypothesis proposes an efficient algorithm to find these winning lottery tickets in convolutional neural networks, using the novel concept of distance between subnetworks to detect convergence in the subnetworks of a model. However, this approach overlooks unchanging groups of unimportant neurons near the search's end. We propose WORM, a method that exploits these static groups by truncating their gradients, forcing the model to rely on other neurons. Experiments show that, compared to EarlyBird search, WORM achieves faster ticket identification during training on convolutional neural networks despite the additional computational overhead. Additionally, WORM-pruned models lose less accuracy during pruning and recover accuracy faster, improving the robustness of a given model. Furthermore, WORM generalizes the Early Bird hypothesis reasonably well to larger models, such as transformers, displaying its flexibility to adapt to more complex architectures.
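
To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a subnetwork is represented by a binary pruning mask over per-neuron importance scores, that the "distance between subnetworks" is a normalized Hamming distance between two such masks, and that a "static group" is the set of neurons pruned in every mask of a recent window. The function names (`prune_mask`, `mask_distance`, `static_pruned`, `truncate_gradients`) and the toy scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def prune_mask(scores: Sequence[float], prune_ratio: float) -> List[int]:
    """Binary mask (1 = keep, 0 = prune) that drops the lowest-|score| fraction.

    Assumption: importance is the score's magnitude; the source does not say
    which importance measure is used.
    """
    n_prune = int(len(scores) * prune_ratio)
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def mask_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized Hamming distance between two masks (assumed distance measure)."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def static_pruned(masks: Sequence[Sequence[int]]) -> List[int]:
    """Indices that are pruned in every mask of the window."""
    return [i for i in range(len(masks[0])) if all(m[i] == 0 for m in masks)]


def truncate_gradients(grads: Sequence[float], frozen: Sequence[int]) -> List[float]:
    """Zero the gradient of every frozen index; leave the others unchanged."""
    frozen_set = set(frozen)
    return [0.0 if i in frozen_set else g for i, g in enumerate(grads)]


# Toy importance scores for six neurons over three epochs (made-up numbers).
scores_per_epoch = [
    [0.90, 0.10, 0.50, 0.05, 0.70, 0.30],
    [0.80, 0.05, 0.60, 0.04, 0.20, 0.40],
    [0.85, 0.02, 0.65, 0.03, 0.25, 0.45],
]
masks = [prune_mask(s, prune_ratio=0.5) for s in scores_per_epoch]

# Distance between consecutive masks; a small value suggests the mask has settled.
distances = [mask_distance(masks[i], masks[i + 1]) for i in range(len(masks) - 1)]

# Neurons pruned in every epoch form the static group whose gradients are cut.
frozen = static_pruned(masks)
new_grads = truncate_gradients([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], frozen)
```

In this toy run the mask changes between the first two epochs and is identical between the last two, so the second distance is zero; neurons 1 and 3 are pruned throughout and would have their gradients zeroed. When to start truncating (the elbow in mask distance mentioned in the TL;DR) is not specified here and is left out of the sketch.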

Paper Structure (15 sections, 1 equation, 2 figures, 6 tables, 1 algorithm)