COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks
Md. Ismail Hossain, Mohammed Rakib, M. M. Lutfe Elahi, Nabeel Mohammed, Shafin Rahman
TL;DR
COLT introduces Cyclic Overlapping Lottery Tickets, a data-partitioned pruning framework that uses overlapping masks to derive highly sparse subnetworks in fewer pruning rounds while preserving accuracy. By training multiple models on non-overlapping class partitions and intersecting their pruned weights, COLT yields robust, transferable tickets that generalize across datasets and extend to object detection. Across CIFAR-10/100, Tiny ImageNet, ImageNet, and Pascal VOC, COLT achieves comparable performance to LTH at high sparsity but with significantly lower computation time and improved transferability from large to small datasets. This approach reduces training cost and energy while delivering practical, scalable sparse networks for CNNs.
Abstract
Pruning refers to the elimination of trivial weights from neural networks. The sub-networks that remain within an overparameterized model after pruning are often called lottery tickets. This research aims to generate, from a set of lottery tickets, winning tickets that achieve accuracy similar to that of the original unpruned network. We introduce a novel winning ticket called the Cyclic Overlapping Lottery Ticket (COLT), obtained by data splitting and cyclic retraining of the pruned network from scratch. We apply a cyclic pruning algorithm that keeps only the overlapping weights of different pruned models trained on different data segments. Our results demonstrate that COLT can achieve accuracy similar to that of the unpruned model while maintaining high sparsity. We show that the accuracy of COLT is on par with the winning tickets of the Lottery Ticket Hypothesis (LTH) and, at times, is better. Moreover, COLTs can be generated in fewer iterations than tickets generated by the popular Iterative Magnitude Pruning (IMP) method. In addition, we observe that COLTs generated on large datasets can be transferred to small ones without compromising performance, demonstrating their generalizing capability. We conduct all our experiments on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets and report performance superior to that of state-of-the-art methods.
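The overlapping step described above can be illustrated with a minimal sketch. It assumes two copies of the same network, each trained on a different data segment, plain magnitude pruning within each copy, and an elementwise AND of the two masks. The function names, the toy weight values, and the 50% pruning rate are hypothetical and chosen only for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
def magnitude_mask(weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude weights.

    `prune_fraction` is the share of weights to remove (assumed pruning rule).
    """
    n_prune = int(len(weights) * prune_fraction)
    # Indices ordered by ascending absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def overlap_mask(mask_a, mask_b):
    """Keep a weight only if it survives pruning in both models."""
    return [a & b for a, b in zip(mask_a, mask_b)]


def sparsity(mask):
    """Fraction of weights removed by the mask."""
    return 1.0 - sum(mask) / len(mask)


# Hypothetical flattened weights of two models trained on different data segments.
weights_a = [0.9, -0.05, 0.4, 0.01, -0.7, 0.3]
weights_b = [0.8, 0.6, -0.02, 0.03, -0.5, 0.2]

mask_a = magnitude_mask(weights_a, 0.5)
mask_b = magnitude_mask(weights_b, 0.5)
ticket = overlap_mask(mask_a, mask_b)

print(mask_a, mask_b, ticket)
print(sparsity(mask_a), sparsity(ticket))
```

In this toy case each individual mask removes half of the weights, while their intersection removes more, since a weight must survive in both models to be kept. This is one way to see why an overlap-based round can reach a given sparsity in fewer pruning iterations than pruning a single model at the same rate.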
