Pruning-based Data Selection and Network Fusion for Efficient Deep Learning
Humaira Kousar, Hasnain Irshad Bhatti, Jaekyun Moon
TL;DR
PruneFuse tackles the high cost of data selection in active learning by using pruning-at-initialization to create a small surrogate network that efficiently identifies informative samples. The trained pruned model is then fused with the original dense model to give the dense model a better initialization, which accelerates convergence and improves generalization; knowledge distillation (KD) provides further refinement. Empirical results on CIFAR-10, CIFAR-100, and Tiny ImageNet-200 show that PruneFuse achieves higher final accuracy than baselines while reducing computational overhead, and ablation studies demonstrate the benefits of both fusion and KD across pruning ratios. The approach offers a scalable, practical option for resource-constrained deep learning pipelines, enabling faster training with less labeling effort.
Abstract
Efficient data selection is essential for improving the training efficiency of deep neural networks and reducing the associated annotation costs. However, traditional methods tend to be computationally expensive, limiting their scalability and real-world applicability. We introduce PruneFuse, a novel method that combines pruning and network fusion to enhance data selection and accelerate network training. In PruneFuse, the original dense network is pruned to generate a smaller surrogate model that efficiently selects the most informative samples from the dataset. Once this iterative selection process has gathered enough samples, the knowledge learned by the pruned model is integrated into the dense model through network fusion, providing an optimized initialization that accelerates training. Extensive experiments on various datasets demonstrate that PruneFuse significantly reduces the computational cost of data selection, achieves better performance than baselines, and accelerates the overall training process.
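The three stages described above (prune, select, fuse) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single layer stored as a nested list of weights, random row selection as a stand-in for the pruning criterion, and predictive entropy as the selection score; the abstract specifies none of these. The function names prune_channels, select_most_uncertain, and fuse are hypothetical, and the knowledge distillation step is omitted.

```python
import math
import random


def prune_channels(weights, keep_ratio, rng):
    """Choose which output channels (rows) survive pruning at initialization.

    Assumption: random row selection stands in for the actual pruning
    criterion, which the abstract does not specify.
    """
    n_keep = max(1, int(len(weights) * keep_ratio))
    return sorted(rng.sample(range(len(weights)), n_keep))


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return indices of the k pool samples with the highest predictive entropy.

    Assumption: entropy is used only as an example of an informativeness
    score computed from the surrogate model's predictions.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


def fuse(dense_init, pruned_trained, kept_rows):
    """Copy the trained rows of the pruned layer into the dense layer.

    Rows that were pruned away keep their original initial values.
    """
    fused = [row[:] for row in dense_init]
    for small_idx, dense_idx in enumerate(kept_rows):
        fused[dense_idx] = pruned_trained[small_idx][:]
    return fused
```

In this sketch the surrogate's predictions on the unlabeled pool would be passed to select_most_uncertain in each selection round, and fuse would be called once after the final round to initialize the dense model before it is trained on the selected subset.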
