Pruning-based Data Selection and Network Fusion for Efficient Deep Learning

Humaira Kousar; Hasnain Irshad Bhatti; Jaekyun Moon

TL;DR

PruneFuse tackles the high cost of data selection in active learning by using pruning-at-initialization to create a small surrogate network that efficiently identifies informative samples. The method then fuses the trained pruned model with the original dense model to provide a superior initialization, accelerating convergence and improving generalization, with refinement via knowledge distillation. Empirical results on CIFAR-10, CIFAR-100, and Tiny ImageNet-200 show PruneFuse achieves higher final accuracy than baselines while reducing computational overhead, and ablation studies demonstrate the benefits of both fusion and KD across pruning ratios. The approach offers a scalable, practical solution for resource-constrained deep learning pipelines, enabling faster training with less labeling effort.

Abstract

Efficient data selection is essential for improving the training efficiency of deep neural networks and reducing the associated annotation costs. However, traditional methods tend to be computationally expensive, limiting their scalability and real-world applicability. We introduce PruneFuse, a novel method that combines pruning and network fusion to enhance data selection and accelerate network training. In PruneFuse, the original dense network is pruned to generate a smaller surrogate model that efficiently selects the most informative samples from the dataset. Once this iterative selection process has gathered enough samples, the knowledge learned by the pruned model is integrated into the dense model through network fusion, providing an optimized initialization that accelerates training. Extensive experimentation on various datasets demonstrates that PruneFuse significantly reduces computational costs for data selection, achieves better performance than baselines, and accelerates the overall training process.
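The selection step described in the abstract can be illustrated with a minimal sketch. It assumes predictive entropy as the informativeness score, which is one common active-learning metric and not necessarily the one the paper uses. The names `entropy`, `select_most_uncertain`, and `pool_probs` are hypothetical, and the probabilities stand in for the outputs of a pruned surrogate model on an unlabeled pool.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, k):
    """Return indices of the k pool samples with the highest predictive entropy.

    pool_probs holds one class-probability list per unlabeled sample, as a
    (hypothetical) pruned surrogate model would produce them.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy pool (made-up numbers): sample 1 is near-uniform, sample 0 is confident.
pool_probs = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
selected = select_most_uncertain(pool_probs, k=2)
print(selected)
```

In an iterative setting, the selected samples would be labeled, added to the training set of the surrogate, and the scoring repeated until the labeling budget is spent.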

Paper Structure (19 sections, 2 equations, 6 figures, 8 tables, 1 algorithm)

Introduction
Background and Related Works
PruneFuse
Pruning at Initialization
Data Selection via Pruned Model
Training of Pruned Model
Fusion with the Original Model
Refinement via Knowledge Distillation
Experiments
Results and Discussions
Conclusion
Acknowledgments
Appendix
Performance Comparison with different Datasets, Selection Metrics, and Architectures
Comparison with SVP
...and 4 more sections

Figures (6)

Figure 1: Overview of the PruneFuse Method: (1) An untrained neural network is initially pruned to form a structured, pruned network $\theta_p$. (2) This pruned network $\theta_p$ queries the dataset to select prime candidates for annotation, similar to active learning techniques. (3) $\theta_p$ is then trained on these labeled samples to form the trained pruned network $\theta_p^*$. (4) The trained pruned network $\theta_p^*$ is fused with the base model $\theta$, resulting in a fused model. (5) The fused model is further trained on a selected subset of the data, incorporating knowledge distillation from $\theta_p^*$.
Figure 2: Evolution of training trajectories. Pruning $\theta$ to $\theta_p$ tailors the loss landscape from that of the dense model $\theta$ to that of the pruned model $\theta_p$, allowing $\theta_p$ to converge on an optimal configuration, denoted as $\theta^*_p$. This model, $\theta^*_p$, is later fused with the original $\theta$, which provides a better initialization and offers a superior trajectory for $\theta_F$ to follow, as depicted in the loss landscape of $\theta_F$.
Figure 3: Computation Comparison of PruneFuse and Baseline (Active Learning): This figure illustrates the total number of FLOPs utilized by PruneFuse, compared to the baseline Active Learning method, for selecting subsets with specific labeling budgets $b=10\%, 30\%, 50\%$. The experiments are conducted on the CIFAR-10 dataset using the ResNet-56 architecture. Subfigures (a), (b), (c), and (d) correspond to different pruning ratios (0.5, 0.6, 0.7, and 0.8, respectively).
Figure 4: Impact of Model Fusion on PruneFuse Performance: This figure compares accuracy over epochs for fused and non-fused training within the PruneFuse framework, both using the subset (with labeling budget $b$) selected by the pruned model. Experiments are conducted using ResNet-56 on CIFAR-10. Subfigures (a) and (b) correspond to pruning ratios $p=0.5$ and $p=0.6$, respectively.
Figure 5: Comparison of PruneFuse with SVP. The scatter plot shows the final accuracy of the target model against model size for different ResNet models on the CIFAR-10 dataset with labeling budget $b=50\%$. In (a), the target network is ResNet-14: PruneFuse uses pruned ResNet-14 models (with $p=0.5$ and $p=0.6$) as data selectors, while SVP uses ResNet-8. In (b), the target model is ResNet-20: PruneFuse uses pruned ResNet-20 models (with $p=0.5$ and $p=0.6$) and SVP uses ResNet-8 for data selection.
...and 1 more figures
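Step (4) of Figure 1, fusing the trained pruned network $\theta_p^*$ with the base model $\theta$, can be sketched for a single layer as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes structured pruning keeps a subset of rows and columns of a weight matrix, and that fusion copies the trained pruned weights back into those positions while the remaining weights keep their dense initialization. The function names and the toy weights are hypothetical.

```python
import copy


def prune_structured(dense_w, keep_rows, keep_cols):
    """Extract the sub-matrix of a dense layer kept by structured pruning."""
    return [[dense_w[r][c] for c in keep_cols] for r in keep_rows]


def fuse(dense_w, trained_pruned_w, keep_rows, keep_cols):
    """Copy trained pruned weights into their original slots in the dense layer.

    Assumption: positions that were pruned away keep their dense initialization.
    """
    fused = copy.deepcopy(dense_w)
    for i, r in enumerate(keep_rows):
        for j, c in enumerate(keep_cols):
            fused[r][c] = trained_pruned_w[i][j]
    return fused


# Toy 3x3 dense layer at initialization (made-up values).
dense_init = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]
keep_rows, keep_cols = [0, 2], [0, 1]

pruned = prune_structured(dense_init, keep_rows, keep_cols)
# Stand-in for training the pruned model: shift every kept weight by 1.0.
trained_pruned = [[w + 1.0 for w in row] for row in pruned]
fused = fuse(dense_init, trained_pruned, keep_rows, keep_cols)
print(fused)
```

The fused layer would then serve as the starting point for further training on the selected subset, with knowledge distillation from $\theta_p^*$ as described in step (5).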
