PruneFuse: Efficient Data Selection via Weight Pruning and Network Fusion

Humaira Kousar, Hasnain Irshad Bhatti, Jaekyun Moon

Abstract

Efficient data selection is crucial for enhancing the training efficiency of deep neural networks and minimizing annotation requirements. Traditional methods often face high computational costs, limiting their scalability and practical use. We introduce PruneFuse, a novel strategy that leverages pruned networks for data selection and later fuses them with the original network to optimize training. PruneFuse operates in two stages: First, it applies structured pruning to create a smaller pruned network that, due to its structural coherence with the original network, is well-suited for the data selection task. This small network is then trained and selects the most informative samples from the dataset. Second, the trained pruned network is seamlessly fused with the original network. This integration leverages the insights gained during the training of the pruned network to facilitate the learning process of the fused network while leaving room for the network to discover more robust solutions. Extensive experimentation on various datasets demonstrates that PruneFuse significantly reduces computational costs for data selection, achieves better performance than baselines, and accelerates the overall training process.
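The two-stage procedure described in the abstract maps naturally to code. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: structured pruning is emulated here by zeroing whole channels with torch.nn.utils.prune (the paper's pruned network is physically smaller), the acquisition score is predictive entropy (one common choice; the paper's criterion may differ), and fuse simply copies the trained pruned weights into the full network as an initialization.

```python
import copy
import torch
import torch.nn.utils.prune as prune

def structured_prune(model, ratio):
    """Stage 1a (sketch): zero out whole output channels so the pruned
    net stays structurally coherent with the original. In the actual
    method the pruned network is physically smaller; masking is used
    here only to keep the example short."""
    pruned = copy.deepcopy(model)
    for m in pruned.modules():
        if isinstance(m, torch.nn.Conv2d):
            prune.ln_structured(m, name="weight", amount=ratio, n=2, dim=0)
            prune.remove(m, "weight")  # bake the mask into the weights
    return pruned

def select_informative(pruned_model, unlabeled_loader, budget):
    """Stage 1b (sketch): query the unlabeled pool with the small pruned
    net and keep the `budget` highest-entropy samples. Entropy is an
    assumed acquisition function; any active-learning score fits here."""
    pruned_model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in unlabeled_loader:
            p = torch.softmax(pruned_model(x), dim=1)
            scores.append(-(p * p.clamp_min(1e-12).log()).sum(dim=1))
    return torch.topk(torch.cat(scores), k=budget).indices.tolist()

def fuse(base, pruned_trained):
    """Stage 2 (sketch): initialize the full network with the trained
    pruned weights where they survive, keeping the base initialization
    elsewhere so the fused net can still explore a richer landscape."""
    fused = copy.deepcopy(base)
    for w, wp in zip(fused.parameters(), pruned_trained.parameters()):
        mask = wp.data != 0  # channels kept by structured pruning
        w.data[mask] = wp.data[mask]
    return fused
```

On this reading, fusion warm-starts the full network from $\theta_p^*$ while leaving the pruned-away weights at their base initialization, matching the abstract's point that the fused network retains room to discover more robust solutions.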

Paper Structure

This paper contains 37 sections, 2 theorems, 24 equations, 13 figures, 31 tables, and 1 algorithm.

Key Result

Theorem 5.1

Under the assumptions stated in the error-bound section of the Supplementary Materials, with probability at least $1-\eta$, …

Figures (13)

  • Figure 1: Overview of the PruneFuse method: (1) An untrained neural network is initially pruned to form a structured, pruned network $\theta_p$. (2) This pruned network $\theta_p$ queries the dataset to select prime candidates for annotation, similar to active learning techniques. (3) $\theta_p$ is then trained on these labeled samples to form the trained pruned network $\theta_p^*$. (4) The trained pruned network $\theta_p^*$ is fused with the base model $\theta$, resulting in a fused model. (5) The fused model is further trained on a selected subset of the data, incorporating knowledge distillation from $\theta_p^*$. At regular intervals $T_{\text{sync}}$, the fused model is utilized to dynamically update the pruned model for subsequent data selection. (A minimal code sketch of this loop appears after the figure list.)
  • Figure 2: Evolution of training trajectories. Conceptual illustration of how pruning $\theta$ to $\theta_p$ tailors the loss landscape, from the landscape of $\theta$ to that of $\theta_p$, allowing $\theta_p$ to converge on an effective configuration, denoted $\theta^*_p$. This model, $\theta^*_p$, is later fused with the original $\theta$, which provides a better initialization and leads to an improved trajectory for $\theta_F$ to follow, as depicted in the $\theta_F$ landscape panel.
  • Figure 3: Accuracy-cost trade-off for PruneFuse. This figure illustrates the total number of FLOPs utilized by PruneFuse for data selection, compared to the baseline Active Learning method, for $T_{\text{sync}}{=}0,1$ with labeling budgets $b=10\%, 30\%, 50\%$. The experiments are conducted on the CIFAR-10 dataset using the ResNet-56 architecture. Subfigures (a), (b), (c), and (d) correspond to different pruning ratios of 0.5, 0.6, 0.7, and 0.8, respectively.
  • Figure 4: Impact of Model Fusion on PruneFuse performance: This figure compares the accuracy over epochs for different training variants within the PruneFuse framework on CIFAR-10 with ResNet-56. We compare fusion only, knowledge distillation (KD) only, fusion with KD, and training without fusion and KD. Subfigures (a), (b), and (c) correspond to $p=0.5$, $0.6$, and $0.7$, respectively, for $b=30\%$.
  • ...and 8 more figures
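Steps (4)-(5) of the Figure 1 caption and the variants in Figure 4 together describe an outer loop: train the pruned selector, query the pool, fuse, train the fused model with knowledge distillation from $\theta_p^*$, and refresh the selector every $T_{\text{sync}}$ rounds. Below is a minimal sketch of that loop, reusing the helper names from the earlier snippet; `train_fn`, `train_kd_fn`, and `annotate_fn` are caller-supplied placeholders, and the distillation loss is the standard temperature-scaled objective, which may differ in detail from the paper's.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.5, tau=2.0):
    # Cross-entropy on the labels plus temperature-scaled KL toward the
    # teacher (the trained pruned net); alpha and tau are illustrative
    # hyperparameters, not values taken from the paper.
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction="batchmean") * (tau * tau)
    return (1 - alpha) * ce + alpha * kl

def prunefuse_loop(base, train_fn, train_kd_fn, annotate_fn,
                   unlabeled_loader, budget, p, rounds, t_sync):
    # Outer loop of Figure 1, using structured_prune / select_informative /
    # fuse from the earlier sketch. train_fn trains a model on the labeled
    # set, train_kd_fn additionally minimizes kd_loss against the teacher,
    # and annotate_fn sends the chosen indices to the labeling oracle.
    pruned = structured_prune(base, ratio=p)
    fused = base
    for r in range(rounds):
        train_fn(pruned)                                  # theta_p -> theta_p*
        chosen = select_informative(pruned, unlabeled_loader, budget)
        annotate_fn(chosen)                               # label the queries
        fused = fuse(fused, pruned)                       # step (4): fusion
        train_kd_fn(fused, teacher=pruned)                # step (5): KD
        if t_sync and (r + 1) % t_sync == 0:
            # periodically re-derive the selector from the fused model
            pruned = structured_prune(fused, ratio=p)
    return fused
```

Note that $T_{\text{sync}}{=}0$ (as in Figure 3) disables the refresh, so the selector derived from the initial network is reused throughout.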

Theorems & Definitions (2)

  • Theorem 5.1
  • Theorem 9.1