Explore and Establish Synergistic Effects Between Weight Pruning and Coreset Selection in Neural Network Training

Weilin Wan; Fan Yi; Weizhong Zhang; Quan Zhou; Cheng Jin

TL;DR

This work tackles the computational cost of training deep neural networks by studying the interaction between weight pruning and coreset selection. It proposes SWaST (Simultaneous Weight and Sample Tailoring), a training mechanism that alternates weight pruning and coreset selection and adds a state preservation mechanism to stabilize training. This mitigates the "critical double-loss" instability, in which important weights and their supportive samples are removed at the same time. Experiments across standard benchmarks show a strong pruning-coreset synergy, with accuracy gains of up to 17.83% alongside FLOPs reductions of 10% to 90%, and improved robustness to noise. The approach offers practical benefits for efficient, robust deep learning on resource-constrained platforms.

Abstract

Modern deep neural networks rely heavily on massive model weights and training samples, incurring substantial computational costs. Weight pruning and coreset selection are two emerging paradigms proposed to improve computational efficiency. In this paper, we first explore the interplay between redundant weights and training samples through a transparent analysis: redundant samples, particularly noisy ones, cause model weights to become unnecessarily overtuned to fit them, complicating the identification of irrelevant weights during pruning; conversely, irrelevant weights tend to overfit noisy data, undermining coreset selection effectiveness. To further investigate and harness this interplay in deep learning, we develop a Simultaneous Weight and Sample Tailoring mechanism (SWaST) that alternately performs weight pruning and coreset selection to establish a synergistic effect in training. During this investigation, we observe that when a large number of weights and samples are removed simultaneously, a phenomenon we term critical double-loss can occur: important weights and their supportive samples are mistakenly eliminated at the same time, leading to model instability and nearly irreversible degradation that cannot be recovered in subsequent training. Unlike in classic machine learning models, this issue can arise in deep learning because there are no theoretical guarantees on the correctness of weight pruning and coreset selection, which explains why these paradigms are often developed independently. We mitigate this by integrating a state preservation mechanism into SWaST, enabling stable joint optimization. Extensive experiments reveal a strong synergy between pruning and coreset selection across varying prune rates and coreset sizes, delivering accuracy boosts of up to 17.83% alongside 10% to 90% FLOPs reductions.
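
The alternating scheme described in the abstract can be illustrated with a toy sketch. This is not the SWaST implementation. It assumes a linear least-squares model, magnitude-based pruning, and loss-based sample dropping, and it uses a simple checkpoint-and-rollback guard as a stand-in for the state preservation mechanism, whose actual form is not specified here. All function names and parameters (`train_steps`, `alternate_prune_and_select`, `prune_frac`, `drop_frac`, `tol`) are hypothetical.

```python
import numpy as np

def train_steps(w, mask, X, y, steps=50, lr=0.1):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask
    return w * mask

def alternate_prune_and_select(X, y, rounds=5, prune_frac=0.2,
                               drop_frac=0.1, tol=1.5, seed=0):
    """Alternate weight pruning and sample dropping on a linear model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)
    mask = np.ones(d)        # 1.0 = weight kept, 0.0 = weight pruned
    keep = np.arange(n)      # indices of samples still in the subset

    for _ in range(rounds):
        w = train_steps(w, mask, X[keep], y[keep])
        base_loss = np.mean((X[keep] @ w - y[keep]) ** 2)
        # Assumed guard: snapshot the state before removing anything.
        saved = (w.copy(), mask.copy(), keep.copy())

        # Pruning step: zero the smallest-magnitude surviving weights.
        alive = np.flatnonzero(mask)
        n_prune = int(prune_frac * alive.size)
        if n_prune:
            order = np.argsort(np.abs(w[alive]))
            mask[alive[order[:n_prune]]] = 0.0
            w = w * mask

        # Selection step: drop the highest-loss surviving samples,
        # treating them as likely noisy (an assumption of this sketch).
        per_sample = (X[keep] @ w - y[keep]) ** 2
        n_drop = int(drop_frac * keep.size)
        if n_drop:
            keep = keep[np.argsort(per_sample)[: keep.size - n_drop]]

        # Retrain briefly; roll back if the loss degraded too much,
        # i.e. if the joint removal appears to have been harmful.
        w = train_steps(w, mask, X[keep], y[keep])
        new_loss = np.mean((X[keep] @ w - y[keep]) ** 2)
        if new_loss > tol * base_loss + 1e-8:
            w, mask, keep = saved

    return w, mask, keep
```

Calling `alternate_prune_and_select(X, y)` on a feature matrix `X` of shape `(n, d)` and a target vector `y` returns the final weights, a 0/1 weight mask, and the indices of the retained samples. The rollback threshold `tol` controls how much loss increase a round may cause before its removals are undone.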
