
Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models

Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu

TL;DR

The paper tackles robust pruning of language models by treating robustness as a function of retained pre-trained knowledge. It introduces a post-training, layer-wise pruning framework that preserves the embedding and feature spaces of the dense model. The framework first builds a robust dense initialization through weight averaging, then prunes each layer with adaptive Hessian-based updates. Experiments on SST2, AGNews, and IMDB with BERT-base and BERT-large show improved robustness at high sparsity, measured by accuracy under attack (Aua) and attack success rate (Asr), with modest trade-offs in clean accuracy and without retraining. The approach offers a practical path to deploying robust, sparse NLP models, and it identifies calibration data and computational cost as areas for future work.

Abstract

The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-the-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models.
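
The cumulative-error idea in the abstract can be written out as a short worked equation. This is a sketch under assumed notation, not the paper's own formulation: $W_l$ and $\hat{W}_l$ denote the dense and sparse weights of layer $l$, $X_l$ is the layer input in the dense model, and $\hat{X}_l$ is the input produced by the already-pruned preceding layers. All four symbols are introduced here for illustration.

```latex
% Layer-wise objective when the layer sees the sparse outputs of preceding layers
\min_{\hat{W}_l} \; \bigl\| W_l X_l - \hat{W}_l \hat{X}_l \bigr\|_F^2
\quad \text{subject to a sparsity constraint on } \hat{W}_l

% The residual splits into the layer's own error and the inherited error
W_l X_l - \hat{W}_l \hat{X}_l
  = \underbrace{(W_l - \hat{W}_l)\,\hat{X}_l}_{\text{error from pruning layer } l}
  + \underbrace{W_l\,(X_l - \hat{X}_l)}_{\text{error accumulated from preceding layers}}

% Hessian of the objective with respect to each row of \hat{W}_l
H_l = 2\,\hat{X}_l \hat{X}_l^{\top}
```

Under these assumptions the Hessian depends on the sparse inputs $\hat{X}_l$ rather than the dense inputs $X_l$, which is one way to read the requirement that later layers update their Hessian according to the sparse outputs of earlier layers.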

Paper Structure

This paper contains 45 sections, 6 equations, 5 figures, 8 tables, 3 algorithms.

Figures (5)

  • Figure 1: Architecture of Main Strategy. A: First, we generate a robust and dense language model in two steps: we fine-tune the pre-trained weights with various hyperparameters and settings, resulting in multiple models with different knowledge; we then employ a greedy algorithm to average only the weights of models that contribute to the final performance. B: Second, we apply our adaptive pruning method to generate robust and sparse language models in a layer-wise setting. Specifically, we replace the original independent pruning of each layer with an adaptive process: each subsequent layer updates its Hessian matrix and optimal dense weights according to the sparse outputs of the preceding layers, thereby inheriting and correcting the accumulated error.
  • Figure 2: Attention Score Visualisation in BERT$_{base}$. We have selected an adversarial sample ("it's a bewitching and often repercussions journey.") from SST2 and visualized the attention scores in the robust and dense model (panels b and e), the sparse language model generated with IMP+FreeLB (panels a and d), and the sparse language model created using our method (panels c and f). Here, panels a, b, and c depict the attention scores from the first transformer block of BERT$_{base}$, while panels d, e, and f show scores from the last transformer block. Evidently, the attention scores produced by our method align more closely with those from the robust and dense model.
  • Figure 3: Impact of the Number of Calibration Samples from SST2.
  • Figure 4: Impact of Sparsity Levels on SST2.
  • Figure 5: Visualization of Sentence Embeddings. We compare the distance of sentence embeddings between the robust and dense model (red), the sparse language models generated with IMP+FreeLB (green), and the sparse language models created using our method (blue). Panel a displays the two-dimensional representation of the embeddings from different layers of the various models for sentence i ("allows us to hope that nolan is prepped to embark on a major career as a commercial yet shrewd scriptwriter"). Similarly, panel b shows the two-dimensional representation of the embeddings from different layers of the various models for sentence ii ("allows us to hope that nolan is poised to embark on a major career as a commercial yet inventive filmmaker"). Note that sentence i originates from the SST2 dataset, and all three models predict its label correctly. Sentence ii, an adversarial sample generated from sentence i, is predicted correctly only by the robust and dense model and our sparse language model. We use the embedding of the first token ([CLS]) as the sentence representation, as the model uses it for the final classification. Our method generates embeddings and features that align more closely with the robust and dense model under adversarial attacks.
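
The greedy weight-averaging step described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each fine-tuned checkpoint is a dictionary of NumPy arrays and that a validation score is available through a caller-supplied `evaluate` function. The names `average_weights`, `greedy_average`, and `evaluate` are hypothetical, and the acceptance rule (keep a checkpoint only if the averaged score does not drop) is an assumption rather than the paper's exact criterion.

```python
import numpy as np


def average_weights(states):
    """Element-wise mean of a list of {name: array} weight dictionaries."""
    return {k: np.mean([s[k] for s in states], axis=0) for k in states[0]}


def greedy_average(candidates, evaluate):
    """Greedily average fine-tuned checkpoints.

    candidates: list of {name: array} weight dicts (hypothetical format).
    evaluate:   callable mapping a weight dict to a validation score,
                higher is better (an assumption, not the paper's metric).
    Returns the averaged weights and the number of checkpoints kept.
    """
    # Visit the individually strongest checkpoints first.
    ranked = sorted(candidates, key=evaluate, reverse=True)
    kept = [ranked[0]]
    best = evaluate(average_weights(kept))
    for cand in ranked[1:]:
        score = evaluate(average_weights(kept + [cand]))
        if score >= best:  # keep only checkpoints that do not hurt the average
            kept.append(cand)
            best = score
    return average_weights(kept), len(kept)
```

In practice `evaluate` would load the averaged weights into the model and measure held-out accuracy or robustness. Checkpoints whose inclusion lowers the score are skipped, which matches the caption's description of averaging only the models that contribute to the final performance.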