Adapters Mixup: Mixing Parameter-Efficient Adapters to Enhance the Adversarial Robustness of Fine-tuned Pre-trained Text Classifiers

Tuc Nguyen, Thai Le

TL;DR

AdpMixup introduces a novel fusion of parameter-efficient adapters with Mixup-style adversarial augmentation to enhance the robustness of fine-tuned pre-trained text classifiers. By training separate adapters on clean and adversarial data and dynamically mixing them at inference with a per-sample coefficient β computed from entropy, the method achieves a favorable balance between clean accuracy and resistance to unknown attacks. The framework supports multiple known attacks (m>1) and remains computationally efficient, leveraging adapters rather than full model re-training. Empirical results on five GLUE tasks with BERT and RoBERTa demonstrate superior trade-offs compared to baselines like Adversarial Training, Model Soup, and Adapter Soup, while enabling interpretable profiling of adversarial examples via β analysis. Overall, AdpMixup offers a scalable, modular defense for PEFT-based NLP systems in adversarial environments, with practical runtime benefits and insights into attack attribution.

Abstract

Existing work shows that augmenting the training data of pre-trained language models (PLMs) for classification tasks fine-tuned via parameter-efficient fine-tuning methods (PEFT) using both clean and adversarial examples can enhance their robustness under adversarial attacks. However, this adversarial training paradigm often leads to performance degradation on clean inputs and requires frequent re-training on the entire data to account for new, unknown attacks. To overcome these challenges while still harnessing the benefits of adversarial training and the efficiency of PEFT, this work proposes a novel approach, called AdpMixup, that combines two paradigms: (1) fine-tuning through adapters and (2) adversarial augmentation via mixup to dynamically leverage existing knowledge from a set of pre-known attacks for robust inference. Intuitively, AdpMixup fine-tunes PLMs with multiple adapters on both clean and pre-known adversarial examples and mixes them in different ratios during prediction. Our experiments show that AdpMixup achieves the best trade-off between training efficiency and robustness under both pre-known and unknown attacks, compared to existing baselines on five downstream tasks across six varied black-box attacks and two PLMs. All source code will be available.
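
The mixing step described above can be illustrated with a minimal sketch. It assumes the single pre-known attack case ($m=1$), where a coefficient $\beta$ weights the clean adapter and $1-\beta$ weights the adversarial adapter, which is consistent with the Figure 3 caption (a lower $\beta$ means a larger adversarial contribution). The function names, the dictionary representation of adapter weights, and the toy values are hypothetical and are not taken from the paper's code. The entropy helper only shows the quantity the TL;DR refers to; how AdpMixup maps entropy to a per-sample $\beta$ is not specified in this summary, so that step is deliberately left out.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mix_adapters(clean, adversarial, beta):
    """Linearly interpolate two adapters (hypothetical representation).

    Each adapter is a dict mapping a parameter name to a flat list of
    floats. beta weights the clean adapter, (1 - beta) the adversarial one.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if clean.keys() != adversarial.keys():
        raise ValueError("adapters must share the same parameter names")
    mixed = {}
    for name in clean:
        if len(clean[name]) != len(adversarial[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        mixed[name] = [
            beta * c + (1.0 - beta) * a
            for c, a in zip(clean[name], adversarial[name])
        ]
    return mixed


# Toy adapters with a single two-element parameter (illustrative values only).
clean_adapter = {"w": [1.0, 0.0]}
adv_adapter = {"w": [0.0, 1.0]}
mixed_adapter = mix_adapters(clean_adapter, adv_adapter, beta=0.25)
```

With `beta=0.25` the mixed adapter leans toward the adversarial weights. Only the adapter parameters are interpolated in this sketch; the pre-trained backbone is assumed to stay frozen and shared, which is what keeps the approach cheaper than re-training the full model.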

Paper Structure (46 sections, 13 equations, 5 figures, 31 tables)

This paper contains 46 sections, 13 equations, 5 figures, 31 tables.

Figures (5)

  • Figure 1: AdpMixup Framework: The final model $\theta$ is obtained by dynamically mixing the clean and adversarial adapter weights with different coefficients $\beta_1, \beta_2, \dots$. The dashed red lines are the decision boundaries of the different fine-tuned models, which, when mixed in a certain way, can result in robust inference.
  • Figure 2: By choosing the coefficients $\beta$ dynamically, AdpMixup allows us to profile the regions of combination weight. $\theta_0$ represents the pre-trained weight of the language model, while the gray area illustrates all possible combinations between clean and adversarial adapters. The pink area denotes the potential robust combinations of adapter weights.
  • Figure 3: Average coefficient $\beta$ of AdpMixup with $m=1$ pre-known attack during inference on 100 test examples with RoBERTa against different attack methods. The lower the score, the more the adversarial adapter weight contributes to the mixed models. * and ' denote word-based and character-based attacks, respectively. Red rectangles denote attacks of the same type (word or character-based).
  • Figure 4: Average model accuracy (clean and adversarial) across 5 domain tasks under $m=1$ pre-known attack method at various ratios of clean examples.
  • Figure 5: Trade-off between predictive accuracy (bar) and false negative rate in detecting adversarial examples (line) of AdpMixup with RoBERTa, assuming a conservative 15% ratio of adversarials out of 1K test inputs.