Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model Pruning
Tuc Nguyen, Thai Le
TL;DR
The paper investigates whether weight-space mixing of domain-specific adapters generalizes to in-domain settings, a question prior work has not thoroughly explored. It conducts a large-scale in-domain evaluation across 13 diverse datasets with multiple adapter methods, exhaustively enumerating adapter mixtures to quantify both generalization and adversarial robustness. A central finding is a robust negative correlation between the fraction of weight sign differences (FSD) among the mixed adapters and the mixture's predictive performance. This finding motivates FSD-guided strategies, including Greedy Adapter Mixing and an FSD-based magnitude pruning scheme that maintains performance at high sparsity. The results offer practical guidance for deploying parameter-efficient adapters in real-world scenarios and point to pruning, which arises naturally from this analysis, as a way to reduce sign conflicts while preserving accuracy.
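To make the FSD statistic concrete, the sketch below treats each adapter as a flat list of weights from a shared adapter architecture. As an assumption (the authors' exact definition may differ, for instance in how zeros or more than two adapters are handled), it defines FSD for a pair of adapters as the fraction of aligned positions whose signs disagree, and FSD for a set of adapters as the mean over all pairs. The functions `pairwise_fsd` and `mixture_fsd` and the toy values are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def sign(x: float) -> int:
    """Return +1, -1, or 0 for a scalar weight."""
    return (x > 0) - (x < 0)


def pairwise_fsd(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical FSD for two adapters: fraction of aligned weights whose signs differ.

    Assumption: both adapters share one architecture, so their flattened weights
    align position by position; zero is treated as its own sign.
    """
    if len(a) != len(b) or not a:
        raise ValueError("adapters must be non-empty and the same size")
    differing = sum(sign(x) != sign(y) for x, y in zip(a, b))
    return differing / len(a)


def mixture_fsd(adapters: Sequence[Sequence[float]]) -> float:
    """Hypothetical FSD for a mixture: mean pairwise FSD over all adapter pairs."""
    pairs = list(combinations(adapters, 2))
    if not pairs:
        return 0.0  # a single adapter has no sign conflicts with itself
    return sum(pairwise_fsd(a, b) for a, b in pairs) / len(pairs)


# Toy flattened adapters (hypothetical values).
a = [0.5, -0.2, 0.1, -0.7]
b = [0.4, 0.3, -0.1, -0.6]
c = [-0.3, -0.1, 0.2, -0.5]
print(pairwise_fsd(a, b))      # 0.5  (signs differ at positions 1 and 2)
print(mixture_fsd([a, b, c]))  # 0.5  (mean of 0.5, 0.25, 0.75)
```

Under this reading, the reported finding is that mixtures with larger FSD values tend to show lower in-domain predictive performance.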
Abstract
Several adapter-based parameter-efficient fine-tuning methods have been proposed as a streamlined way to incorporate into existing Pre-Trained Language Models (PLMs) not only a single source of specialized knowledge but also several at once. Recent works such as AdapterSoup propose mixing only a selected subset of domain-specific adapters, rather than all of them, during inference via model weight averaging, optimizing performance on novel, unseen domains with excellent computational efficiency. However, the fundamental generalizability of this emerging weight-space adapter mixing mechanism on unseen, in-domain examples remains unexplored. In this study, we therefore conduct a comprehensive analysis of the generalizability of domain-specific adapter mixtures under in-domain evaluation. We also investigate the inner workings of these mixtures by analyzing the adapters' weight signs, revealing a negative correlation between the fraction of weight sign differences among the mixed adapters and the generalizability of their mixture.
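Weight-space mixing of the kind described above can be sketched as uniform averaging of adapter parameters. This is a minimal illustration under stated assumptions: each adapter is a hypothetical dictionary mapping parameter names to flattened weights, all selected adapters share identical names and shapes, and the subset has already been chosen; AdapterSoup's actual selection procedure and implementation are not reproduced here. The name `average_adapters` and the toy "legal" and "medical" adapters are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: parameter name -> flattened weights.
Adapter = Dict[str, List[float]]


def average_adapters(adapters: Sequence[Adapter]) -> Adapter:
    """Uniformly average a selected subset of adapters in weight space.

    Assumption: every adapter has identical parameter names and shapes, as when
    they share one adapter configuration on the same PLM.
    """
    if not adapters:
        raise ValueError("need at least one adapter to mix")
    mixed: Adapter = {}
    for name in adapters[0]:
        # Group the i-th weight of every adapter together, then take the mean.
        columns = zip(*(adapter[name] for adapter in adapters))
        mixed[name] = [sum(col) / len(adapters) for col in columns]
    return mixed


# Two toy domain adapters with one parameter tensor each (hypothetical values).
legal = {"down_proj": [0.5, -0.25, 0.75]}
medical = {"down_proj": [-0.5, -0.25, 0.25]}
print(average_adapters([legal, medical]))  # {'down_proj': [0.0, -0.25, 0.5]}
```

In this toy case, the first weight has opposite signs in the two adapters and cancels to 0.0 after averaging, while same-signed weights survive. This offers one intuitive, non-authoritative picture of why sign disagreement among mixed adapters could relate to the quality of their mixture.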
