The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification
Minh Duc Bui, Katharina von der Wense
TL;DR
This paper evaluates adapter modules for text classification along three dimensions: performance, efficiency, and fairness. It empirically compares full finetuning against two adapter methods (Adapters and LoRA) across three datasets (Jigsaw, HateXplain, BIOS) and four base LMs, finding that adapter modules largely match full finetuning in accuracy while substantially reducing training time. Fairness effects, however, are mixed and depend strongly on how biased the fully finetuned baseline is, with a risk of bias amplification when that baseline is already highly biased. The work advocates case-by-case fairness assessment and notes limitations, such as its restriction to text classification and to a limited selection of models, while emphasizing the practical implications for deploying adapter-based methods in trustworthy NLP.
Abstract
Current natural language processing (NLP) research tends to focus on only one or, less frequently, two dimensions at a time (e.g., performance, privacy, fairness, or efficiency), which may lead to suboptimal conclusions and often overlooks the broader goal of achieving trustworthy NLP. Work on adapter modules (Houlsby et al., 2019; Hu et al., 2021) focuses on improving performance and efficiency, with no investigation of unintended consequences for other aspects such as fairness. To address this gap, we conduct experiments on three text classification datasets by either (1) finetuning all parameters or (2) using adapter modules. Regarding performance and efficiency, we confirm prior findings that the accuracy of adapter-enhanced models is roughly on par with that of fully finetuned models, while training time is substantially reduced. Regarding fairness, we show that adapter modules result in mixed fairness across sensitive groups. Further investigation reveals that, when the fully finetuned model exhibits limited bias, adapter modules typically do not introduce extra bias. When the fully finetuned model exhibits increased bias, on the other hand, the impact of adapter modules on bias becomes more unpredictable, introducing the risk of significantly magnifying these biases for certain groups. Our findings highlight the need for a case-by-case evaluation rather than a one-size-fits-all judgment.
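To make the per-group comparison concrete, the following is a minimal sketch of how one might check whether an adapter model widens or narrows a group-level gap relative to the fully finetuned model. It assumes binary labels and uses a true-positive-rate gap (group TPR minus overall TPR) as the fairness measure; this section does not name the metric the authors use, so that choice, like the function names `group_tpr`, `tpr_gap`, and `gap_shift`, is an illustrative assumption rather than the paper's method.

```python
from collections import defaultdict


def group_tpr(y_true, y_pred, groups):
    """True-positive rate per sensitive group, computed over positive examples only."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            totals[g] += 1
            hits[g] += int(p == 1)
    return {g: hits[g] / totals[g] for g in totals}


def tpr_gap(y_true, y_pred, groups):
    """Assumed fairness measure: each group's TPR minus the overall TPR."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    overall = sum(p == 1 for p in positives) / len(positives)
    per_group = group_tpr(y_true, y_pred, groups)
    return {g: tpr - overall for g, tpr in per_group.items()}


def gap_shift(y_true, pred_full, pred_adapter, groups):
    """Change in absolute gap per group when moving from full finetuning
    to an adapter model. Positive values mean the gap grew for that group."""
    full = tpr_gap(y_true, pred_full, groups)
    adapter = tpr_gap(y_true, pred_adapter, groups)
    return {g: abs(adapter[g]) - abs(full[g]) for g in full}
```

Reporting the shift per group, rather than a single aggregate, matches the abstract's point that effects can differ across sensitive groups and need to be checked case by case.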
