MerA: Merging Pretrained Adapters For Few-Shot Learning

Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao

TL;DR

MerA (Merging Pretrained Adapters) incorporates several pretrained adapters into a single model through model fusion, targeting few-shot learning, where adapter tuning often underperforms. In experiments on two PLMs it improves substantially over both single adapters and AdapterFusion, and merging adapters from the same track of pretraining tasks (the "same-track" setting) yields further gains over full fine-tuning and adapter tuning, e.g., 3.5% on MRPC and 5.0% on MNLI.

Abstract

Adapter tuning, which updates only a few parameters, has become a mainstream method for adapting pretrained language models to downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Moreover, our preliminary study reveals that even single adapters can outperform AdapterFusion in few-shot learning, motivating us to propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "same-track" setting, that merges adapters from the same track of pretraining tasks. With the "same-track" setting, we observe even larger gains, surpassing the performance of both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5% on MRPC and 5.0% on MNLI.
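
The abstract says only that MerA combines pretrained adapters "through model fusion" and does not spell out the fusion rule. As a rough illustration of the general idea, the sketch below merges adapters by plain element-wise averaging of their weights. This averaging rule is an assumption chosen for simplicity and may differ from the method the paper actually uses; the `merge_adapters` function, the `"down"`/`"up"` parameter names, and the toy values are all hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]
# Hypothetical adapter layout: a down-projection and an up-projection matrix.
Adapter = Dict[str, Matrix]


def merge_adapters(adapters: List[Adapter]) -> Adapter:
    """Average same-named weight matrices across adapters, element-wise.

    Assumption: plain averaging stands in for "model fusion" here.
    """
    if not adapters:
        raise ValueError("need at least one adapter to merge")
    merged: Adapter = {}
    for name in adapters[0]:
        mats = [a[name] for a in adapters]
        rows, cols = len(mats[0]), len(mats[0][0])
        # All adapters must share the same shape for each parameter.
        for m in mats:
            if len(m) != rows or any(len(r) != cols for r in m):
                raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            [sum(m[i][j] for m in mats) / len(mats) for j in range(cols)]
            for i in range(rows)
        ]
    return merged


# Two toy adapters with hidden size 2 and bottleneck size 1.
adapter_a = {"down": [[1.0], [3.0]], "up": [[2.0, 4.0]]}
adapter_b = {"down": [[3.0], [5.0]], "up": [[0.0, 2.0]]}
merged = merge_adapters([adapter_a, adapter_b])
print(merged)
```

Because the merged result has the same shape as a single adapter, it adds no extra parameters at deployment time, which is the contrast with AdapterFusion's additional composition layers that the abstract draws.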

Paper Structure (12 sections, 2 tables)