HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling

Jesus-German Ortiz-Barajas, Helena Gomez-Adorno, Thamar Solorio

TL;DR

HyperLoader uses hypernetworks to generate task-, layer-, and position-specific weights for adapters, LoRA, and layer normalization, with the goal of parameter-efficient multi-task sequence labelling. It is built on a T5 encoder-decoder backbone and uses the SentT' format to cast sequence labelling as a Seq2Seq problem. Evaluated on seven datasets in both full-data and low-resource settings, it outperforms prior multi-task and hypernetwork-based methods. Key contributions include the integration of task-conditioned adapters and LoRA with multiple hypernetworks, evidence that the gains come from weight generation rather than from parameter count alone, and ablations showing that combining several parameter-efficient methods is effective. The approach is relevant for deploying compact, scalable multi-task sequence labelling systems when data availability varies across tasks.

Abstract

We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting. To achieve this, our model uses a hypernetwork to generate the weights of these modules conditioned on the task, the transformer layer, and the module's position within that layer. Our method combines the benefits of multi-task learning, capturing the structure shared by all tasks while reducing task interference by encapsulating task-specific knowledge in the generated weights, with the benefits of combining different parameter-efficient methods, which allows it to outperform full fine-tuning. We provide empirical evidence that HyperLoader outperforms previous approaches on most datasets and obtains the best average performance across tasks in both high-resource and low-resource scenarios.
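
As a rough illustration of the weight-generation idea described in the abstract, the sketch below builds a toy hypernetwork in plain Python. It sums task, layer, and position embeddings into one conditioning vector and projects that vector into the down- and up-projection matrices of a bottleneck adapter. The sizes, the task and position names, the summation of embeddings, and the single linear projection are all assumptions made for illustration; this is not the authors' implementation.

```python
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

HIDDEN = 4      # toy transformer hidden size (assumption)
BOTTLENECK = 2  # toy adapter bottleneck size (assumption)
EMBED = 3       # toy conditioning-embedding size (assumption)


def rand_vector(size):
    """Small random vector standing in for learned parameters."""
    return [random.uniform(-0.1, 0.1) for _ in range(size)]


def rand_matrix(rows, cols):
    """Small random matrix stored as a list of rows."""
    return [rand_vector(cols) for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical embeddings; in a real model these would be trained.
task_emb = {"ner": rand_vector(EMBED), "pos_tagging": rand_vector(EMBED)}
layer_emb = [rand_vector(EMBED) for _ in range(2)]
position_emb = {"after_attention": rand_vector(EMBED),
                "after_ffn": rand_vector(EMBED)}

# Hypernetwork heads shared by all tasks: one per adapter matrix.
h_down = rand_matrix(BOTTLENECK * HIDDEN, EMBED)
h_up = rand_matrix(HIDDEN * BOTTLENECK, EMBED)


def generate_adapter(task, layer, position):
    """Generate adapter weights from (task, layer, position)."""
    cond = [t + l + p for t, l, p in zip(task_emb[task],
                                         layer_emb[layer],
                                         position_emb[position])]
    down = reshape(matvec(h_down, cond), BOTTLENECK, HIDDEN)
    up = reshape(matvec(h_up, cond), HIDDEN, BOTTLENECK)
    return down, up


def adapter_forward(x, down, up):
    """Bottleneck adapter: down-project, ReLU, up-project, add residual."""
    z = [max(0.0, v) for v in matvec(down, x)]
    return [xi + ui for xi, ui in zip(x, matvec(up, z))]


down_ner, up_ner = generate_adapter("ner", layer=0, position="after_ffn")
down_pos, up_pos = generate_adapter("pos_tagging", layer=0, position="after_ffn")
output = adapter_forward([1.0, 0.5, -0.5, 2.0], down_ner, up_ner)
```

Only the embeddings and the two hypernetwork heads hold parameters here; the adapter matrices themselves are recomputed per task, which is how task-specific knowledge can live in generated weights while the generator is shared.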

Paper Structure (19 sections, 5 equations, 2 figures, 5 tables)


Figures (2)

  • Figure 1: Diagram of the HyperLoader model. The Adapter hypernetworks $h^{l}_{A_D}$ and $h^{l}_{A_U}$ produce the weights $D^{l}_{\tau}$ and $U^{l}_{\tau}$ for task-specific adapter modules. The LoRA hypernetworks $h^{l}_{LoRA_{A}}$ and $h^{l}_{LoRA_{B}}$ generate the $A$ and $B$ matrices for task-specific LoRA modules. Finally, the hypernetwork $h^{l}_{LN}$ creates the conditional layer normalization parameters $\beta_{\tau}$ and $\gamma_{\tau}$.
  • Figure 2: Average performance plotted against the percentage of trainable parameters, using different portions of the datasets. $\clubsuit$ indicates a single-task fine-tuning approach.
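
To make the Figure 1 caption more concrete, the following sketch shows how generated LoRA matrices $A$ and $B$ and generated layer normalization parameters $\gamma_{\tau}$ and $\beta_{\tau}$ could be applied once a hypernetwork has produced them. It is a minimal sketch under stated assumptions: the function names are hypothetical, the LoRA scaling factor is omitted, standard mean-and-variance layer normalization is used, and the weights are hand-picked numbers rather than hypernetwork outputs.

```python
import math


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_linear(x, frozen_w, lora_a, lora_b):
    """Frozen projection plus low-rank update: W x + B (A x).

    The usual LoRA scaling factor is left out for brevity (assumption).
    """
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + u for b, u in zip(base, update)]


def conditional_layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization whose scale and shift are supplied per task."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]


# Hand-picked toy values standing in for hypernetwork outputs.
frozen_w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (identity here)
lora_a = [[1.0, 1.0]]                # rank-1 A matrix, shape 1 x 2
lora_b = [[0.5], [-0.5]]             # rank-1 B matrix, shape 2 x 1
gamma_tau = [2.0, 2.0]               # task-specific scale
beta_tau = [0.5, 0.5]                # task-specific shift

projected = lora_linear([1.0, 2.0], frozen_w, lora_a, lora_b)  # [2.5, 0.5]
normalized = conditional_layer_norm(projected, gamma_tau, beta_tau)
```

In this toy case the low-rank update shifts the frozen projection from `[1.0, 2.0]` to `[2.5, 0.5]`, and the task-specific scale and shift then map the normalized values to roughly `[2.5, -1.5]`.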