From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao

TL;DR

This work tackles the limited cross-task generalization of instruction-finetuned LLMs by replacing instance-heavy training with instruction-guided adapter generation. It introduces TAGI, a framework in which a hypernetwork converts task instructions into LoRA adapters; the resulting student model is aligned with an instance-trained teacher model through knowledge distillation and parameter alignment, and the hypernetwork is trained in two stages (pretraining and finetuning). On Super-Natural Instructions (SNI) and P3, TAGI achieves competitive or superior results while substantially reducing inference costs, showing cross-task generalization without per-task retraining. Because the task-specific model is constructed directly from instructions, the approach can adapt efficiently to unseen tasks. The combination of instruction fusion, pretraining, and distillation contributes to robust generalization with lower compute than traditional meta-training.

Abstract

Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Contrary to LLMs, humans acquire skills and complete tasks not merely through repeated practice but also by understanding and following instructional guidelines. This paper is dedicated to simulating human learning to address the shortcomings of instance training, focusing on instruction learning to enhance cross-task generalization. Within this context, we introduce Task Adapters Generation from Instructions (TAGI), which automatically constructs the task-specific model in a parameter generation manner based on the given task instructions without retraining for unseen tasks. Specifically, we utilize knowledge distillation to enhance the consistency between TAGI developed through Learning with Instruction and task-specific models developed through Training with Instance, by aligning the labels, output logits, and adapter parameters between them. TAGI is endowed with cross-task generalization capabilities through a two-stage training process that includes hypernetwork pretraining and finetuning. We evaluate TAGI on the Super-Natural Instructions and P3 datasets. The experimental results demonstrate that TAGI can match or even outperform traditional meta-trained models and other hypernetwork models, while significantly reducing computational requirements.
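The abstract names three alignment targets between the instance-trained teacher and the instruction-generated student: labels, output logits, and adapter parameters. The sketch below shows one plausible way to combine them into a single objective. It is a minimal illustration, not the authors' implementation: the function names, the KL form of the logit term, the mean-squared-error form of the parameter term, and the weights `w_kd` and `w_param` are all assumptions, since this summary does not give the exact loss.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    # Label alignment: negative log-likelihood of the gold label.
    return -math.log(softmax(student_logits)[label])

def kl_divergence(teacher_logits, student_logits):
    # Logit alignment (assumed form): KL(teacher || student).
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(teacher_params, student_params):
    # Parameter alignment (assumed form): mean squared error between
    # flattened teacher and student adapter weights.
    diffs = [(t - s) ** 2 for t, s in zip(teacher_params, student_params)]
    return sum(diffs) / len(diffs)

def alignment_loss(student_logits, teacher_logits, label,
                   student_params, teacher_params,
                   w_kd=1.0, w_param=1.0):
    # Hypothetical weighted sum of the three alignment terms.
    return (cross_entropy(student_logits, label)
            + w_kd * kl_divergence(teacher_logits, student_logits)
            + w_param * mse(teacher_params, student_params))
```

When the student reproduces the teacher's logits and adapter weights exactly, the last two terms vanish and only the label loss remains, which is the behaviour one would expect from a consistency objective of this kind.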

Paper Structure

This paper contains 36 sections, 6 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Comparison of the typical Training with Instance and the proposed Learning with Instruction: The former involves training the model at the instance level with parameter updates, while the latter generates a task-specific adapter at the task level with parameter generation.
  • Figure 2: Overview of TAGI. The hypernetwork takes an instruction as input and generates adapters, which are then integrated into the vanilla LLM to construct the task-specific model that serves as the student. After task models have been trained on instances from multiple basic tasks to serve as teachers, TAGI constructs task-specific models by aligning the labels, output logits, and adapter parameters between the teacher and student models. To improve compliance with task instructions and the efficacy of weight generation, TAGI undergoes a two-stage hypernetwork training process: hypernetwork pretraining and finetuning. a-c are random divisions of the sampled sentences from the pretraining datasets.
  • Figure 3: The performance of different numbers of meta-training tasks. The backbone model is T5-LM-Base, all trained for 20,000 steps.
  • Figure 4: The percentage of generated parameters (%) against performance (RougeL). The backbone model is T5-LM-Base, all trained for 20,000 steps.
  • Figure 5: Analysis of T5-LM-XXL (11B).
  • ...and 1 more figure
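The Figure 2 overview describes a hypernetwork that maps an instruction to adapter weights, which are then merged into a frozen base model. A toy version of that flow might look like the sketch below. Everything here is an assumption made for illustration: `ToyHypernetwork`, its two linear maps, the fixed-size instruction embedding, and `apply_lora` are hypothetical names, and the real hypernetwork architecture is not specified in this summary. The only standard ingredient is the LoRA update itself, where the adapted weight is the base weight plus a low-rank product `B @ A`.

```python
import random

def matmul(a, b):
    # Plain-Python matrix product of two nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def linear(weights, vec):
    # Matrix-vector product: one output per weight row.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def reshape(flat, rows, cols):
    # Turn a flat list into a rows x cols nested list.
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

class ToyHypernetwork:
    """Hypothetical linear hypernetwork: instruction embedding -> LoRA A, B."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # One linear map per generated matrix (an assumed design).
        self.w_a = [[rng.gauss(0, 0.1) for _ in range(embed_dim)]
                    for _ in range(rank * d_in)]
        self.w_b = [[rng.gauss(0, 0.1) for _ in range(embed_dim)]
                    for _ in range(d_out * rank)]

    def generate(self, instruction_embedding):
        # A has shape rank x d_in, B has shape d_out x rank.
        a = reshape(linear(self.w_a, instruction_embedding), self.rank, self.d_in)
        b = reshape(linear(self.w_b, instruction_embedding), self.d_out, self.rank)
        return a, b

def apply_lora(base_weight, a, b, scale=1.0):
    # Adapted weight = base + scale * (B @ A); the base weight is not modified.
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_weight, delta)]
```

In this sketch a new task only requires a forward pass through the hypernetwork to obtain `A` and `B`, which mirrors the stated goal of handling unseen tasks without retraining; the instruction encoder that would produce the embedding is left out.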