DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models

Yuanhao Zeng, Fei Ren, Xinpeng Zhou, Yihang Wang, Yingxia Shao

TL;DR

DELIA addresses a core limitation of instruction tuning in large language models: it uses diversity-rich data synthesis to transform biased instruction-tuning features into approximations of ideal task features, without explicit prior knowledge of those ideal features. By exploiting the buffering effect of extensive diverse data, DELIA enables data-driven, scalable learning and better aligns internal representations with downstream semantics. Empirical results show gains over standard instruction tuning and other baselines, including BLEURT improvements of 17.07%-33.41% on Icelandic-English translation (WMT-21 dataset, gemma-7b-it) and a 36.1% accuracy improvement on formatted text generation (Llama2-7b-chat). DELIA also aligns the internal representations of new special tokens with their prior semantics. Overall, DELIA demonstrates a practical, data-centered approach to instruction adaptation that narrows the gap between fitting instruction formats and learning the underlying task semantics.

Abstract

Although instruction tuning is widely used to adjust the behavior of Large Language Models (LLMs), extensive empirical evidence indicates that it is primarily a process in which the model fits to specific task formats rather than acquiring new knowledge or capabilities. We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task-specific features and cause the model to learn less of the underlying semantics in downstream tasks. However, ideal features are unknown and incalculable, which has constrained past work to rely on prior knowledge to assist reasoning or training; this limits LLMs' capabilities to the developers' abilities rather than enabling data-driven, scalable learning. In this paper, through our novel data synthesis method, DELIA (Diversity-Enhanced Learning for Instruction Adaptation), we leverage the buffering effect of extensive diverse data in LLM training to transform biased features in instruction tuning into approximations of ideal features, without explicit prior knowledge of the ideal features. Experiments show that DELIA performs better than common instruction tuning and other baselines. It outperforms common instruction tuning by 17.07%-33.41% in BLEURT score on Icelandic-English translation (WMT-21 dataset, gemma-7b-it) and improves accuracy by 36.1% on formatted text generation (Llama2-7b-chat). Notably, among the knowledge injection methods we are aware of, DELIA uniquely aligns the internal representations of new special tokens with their prior semantics.

Paper Structure (64 sections, 3 figures, 2 tables, 1 algorithm)

This paper contains 64 sections, 3 figures, 2 tables, 1 algorithm.

Figures (3)

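  • Figure 1: Using the trim and clip commands produces fragile layers that can result in disasters (like this one from an actual paper) when the color space is corrected or the PDF is combined with others for the final proceedings. Crop your figures properly in a graphics program, not in LaTeX.
  • Figure 2: Adjusting the bounding box instead of actually removing the unwanted data resulted in multiple layers in this paper. It also needlessly increased the PDF size. In this case, the size of the unwanted layer doubled the paper's size and produced the following surprising results in final production. Crop your figures properly in a graphics program. Don't just alter the bounding box.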
  • Figure 3: Example listing quicksort.hs