
Token-Efficient Leverage Learning in Large Language Models

Yuanhao Zeng, Min Wang, Yihang Wang, Yingxia Shao

TL;DR

The paper tackles the problem that large language models (LLMs) underperform on low-resource tasks because task-specific data is scarce. It introduces Leverage Learning, a framework that extracts task-specific capabilities from limited task data while learning non-specific capabilities from abundant general data. The authors ground the framework in the quantization hypothesis and the notion of quanta, and they propose a Q Sequence. They then present a minimalist instantiation, Token-Efficient Leverage Learning (TELL), which combines anchor prompts with extensive shuffling of task data into general data, typically trained with LoRA-based parameter-efficient fine-tuning (PEFT). TELL achieves competitive results with far less task data on tasks ranging from $10^4$ to $10^6$ tokens. Key findings include substantial data-efficiency gains, evidence that the synergy between task and general data drives the improvements, and emergent-ability-like scaling as the amount of general data increases. The approach avoids the pitfalls of in-context learning, reduces data-collection costs, and suggests further cost reductions through self-generated anchors, indicating broad applicability to low-resource NLP fine-tuning.
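The data-preparation idea described above (anchor prompts plus shuffling task data into a much larger pool of general data) can be illustrated with a small sketch. It is not the authors' implementation: the anchor text `ANCHOR_PROMPT`, the prompt/response record format, the functions `build_tell_mixture` and `task_token_share`, and the use of whitespace splitting as a stand-in for a real tokenizer are all hypothetical choices made for illustration. The LoRA fine-tuning step that would consume this mixture is not shown.

```python
import random
from typing import Dict, List

# Hypothetical anchor prompt; the paper's actual anchor text is not given here.
ANCHOR_PROMPT = "[TASK: EN->IS translation]\n"


def build_tell_mixture(task_examples: List[Dict[str, str]],
                       general_examples: List[Dict[str, str]],
                       anchor: str = ANCHOR_PROMPT,
                       seed: int = 0) -> List[Dict[str, str]]:
    """Prefix each task example with the anchor prompt, pool it with the
    general data, and shuffle so task examples are spread across training."""
    anchored = [{"prompt": anchor + ex["prompt"], "response": ex["response"]}
                for ex in task_examples]
    # Copy general examples so the caller's data is not mutated.
    mixture = anchored + [dict(ex) for ex in general_examples]
    rng = random.Random(seed)  # fixed seed for a reproducible order
    rng.shuffle(mixture)
    return mixture


def task_token_share(mixture: List[Dict[str, str]],
                     anchor: str = ANCHOR_PROMPT) -> float:
    """Fraction of whitespace tokens that come from anchored task examples
    (a crude proxy; real token counts depend on the model's tokenizer)."""
    total = task = 0
    for ex in mixture:
        n = len((ex["prompt"] + " " + ex["response"]).split())
        total += n
        if ex["prompt"].startswith(anchor):
            task += n
    return task / total if total else 0.0


if __name__ == "__main__":
    task = [{"prompt": "Hello", "response": "Halló"}]
    general = [{"prompt": f"Q{i}", "response": f"A{i}"} for i in range(9)]
    mix = build_tell_mixture(task, general)
    print(len(mix))  # 10
    print(round(task_token_share(mix), 3))
```

Shuffling rather than appending the task data at the end is meant to mirror the "extensive shuffling" the summary mentions; the general-to-task ratio here is arbitrary and would be a tuning choice in practice.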

Abstract

Large Language Models (LLMs) have excelled in various tasks, but they perform best in high-resource scenarios, which makes low-resource scenarios challenging. Data scarcity and the inherent difficulty of adapting LLMs to specific tasks compound the challenge. To address these twin hurdles, we introduce Leverage Learning. We present a streamlined implementation of this methodology called Token-Efficient Leverage Learning (TELL). TELL showcases the potential of Leverage Learning, demonstrating effectiveness across various LLMs and low-resource tasks ranging from $10^4$ to $10^6$ tokens. It reduces task data requirements by up to nearly an order of magnitude compared to conventional Supervised Fine-Tuning (SFT) while delivering competitive performance. With the same amount of task data, TELL improves task performance more than SFT does. We discuss the mechanism of Leverage Learning, suggesting that it aligns with the quantization hypothesis, and explore its promising potential through empirical testing.

Paper Structure

This paper contains 21 sections and 16 figures.

Figures (16)

  • Figure 1: An overview of Leverage Learning
  • Figure 2: An overview of Token-Efficient Leverage Learning
  • Figure 3: t-SNE visualization without anchor prompt
  • Figure 4: t-SNE visualization with anchor prompt
  • Figure 5: EN-to-IS BLEURT scores with and without anchor prompt
  • ...and 11 more figures