Token-Efficient Leverage Learning in Large Language Models

Yuanhao Zeng; Min Wang; Yihang Wang; Yingxia Shao

TL;DR

The paper tackles the problem that large language models (LLMs) underperform on low-resource tasks due to scarce task-specific data. It introduces Leverage Learning, a framework that extracts task-specific capabilities from limited data while learning non-specific capabilities from abundant general data, grounded in a quantization hypothesis and the notion of quanta (with a proposed Q Sequence). The authors present a minimalist instantiation, Token-Efficient Leverage Learning (TELL), which combines anchor prompts and extensive shuffling of general data with task data, typically via LoRA-based PEFT, to achieve competitive results with far less task data across tasks ranging from $10^4$ to $10^6$ tokens. Key findings include substantial data-efficiency gains, evidence that the synergy of task and general data drives improvements, and an emergent-ability-like scaling when increasing general data. This approach avoids in-context learning pitfalls, reduces data-collection costs, and demonstrates potential cost reductions via self-generated anchors, suggesting broad applicability for low-resource NLP fine-tuning.
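The data-mixing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_tell_mixture`, the dictionary fields, and the choice to prepend the anchor prompt to task examples only are assumptions made for illustration. The sketch marks each task example with an anchor prompt and shuffles the result together with general data; the LoRA fine-tuning step itself is not shown.

```python
import random


def build_tell_mixture(task_examples, general_examples, anchor_prompt, seed=0):
    """Hypothetical helper: anchor the task data, then shuffle it with general data.

    Assumption: the anchor prompt is a fixed string prepended to every
    task-specific prompt, and general examples are left unchanged.
    """
    anchored = [
        {
            "prompt": f"{anchor_prompt}\n{ex['prompt']}",
            "response": ex["response"],
            "source": "task",
        }
        for ex in task_examples
    ]
    general = [
        {"prompt": ex["prompt"], "response": ex["response"], "source": "general"}
        for ex in general_examples
    ]
    mixture = anchored + general
    # Seeded shuffle so the interleaving of task and general data is reproducible.
    random.Random(seed).shuffle(mixture)
    return mixture


# Toy data, purely illustrative.
task = [{"prompt": f"task question {i}", "response": f"task answer {i}"} for i in range(3)]
general = [{"prompt": f"general question {i}", "response": f"general answer {i}"} for i in range(9)]
mixed = build_tell_mixture(task, general, anchor_prompt="[TASK]")
```

The resulting list could then be passed to any fine-tuning pipeline; the ratio of general to task examples (3:1 here) is an arbitrary choice for the toy data, not a value taken from the paper.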

Abstract

Large Language Models (LLMs) have excelled in various tasks, but they perform best in high-resource scenarios and face challenges in low-resource ones. Data scarcity and the inherent difficulty of adapting LLMs to specific tasks compound the challenge. To address these twin hurdles, we introduce Leverage Learning. We present a streamlined implementation of this methodology called Token-Efficient Leverage Learning (TELL). TELL showcases the potential of Leverage Learning, demonstrating effectiveness across various LLMs and low-resource tasks, ranging from $10^4$ to $10^6$ tokens. It reduces task data requirements by up to nearly an order of magnitude compared to conventional Supervised Fine-Tuning (SFT) while delivering competitive performance. With the same amount of task data, TELL improves task performance more than SFT does. We discuss the mechanism of Leverage Learning, suggesting that it aligns with the quantization hypothesis, and explore its potential through empirical testing.

Paper Structure (21 sections, 16 figures)

This paper contains 21 sections, 16 figures.

Introduction
Related Works
LLM Tuning Methods Across Different Data Volumes
Mixed Fine-Tuning
Leverage Learning and TELL
Leverage Learning
Components of TELL
Experiments
Experimental Setup
Experimental Results
Can TELL Solve the First Issue of Leverage Learning?
Ablation Study
Leverage Learning Scaling with General Tokens
Conclusion
Appendix
...and 6 more sections

Figures (16)

Figure 1: An overview of Leverage Learning
Figure 2: An overview of Token-Efficient Leverage Learning
Figure 3: t-SNE visualization without anchor prompt
Figure 4: t-SNE visualization with anchor prompt
Figure 5: EN to IS BLEURT scores with and without anchor prompt
...and 11 more figures
