Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels

Chaoqun Liu; Qin Chao; Wenxuan Zhang; Xiaobao Wu; Boyang Li; Anh Tuan Luu; Lidong Bing

TL;DR

This study proposes a new paradigm termed zero-to-strong generalization, which iteratively prompts LLMs to annotate unlabeled data and retains high-quality labels by filtering. The analysis indicates that this paradigm is effective for both in-context learning and fine-tuning, and across various model sizes.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. However, this paradigm is limited by the availability of gold labels, and in certain scenarios LLMs may need to perform tasks that are too complex for humans to provide such labels. To tackle this challenge, this study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization. We iteratively prompt LLMs to annotate unlabeled data and retain high-quality labels by filtering. Surprisingly, we observe that this iterative process gradually unlocks LLMs' potential on downstream tasks. Our experiments on extensive classification and reasoning tasks confirm the effectiveness of our proposed framework. Our analysis indicates that this paradigm is effective for both in-context learning and fine-tuning, and for various model sizes.
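
The annotate-then-filter loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `zero_to_strong`, `select_top_k_per_label`, the `annotate` callable (a stand-in for prompting an LLM), and the parameters `rounds`, `k`, and `init_demos` are all hypothetical. The per-label ranking follows the classification filter mentioned in the Figure 2 caption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# One annotated example: (text, predicted label, confidence). Hypothetical type.
Labeled = Tuple[str, str, float]
Demo = Tuple[str, str]
# Hypothetical stand-in for an LLM call: labels texts given current demonstrations.
Annotator = Callable[[List[str], List[Demo]], List[Labeled]]


def select_top_k_per_label(labeled: List[Labeled], k: int) -> List[Demo]:
    """Keep the k most confident examples for each predicted label."""
    by_label: Dict[str, List[Labeled]] = defaultdict(list)
    for item in labeled:
        by_label[item[1]].append(item)
    demos: List[Demo] = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda x: x[2], reverse=True)
        demos.extend((text, lab) for text, lab, _ in ranked[:k])
    return demos


def zero_to_strong(
    unlabeled: List[str],
    annotate: Annotator,
    rounds: int = 3,
    k: int = 2,
    init_demos: Optional[List[Demo]] = None,
) -> List[Demo]:
    """Iteratively annotate unlabeled data and keep only high-confidence labels."""
    demos = list(init_demos or [])  # assumed starting point; may be empty
    for _ in range(rounds):
        labeled = annotate(unlabeled, demos)         # annotate with current demos
        demos = select_top_k_per_label(labeled, k)   # filter to confident labels
    return demos
```

In this sketch the demonstrations selected in one round become the prompt context for the next round, so no gold labels enter the loop at any point.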

Paper Structure (38 sections, 11 figures, 13 tables, 1 algorithm)

This paper contains 38 sections, 11 figures, 13 tables, 1 algorithm.

Introduction
Methodology
Problem Definition
Zero-to-Strong Generalization
Demonstration construction.
Response generation.
Sample selection.
Iterative evolution.
Experiments
Tasks
Classification tasks.
Extreme-label classification tasks.
Reasoning tasks.
Baseline Methods
Zero-shot methods.
...and 23 more sections

Figures (11)

Figure 1: Illustration of (a) weak-to-strong generalization (Burns et al., 2023) and (b) our zero-to-strong analogy. While weak-to-strong uses weak models to supervise strong models, zero-to-strong elicits LLM capabilities without ground-truth labels or weak supervisors.
Figure 2: Illustration of (a) zero-to-strong generalization on a sentiment analysis task and (b) the filtering process. For classification tasks, we select demonstrations by ranking the probabilities for each label. For reasoning tasks, we select the most confident answers based on self-consistency (Wang et al., 2023).
Figure 3: Average macro-F1 for 17 classification tasks, using two LLMs and two initialization settings. "z2s-i" denotes the i-th round of iteration of the zero-to-strong method.
Figure 4: Average macro-F1 for GoEmotions, using two LLMs and two initialization settings.
Figure 5: Accuracy for the two reasoning tasks.
...and 6 more figures
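
The reasoning-task filter mentioned in the Figure 2 caption selects answers by self-consistency. A minimal sketch of one way to score this, assuming confidence is the vote share of the majority answer among several sampled answers per question; the function names and the `top_n` parameter are hypothetical, and the paper's exact scoring may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def self_consistency(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and its vote share among sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)


def select_most_confident(
    samples: Dict[str, List[str]], top_n: int
) -> List[Tuple[str, str, float]]:
    """Rank questions by the vote share of their majority answer; keep top_n."""
    scored = [(q, *self_consistency(a)) for q, a in samples.items()]
    scored.sort(key=lambda x: x[2], reverse=True)  # most agreed-upon first
    return scored[:top_n]
```

For example, a question whose sampled answers all agree gets confidence 1.0 and is kept ahead of one whose samples split between several answers.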
