Toward Zero-Shot Instruction Following

Renze Lou; Wenpeng Yin

TL;DR

This work tackles zero-shot instruction following by relying on paragraph-style task definitions rather than demonstrations. It introduces Pick&Rank, which combines Strategy I (automatic critical-sentence extraction via a pointer network with Gumbel-Softmax) and Strategy II (a ranking-based objective across instruction variants) to better align model outputs with the essential instruction content. On Super-NaturalInstructions, the approach achieves state-of-the-art results, indicating that explicit highlighting and discriminative training over informative instructions enhance cross-task generalization. The paper contributes end-to-end trainable components and analyzes error patterns (e.g., negation and incomplete critical-sentence detection), informing future directions for zero-shot instruction understanding and masking strategies.

Abstract

This work proposes a challenging yet more realistic setting for zero-shot cross-task generalization: zero-shot instruction following, which presumes that a paragraph-style task definition exists but no demonstrations do. To better learn the task supervision from the definition, we propose two strategies: first, automatically identifying the critical sentences in the definition; second, a ranking objective that forces the model to generate the gold outputs with higher probabilities when those critical parts are highlighted in the definition. Together, the two strategies yield state-of-the-art performance on Super-NaturalInstructions. Our code is available on GitHub.
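The ranking objective can be illustrated with a minimal sketch. This section does not give the paper's exact loss, so the pairwise hinge form, the `margin` value, and the function names below are assumptions made for illustration only. The input is the gold output's log-probability under each instruction variant, ordered from the most informative variant to the least informative one.

```python
import math


def sequence_log_prob(token_probs):
    """Log-probability of a gold output, given its per-token probabilities."""
    return sum(math.log(p) for p in token_probs)


def pairwise_ranking_loss(log_probs_by_rank, margin=0.1):
    """Hinge loss summed over all ordered pairs of instruction variants.

    log_probs_by_rank: gold-output log-probabilities, ordered from the most
    informative instruction variant to the least informative one.
    The hinge form and the margin are illustrative assumptions.
    """
    loss = 0.0
    for i in range(len(log_probs_by_rank)):
        for j in range(i + 1, len(log_probs_by_rank)):
            better = log_probs_by_rank[i]
            worse = log_probs_by_rank[j]
            # Penalize pairs where the more informative variant does not
            # beat the less informative one by at least `margin`.
            loss += max(0.0, margin - (better - worse))
    return loss
```

The loss is zero when every more informative variant already scores higher by at least the margin, and it grows as the ordering is violated.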

Paper Structure (20 sections, 3 equations, 1 figure, 6 tables)

Introduction
Related Work
Prompt & In-context Learning.
Follow Human-annotation Instructions.
Problem Definition & Our Approach
Zero-Shot Instruction Following:
Strategy I: picking critical sentences of instructions.
Strategy II: ranking-based objective.
Experiments
Dataset.
Baselines.
Our model implementation.
Results.
Analysis.
Conclusion
...and 5 more sections

Figures (1)

Figure 1: Illustration of Pick&Rank, which has two main components: Strategy I (Pick) and Strategy II (Rank). Strategy I predicts a binary value for each sentence in a definition, indicating whether the sentence is crucial. Its outputs are used to construct instructions with different degrees of sufficiency; e.g., "Repeat" denotes the most beneficial instruction, in which the crucial sentences are repeated. Strategy II then drives the LM to assign higher probabilities to the ground truth given the more beneficial instructions. The whole system is optimized end-to-end.
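The Pick step in the caption can be sketched roughly as follows, under several assumptions. The per-sentence logits stand in for the output of a learned scorer (the pointer network is not reproduced here), and the hard 0/1 decision is shown without the gradient-preserving trick that real training would need. Only the "Repeat" variant is named in this section; the `original` and `drop` variants, and all function names, are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample over the categories in `logits`."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample so that both logarithms stay finite.
        u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def pick_sentences(sentence_logits, tau=1.0, rng=random):
    """One binary decision per sentence from (skip_logit, pick_logit) pairs."""
    picks = []
    for skip_logit, pick_logit in sentence_logits:
        probs = gumbel_softmax([skip_logit, pick_logit], tau, rng)
        picks.append(1 if probs[1] > probs[0] else 0)
    return picks


def build_variants(sentences, picks):
    """Build instruction variants from the picks (variant set is assumed)."""
    critical = [s for s, p in zip(sentences, picks) if p == 1]
    rest = [s for s, p in zip(sentences, picks) if p == 0]
    return {
        "repeat": " ".join(sentences + critical),  # critical sentences repeated
        "original": " ".join(sentences),           # hypothetical variant
        "drop": " ".join(rest),                    # hypothetical variant
    }
```

Passing `rng=random.Random(0)` makes the sampling reproducible. Lower values of `tau` push the relaxed sample closer to a one-hot vector.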
