Toward Zero-Shot Instruction Following
Renze Lou, Wenpeng Yin
TL;DR
This work tackles zero-shot instruction following by leveraging paragraph-style task definitions rather than demonstrations. It introduces Pick&Rank, which combines Strategy I (automatic critical-sentence extraction via a pointer network with Gumbel-Softmax) and Strategy II (a ranking-based objective across instruction variants) to better align model outputs with the essential instruction content. On Super-NaturalInstructions, the approach achieves state-of-the-art results, indicating that explicit highlighting and discriminative training over informative instructions enhance cross-task generalization. The paper contributes end-to-end trainable components and analyzes error patterns (e.g., negation and incomplete critical-sentence detection), informing future directions for zero-shot instruction understanding and masking strategies.
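To make the Gumbel-Softmax step in Strategy I concrete, the following is a minimal pure-Python sketch of drawing a relaxed (differentiable in a real framework) one-hot selection over sentence scores. It is not the paper's implementation: the function name `gumbel_softmax`, the temperature `tau`, and the example scores are assumptions made for illustration only.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over `logits` (hypothetical helper)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # clamp U away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed scores for three sentences of a task definition.
sentence_scores = [2.0, 0.5, -1.0]
weights = gumbel_softmax(sentence_scores, tau=0.5, rng=random.Random(0))
```

Lower values of `tau` push the weights toward a hard one-hot choice, while higher values spread the mass more evenly across sentences.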
Abstract
This work proposes a challenging yet more realistic setting for zero-shot cross-task generalization: zero-shot instruction following, which presumes that a paragraph-style task definition is available but no demonstrations are. To better learn the task supervision from the definition, we propose two strategies: first, automatically identifying the critical sentences in the definition; second, a ranking objective that forces the model to generate the gold outputs with higher probabilities when those critical parts are highlighted in the definition. Together, the two strategies yield state-of-the-art performance on Super-NaturalInstructions. Our code is available on GitHub.
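The ranking objective can be illustrated with a hinge-style margin loss over the gold output's log-probability under two versions of the definition. This is only a sketch under stated assumptions: the abstract does not give the exact loss, so the hinge form, the function name `margin_ranking_loss`, the `margin` value, and the example log-probabilities are all hypothetical.

```python
def margin_ranking_loss(logp_preferred, logp_other, margin=0.1):
    """Hinge loss that reaches zero once the preferred score leads by `margin`."""
    return max(0.0, margin - (logp_preferred - logp_other))

# Assumed log-probabilities of the gold output under two instruction variants.
logp_highlighted = -1.2  # definition with critical sentences highlighted
logp_variant = -1.5      # a less informative variant of the definition

# The gap is 0.3, which is smaller than the margin, so a penalty remains.
loss = margin_ranking_loss(logp_highlighted, logp_variant, margin=0.5)

# Once the highlighted version leads by at least the margin, the loss vanishes.
no_loss = margin_ranking_loss(-0.5, -1.5, margin=0.5)
```

A loss of this shape only penalizes the model when the highlighted definition fails to make the gold output sufficiently more likely than the alternative, which matches the stated intent of the second strategy.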
