Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
TL;DR
This work challenges the view that in-context few-shot learning is mainly the province of decoder-only LLMs by demonstrating that encoder-decoder (seq2seq) models can serve as robust few-shot learners across a broad range of tasks. By aligning prompts with the models' pretraining objectives and introducing fusion-based strategies (early and late fusion) for processing demonstrations, the authors achieve strong gains, in some cases outperforming much larger decoder-only models. They provide an evaluation toolkit, report consistent improvements on both understanding and generation tasks (including SuperGLUE, XSum, and WebNLG), and demonstrate that their methods mitigate permutation bias. The results suggest a shift toward leveraging seq2seq architectures for few-shot learning, with broad implications for the design of future LLMs and conversational AI systems.
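The summary does not spell out how demonstrations are fused. The sketch below assumes one common reading of "late fusion": each demonstration is paired with the test query and scored independently, and the resulting label distributions are averaged. The scorer `toy_scorer` is a hypothetical stand-in for a seq2seq model, and none of the names here come from the paper; this is an illustration, not the authors' implementation. Because each pair is scored on its own, the fused prediction cannot depend on the order in which demonstrations are listed, which is one way such a scheme can reduce permutation bias.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical stand-in for a seq2seq model: returns a log-score for each
# candidate label given ONE demonstration paired with the query.
Scorer = Callable[[str, str], Dict[str, float]]


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Normalize label log-scores into a probability distribution."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def late_fusion(demos: Sequence[str], query: str, score: Scorer) -> Dict[str, float]:
    """Score the query once per demonstration, then average the label distributions.

    Each (demonstration, query) pair is processed independently, so the result
    does not depend on the order of the demonstrations.
    """
    dists = [softmax(score(d, query)) for d in demos]
    labels = dists[0].keys()
    return {lab: sum(d[lab] for d in dists) / len(dists) for lab in labels}


def toy_scorer(demo: str, query: str) -> Dict[str, float]:
    # Toy heuristic (hypothetical): favor the label that appears in the demonstration.
    return {
        "positive": 1.0 if "positive" in demo else 0.0,
        "negative": 1.0 if "negative" in demo else 0.0,
    }


demos = ["great movie -> positive", "awful plot -> negative", "loved it -> positive"]
fused = late_fusion(demos, "a delightful film ->", toy_scorer)
reversed_fused = late_fusion(list(reversed(demos)), "a delightful film ->", toy_scorer)
print(max(fused, key=fused.get))  # prints "positive"
```

An early-fusion variant would instead combine the per-pair encoder representations before a single decoding pass; that requires a real model and is not shown here.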
Abstract
In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, these efforts have been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide the first extensive experiments comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and yields significant performance improvements over conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.
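As a rough illustration of objective-aligned prompting, assume a T5-style model pretrained with span corruption, where masked spans are replaced by sentinel tokens such as `<extra_id_0>`. One way to align a few-shot prompt with that objective is to place a sentinel in the query's answer slot, so the model is asked to fill in a masked span just as it did during pretraining. The helper `objective_aligned_prompt` and the template below are hypothetical and are not the paper's exact prompt format.

```python
from typing import List, Tuple

# T5-style sentinel token used to mark a masked span in span-corruption pretraining.
SENTINEL = "<extra_id_0>"


def objective_aligned_prompt(demos: List[Tuple[str, str]], query: str, template: str) -> str:
    """Build a few-shot prompt whose answer slot is a sentinel token.

    `template` is a hypothetical format string with {input} and {answer} fields.
    Demonstrations are filled in with their gold answers; the query's answer is
    replaced by the sentinel, mirroring the span-corruption input format.
    """
    parts = [template.format(input=x, answer=y) for x, y in demos]
    parts.append(template.format(input=query, answer=SENTINEL))
    return " ".join(parts)


template = "Review: {input} Sentiment: {answer}"
prompt = objective_aligned_prompt(
    [("great movie", "positive"), ("awful plot", "negative")],
    "a delightful film",
    template,
)
print(prompt)
# Review: great movie Sentiment: positive Review: awful plot Sentiment: negative Review: a delightful film Sentiment: <extra_id_0>
```

Under span corruption, a T5-style decoder would be expected to emit the sentinel followed by the filled-in span (here, ideally `positive`), so the answer can be read off after the sentinel.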
