Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On

TL;DR

This work challenges the view that in-context few-shot learning is mainly the province of decoder-only LLMs by demonstrating that encoder-decoder seq2seq models can serve as robust few-shot learners across a broad task spectrum. By aligning prompts with pretraining objectives and introducing fusion-based strategies (early and late fusion) to process demonstrations, the authors achieve strong gains, in some cases outperforming much larger decoder-only models. They provide an evaluation toolkit, report consistent improvements across understanding and generation tasks (including SuperGLUE, XSum, and WebNLG), and demonstrate that their methods mitigate permutation bias. The results suggest a shift toward leveraging seq2seq architectures for few-shot learning, with broad implications for the design of future LLMs and conversational AI systems.

Abstract

In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and exhibits significant performance improvements compared to conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.

Paper Structure

This paper contains 23 sections, 3 equations, 4 figures, and 13 tables.

Figures (4)

  • Figure 1: Different prompting strategies for in-context learning. (a) A target input can be placed on the encoder side, concatenated with the few-shot examples, or on the decoder side by itself. (b) Examples of prompts aligned with the pretraining objective. In alignment with T5, the sentinel token is attached to the target output. In alignment with UL2, the mode tag is added as a prefix to the encoder input.
  • Figure 2: An overview of the proposed approaches. Each case shows a 3-shot setting.
  • Figure 3: Comparison of in-context learning ability among seq2seq models as the number of shots increases. We experiment on the same tasks as presented in the table comparing decoder-only models with T5 and report the average accuracy and standard deviation for four seq2seq baseline models.
  • Figure 4: Selected PromptSource templates for evaluating the XSum and WebNLG tasks. Both examples describe the prompt for one-shot learning scenarios.
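
The objective-aligned prompting described in the caption of Figure 1(b) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `build_aligned_prompt`, the way demonstrations are joined, and the example mode tag `[NLU]` are all assumptions made for illustration. The sentinel string `<extra_id_0>` is the first sentinel token in Hugging Face T5 vocabularies. The sketch places the target input on the encoder side, marks the answer slot with the sentinel, and returns the same sentinel as the prefix that the decoder output is expected to start with.

```python
from typing import List, Tuple

# First sentinel token used by Hugging Face T5 tokenizers.
T5_SENTINEL = "<extra_id_0>"


def build_aligned_prompt(
    demos: List[Tuple[str, str]],
    target_input: str,
    mode_tag: str = "",
) -> Tuple[str, str]:
    """Hypothetical helper: return (encoder_input, decoder_prefix).

    demos        -- few-shot (input, output) pairs, rendered as plain text
    target_input -- the query whose output the model should generate
    mode_tag     -- optional UL2-style prefix; the exact tag string is an
                    assumption and depends on the checkpoint being used
    """
    # Each demonstration is written out in full: input followed by output.
    shots = [f"{x} {y}" for x, y in demos]
    # The target input ends with the sentinel, marking the span to fill in,
    # which mirrors the span-corruption format seen during pretraining.
    shots.append(f"{target_input} {T5_SENTINEL}")
    encoder_input = " ".join(shots)
    if mode_tag:
        # UL2-style alignment: the mode tag is a prefix of the encoder input.
        encoder_input = f"{mode_tag} {encoder_input}"
    # T5-style alignment: the target output begins with the sentinel token.
    return encoder_input, T5_SENTINEL


if __name__ == "__main__":
    enc, dec = build_aligned_prompt(
        [("Review: great film. Sentiment:", "positive")],
        "Review: dull plot. Sentiment:",
        mode_tag="[NLU]",
    )
    print(enc)
    print(dec)
```

With a T5-style model the `mode_tag` argument would be left empty. With a UL2-style model, the tag should match whatever the chosen checkpoint was pretrained with.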