Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning

Ahmed Elshabrawy, Yongxin Huang, Iryna Gurevych, Alham Fikri Aji

TL;DR

The study investigates the impact of several design choices on few-shot and zero-shot generalization, revealing that Statement-Tuning can achieve strong performance with modest training data and benefits from task and statement diversity for unseen task generalizability.

Abstract

While Large Language Models (LLMs) exhibit remarkable capabilities in zero-shot and few-shot scenarios, they often require computationally prohibitive sizes. Conversely, smaller Masked Language Models (MLMs) like BERT and RoBERTa achieve state-of-the-art results through fine-tuning but struggle to extend to few-shot and zero-shot settings due to their architectural constraints. Hence, we propose Statement-Tuning, a technique that models discriminative tasks as a finite set of statements and trains an encoder model to discriminate between the candidate statements to determine the label. We perform Statement-Tuning on multiple tasks to enable cross-task generalization. Experimental results demonstrate that Statement-Tuning achieves competitive performance compared to state-of-the-art LLMs with significantly fewer parameters. Moreover, the study investigates the impact of several design choices on few-shot and zero-shot generalization, revealing that Statement-Tuning can achieve strong performance with modest training data and benefits from task and statement diversity for unseen task generalizability.

Paper Structure (49 sections, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Overview of Statement-Tuning. We train an encoder to discriminate the truth value of statements from multiple tasks, then we apply it in the zero-shot setting by creating a statement for each possible target label and choosing the most likely one according to the encoder discriminator.
  • Figure 2: Example conversion of the MNLI task to natural language statements.
  • Figure 3: N-shot accuracy of Statement-Tuned RoBERTa-base models across training datasets of different sizes. The x-axis denotes the number of statements per Statement-Tuning training dataset, with the number of training datasets fixed.
  • Figure 4: N-shot improvement of Statement-Tuned RoBERTa-base of varying training set sizes over standard fine-tuning. The y-axis, Delta, is the difference between the accuracy of the Statement-Tuned model and the accuracy achieved by regular fine-tuning of RoBERTa-base on the task. A positive Delta indicates improvement over the baseline approach.
  • Figure 5: N-shot improvement of Statement-Tuned RoBERTa-base models used for regular finetuning. The y-axis, Delta, is the difference between the accuracy of the Statement-Tuned model fine-tuned for the task directly by discarding the Statement-Tuning classification head and the accuracy achieved by regular fine-tuning of RoBERTa-base on the task. A positive Delta indicates improvement over the baseline approach.
  • ...and 1 more figure