Functionality learning through specification instructions

Pedro Henrique Luz de Araujo, Benjamin Roth

TL;DR

This paper introduces specification instructions: text descriptions of fine-grained, task-specific behaviors. The instructions are combined into specification-augmented prompts, which are then fed to language models pre-trained on natural instruction data.

Abstract

Test suites assess natural language processing models' performance on specific functionalities: cases of interest involving model robustness, fairness, or particular linguistic capabilities. This paper introduces specification instructions: text descriptions specifying fine-grained task-specific behaviors. For each functionality in a suite, we generate an instruction that describes it. We combine the specification instructions to create specification-augmented prompts, which we feed to language models pre-trained on natural instruction data. We conduct experiments to measure how optimizing for some functionalities may negatively impact functionalities that are not covered by the specification set. Our analyses across four tasks and models of diverse sizes and families show that smaller models struggle to follow specification instructions. However, larger models (>~3B params.) can benefit from specifications and -- surprisingly -- even generalize certain desirable behaviors across functionalities.
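
As a rough illustration of how a specification-augmented prompt might be assembled from its modules (task description, specification instructions, exemplars, and the input to classify), consider the minimal sketch below. The function name `build_prompt`, the "Specifications:" header, the "Input:/Output:" layout, and the example specification wording are all assumptions made for illustration; they are not the authors' templates.

```python
from typing import Iterable, Sequence, Tuple


def build_prompt(
    task: str,
    specifications: Sequence[str] = (),
    exemplars: Iterable[Tuple[str, str]] = (),
    text: str = "",
) -> str:
    """Concatenate prompt modules into one string (hypothetical layout)."""
    parts = [task]

    # Specification module: one line per functionality description.
    if specifications:
        parts.append("Specifications:")
        parts.extend(f"- {spec}" for spec in specifications)

    # Exemplar module: labeled input/output demonstrations.
    for ex_text, ex_label in exemplars:
        parts.append(f"Input: {ex_text}\nOutput: {ex_label}")

    # Final module: the instance the model should label.
    parts.append(f"Input: {text}\nOutput:")
    return "\n".join(parts)


# Illustrative usage; the specification sentences are made up for this sketch.
prompt = build_prompt(
    task="Classify the sentiment of the input as positive or negative.",
    specifications=[
        "Negation reverses the sentiment of the negated phrase.",
        "Names of people do not affect the sentiment.",
    ],
    exemplars=[("I loved this film.", "positive")],
    text="The food was not good.",
)
print(prompt)
```

Leaving `specifications` or `exemplars` empty yields the plainer prompt variants, which makes it straightforward to compare a task-only prompt against a specification-augmented one on the same input.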

Paper Structure (35 sections, 1 equation, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Example of a specification-augmented prompt for sentiment analysis. Each module adds information about how the task is expected to be performed.
  • Figure 2: Dataset and suite results for exemplar-augmented prompts. Results for prompts without exemplars are shown in Appendix \ref{sec:addResults}. Results from the Flan-T5 models are connected with lines to denote that they share the same architecture, training data and training procedure, varying only in number of parameters (Chung et al., 2022).
  • Figure 3: Specification prediction $F_1$ scores. The horizontal lines show results for a classifier that randomly selects a specification.
  • Figure 4: Distribution of ChatGPT-generated specification instruction quality.
  • Figure 5: Distribution of functionality pass rates achieved by ChatGPT through Task+Ex (above) and Task+Spec(chatGPT)+Ex (below).
  • ...and 3 more figures