How are Prompts Different in Terms of Sensitivity?

Sheng Lu, Hendrik Schuff, Iryna Gurevych

TL;DR

This work introduces sensitivity-aware decoding, which incorporates sensitivity estimation as a penalty term in standard greedy decoding, and shows that this approach is particularly helpful when information in the input is scarce.
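The following is a minimal sketch of the decoding idea for label prediction. It does not reproduce the paper's exact penalty. Instead, it assumes that the decoder scores each candidate label y by log p(y | x) - lam * s(y). Here s(y) is a hypothetical per-label sensitivity: the fraction of perturbed copies of the input for which the model predicts something other than y. The names `label_sensitivity`, `sensitivity_aware_decode`, and `lam` are illustrative, and the numbers in the usage lines are made up. With lam = 0 the rule reduces to plain greedy decoding over the candidate labels.

```python
import math
from typing import Dict, List


def label_sensitivity(perturbed_preds: List[str], label: str) -> float:
    """Hypothetical per-label sensitivity: the fraction of perturbed inputs
    whose prediction differs from `label` (0.0 if there are none)."""
    if not perturbed_preds:
        return 0.0
    flips = sum(1 for pred in perturbed_preds if pred != label)
    return flips / len(perturbed_preds)


def sensitivity_aware_decode(
    log_probs: Dict[str, float],
    perturbed_preds: List[str],
    lam: float = 1.0,
) -> str:
    """Pick argmax_y [log p(y | x) - lam * s(y)] over candidate labels.

    This is an assumed form of the penalty; lam = 0 recovers greedy decoding.
    """
    def score(label: str) -> float:
        return log_probs[label] - lam * label_sensitivity(perturbed_preds, label)

    return max(log_probs, key=score)


# Usage with made-up numbers: the model slightly prefers "positive" on the
# original input, but most perturbed copies are predicted "negative".
log_probs = {"positive": math.log(0.55), "negative": math.log(0.45)}
perturbed_preds = ["negative", "negative", "positive", "negative"]

print(sensitivity_aware_decode(log_probs, perturbed_preds, lam=0.0))  # positive
print(sensitivity_aware_decode(log_probs, perturbed_preds, lam=1.0))  # negative
```

In this example the penalty overturns a narrow greedy preference for "positive", because three of the four perturbed inputs are predicted "negative". A larger `lam` gives more weight to robustness under perturbation relative to the model's original probabilities.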

Abstract

In-context learning (ICL) has become one of the most popular learning paradigms. While there is a growing body of literature focusing on prompt engineering, there is a lack of systematic analysis comparing the effects of prompts across different models and tasks. To address this gap, we present a comprehensive prompt analysis based on the sensitivity of a function. Our analysis reveals that sensitivity is an unsupervised proxy for model performance, as it exhibits a strong negative correlation with accuracy. We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output, resulting in different levels of sensitivity. Furthermore, we introduce sensitivity-aware decoding, which incorporates sensitivity estimation as a penalty term in standard greedy decoding. We show that this approach is particularly helpful when information in the input is scarce. Our work provides a fresh perspective on the analysis of prompts and contributes to a better understanding of the mechanism of ICL.
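To make the saliency idea concrete, the sketch below computes gradient-times-input saliency for a toy model. Gradient-times-input is one common gradient-based variant; whether it matches the paper's exact choice is an assumption. The toy model is a logistic unit applied to the mean of two-dimensional token embeddings. The model, its weights, the embeddings, and the helper names are all hypothetical. For a real LLM, the gradient would come from automatic differentiation rather than from the central finite differences used here.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_model(embeddings: List[Vector]) -> float:
    """Hypothetical stand-in for a model's output score: a logistic unit
    applied to the mean token embedding (weights chosen arbitrarily)."""
    w, b = [1.5, -2.0], 0.1
    n = len(embeddings)
    mean = [sum(e[d] for e in embeddings) / n for d in range(len(w))]
    z = sum(wd * md for wd, md in zip(w, mean)) + b
    return 1.0 / (1.0 + math.exp(-z))


def grad_x_input_saliency(
    model: Callable[[List[Vector]], float],
    embeddings: List[Vector],
    eps: float = 1e-5,
) -> List[float]:
    """Per-token |sum_d (d model / d e_id) * e_id|, with each partial
    derivative estimated by a central finite difference."""
    scores = []
    for i, emb in enumerate(embeddings):
        total = 0.0
        for d in range(len(emb)):
            # Copy the embeddings, then nudge coordinate d of token i.
            plus = [list(e) for e in embeddings]
            minus = [list(e) for e in embeddings]
            plus[i][d] += eps
            minus[i][d] -= eps
            grad = (model(plus) - model(minus)) / (2 * eps)
            total += grad * emb[d]
        scores.append(abs(total))
    return scores


tokens = ["the", "movie", "was", "great"]
embeddings = [[0.0, 0.0], [0.2, 0.1], [0.0, 0.1], [1.0, -0.5]]  # made up
for token, score in zip(tokens, grad_x_input_saliency(toy_model, embeddings)):
    print(f"{token:>6s}  {score:.4f}")
```

The token whose embedding aligns most with the model's weights ("great") receives the largest score. The zero embedding ("the") receives a score of zero.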

Paper Structure

This paper contains 23 sections, 5 equations, 13 figures, and 15 tables.

Figures (13)

  • Figure 1: (a) We generate synthetic data for testing instances using the framework of Hahn et al. (2021). (b) We perform inference multiple times on the original and synthetic data and calculate sensitivity from the predictions.
  • Figure 2: The average accuracy and sensitivity of each model using various prompts across different datasets. An asterisk (*) marks prompts that were not tested on all datasets.
  • Figure 3: The accuracy and sensitivity of different models using the prompts base_a, base_b, CoT_base_a, and CoT.
  • Figure 4: The accuracy and sensitivity of predictions obtained using greedy decoding and Top-k sampling across different models.
  • Figure 5: Saliency scores over the tokens of CoLA instances with the base_b prompt, obtained using GPT-6B-JT.
  • ...and 8 more figures
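The sketch below illustrates the procedure in Figure 1 under several assumptions. It splits the input into disjoint contiguous blocks and replaces each block in one or more synthetic variants. It then defines sensitivity as the sum over blocks of the fraction of variants whose prediction differs from the original prediction. This mirrors the block-sensitivity intuition behind Hahn et al. (2021), but the exact estimator used in the paper may differ. Two parts of the example are hypothetical stand-ins. The keyword classifier replaces the LLM that, in the paper, is prompted for the task. The `<unk>` masking replaces the synthetic data generator.

```python
from typing import Callable, List

Tokens = List[str]


def contiguous_blocks(n: int, block_size: int) -> List[range]:
    """Split positions 0..n-1 into disjoint contiguous blocks."""
    return [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]


def estimate_sensitivity(
    predict: Callable[[Tokens], str],
    tokens: Tokens,
    make_variants: Callable[[Tokens, range], List[Tokens]],
    block_size: int = 1,
) -> float:
    """Sum over disjoint blocks of the fraction of synthetic variants (block
    replaced) whose prediction differs from the prediction on `tokens`."""
    original = predict(tokens)
    total = 0.0
    for block in contiguous_blocks(len(tokens), block_size):
        variants = make_variants(tokens, block)
        if not variants:
            continue
        flips = sum(1 for v in variants if predict(v) != original)
        total += flips / len(variants)
    return total


# Hypothetical stand-ins for an LLM and for the synthetic-data generator.
POSITIVE_WORDS = {"good", "great"}


def keyword_classifier(tokens: Tokens) -> str:
    return "positive" if any(t in POSITIVE_WORDS for t in tokens) else "negative"


def mask_variants(tokens: Tokens, block: range) -> List[Tokens]:
    # A single variant in which every token of the block becomes "<unk>".
    return [["<unk>" if i in block else t for i, t in enumerate(tokens)]]


scarce = "the plot was great".split()
redundant = "good acting and great music".split()
print(estimate_sensitivity(keyword_classifier, scarce, mask_variants))     # 1.0
print(estimate_sensitivity(keyword_classifier, redundant, mask_variants))  # 0.0
```

The first sentence relies on a single keyword, so replacing that keyword flips the prediction (sensitivity 1.0). The second sentence carries redundant evidence, so no single-token replacement flips it (sensitivity 0.0). Averaging this quantity over test instances gives a per-prompt score that can be compared with accuracy.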