Table of Contents
Fetching ...

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz

TL;DR

This work makes the first attempt to formalize the study of vulnerabilities and interference of tasks in conversational LLMs caused by task-switches in the conversational history, revealing that many of the task-switches can lead to significant performance degradation.

Abstract

With the recent emergence of powerful instruction-tuned large language models (LLMs), various helpful conversational Artificial Intelligence (AI) systems have been deployed across many applications. When prompted by users, these AI systems successfully perform a wide range of tasks as part of a conversation. To provide some sort of memory and context, such approaches typically condition their output on the entire conversational history. Although this sensitivity to the conversational history can often lead to improved performance on subsequent tasks, we find that performance can in fact also be negatively impacted, if there is a task-switch. To the best of our knowledge, our work makes the first attempt to formalize the study of such vulnerabilities and interference of tasks in conversational LLMs caused by task-switches in the conversational history. Our experiments across 5 datasets with 15 task switches using popular LLMs reveal that many of the task-switches can lead to significant performance degradation.

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

TL;DR

This work makes the first attempt to formalize the study of vulnerabilities and interference of tasks in conversational LLMs caused by task-switches in the conversational history, revealing that many of the task-switches can lead to significant performance degradation.

Abstract

With the recent emergence of powerful instruction-tuned large language models (LLMs), various helpful conversational Artificial Intelligence (AI) systems have been deployed across many applications. When prompted by users, these AI systems successfully perform a wide range of tasks as part of a conversation. To provide some sort of memory and context, such approaches typically condition their output on the entire conversational history. Although this sensitivity to the conversational history can often lead to improved performance on subsequent tasks, we find that performance can in fact also be negatively impacted, if there is a task-switch. To the best of our knowledge, our work makes the first attempt to formalize the study of such vulnerabilities and interference of tasks in conversational LLMs caused by task-switches in the conversational history. Our experiments across 5 datasets with 15 task switches using popular LLMs reveal that many of the task-switches can lead to significant performance degradation.
Paper Structure (26 sections, 6 equations, 13 figures, 20 tables)

This paper contains 26 sections, 6 equations, 13 figures, 20 tables.

Figures (13)

  • Figure 1: An illustrative example where the chat history is based on sentiment prediction. Algebra word problem introduces task-switch which results in an incorrect prediction.
  • Figure 2: Target Task: MMLU Abstract Algebra. % change in accuracy relative to zero-shot performance.
  • Figure 3: Target Task: MMLU HA. Percentage % change in accuracy relative to zero-shot performance (no conversation history) for increasing conversation history length $L$ and various models.
  • Figure 4: Target Task: MMLU AA. Percentage % change in accuracy relative to zero-shot performance (no conversation history) for increasing conversation history length $L$ and various models.
  • Figure 5: Target Task: RT. Percentage % change in accuracy relative to zero-shot performance (no conversation history) for increasing conversation history length $L$ and various models.
  • ...and 8 more figures