Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models

Anni Zou, Zhuosheng Zhang, Hai Zhao, Xiangru Tang

TL;DR

This work introduces GeM-CoT, a generalizable chain-of-thought prompting framework for mixed-task scenarios in which the type of each input question is unknown. GeM-CoT uses a Type Matching module to route each question. When a matching type is found, it uses demonstrations of that type. Otherwise, it performs zero-shot CoT and updates a data cache, using density-based clustering to construct new demonstrations. By combining routing, dynamic demonstration construction, and continual maintenance of the demonstration pool, the approach bridges the gap between generalization and performance. It is evaluated on 10 reasoning datasets and 23 BBH tasks. Results show that GeM-CoT improves generalization to unseen task types while maintaining or improving reasoning accuracy, notably in streaming batch settings, where more diverse demonstrations can be learned over time. The work offers a practical, training-free solution for robust real-world reasoning with LLMs and highlights the value of diverse demonstrations and adaptive data augmentation.
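The routing decision can be pictured as a similarity check between the incoming question and each known question type, in the spirit of the similarity scores shown in Figure 5. The following is a minimal sketch, not the authors' implementation. It assumes every question has already been embedded as a vector, that each known type is summarized by one representative embedding, and that a fixed cosine-similarity threshold decides whether a match counts. The names `type_match`, `type_reps`, and `threshold`, and the value 0.8, are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def type_match(
    question_emb: Sequence[float],
    type_reps: Dict[str, List[float]],
    threshold: float = 0.8,  # hypothetical value; not taken from the paper
) -> Optional[str]:
    """Return the most similar known type, or None if no type clears the threshold."""
    best_type, best_score = None, -1.0
    for name, rep in type_reps.items():
        score = cosine(question_emb, rep)
        if score > best_score:
            best_type, best_score = name, score
    # Matched path: fetch demonstrations of best_type.
    # Unmatched path (None): answer with zero-shot CoT and update the data cache.
    return best_type if best_score >= threshold else None


if __name__ == "__main__":
    reps = {"arithmetic": [1.0, 0.0], "commonsense": [0.0, 1.0]}
    print(type_match([0.9, 0.1], reps))  # arithmetic
    print(type_match([0.7, 0.7], reps))  # None (best score is about 0.71 < 0.8)
```

A single threshold keeps the sketch simple; in practice its value would trade off false matches (wrong demonstrations) against unnecessary fallbacks to zero-shot CoT.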

Abstract

Large language models (LLMs) have unveiled remarkable reasoning capabilities by exploiting chain-of-thought (CoT) prompting, which generates intermediate reasoning chains that serve as the rationale for deriving the answer. However, current CoT methods either simply employ general prompts such as "Let's think step by step" or rely heavily on pre-defined task-specific demonstrations to attain strong performance, leaving an unavoidable gap between performance and generalization. To bridge this gap, we propose GeM-CoT, a Generalizable CoT prompting mechanism for Mixed-task scenarios in which the type of input questions is unknown. GeM-CoT first categorizes the question type and then automatically samples or constructs demonstrations from the corresponding data pool. With this design, GeM-CoT achieves both superior generalization and remarkable performance on 10 public reasoning tasks and 23 BBH tasks.
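Once a question has been categorized, the two outcomes lead to two different prompts: few-shot CoT built from demonstrations sampled from the matched pool, or the general zero-shot trigger the abstract mentions. This is a hedged sketch of that prompt assembly only. It assumes demonstrations are stored as (question, rationale, answer) triples and uses a simple Q:/A: layout. The `Demo` type, the `build_prompt` function, its `k` and `seed` parameters, and the template are illustrative assumptions, not the paper's exact format.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# General zero-shot CoT trigger, as quoted in the abstract.
ZERO_SHOT_TRIGGER = "Let's think step by step."


@dataclass
class Demo:
    """One stored demonstration (hypothetical schema)."""
    question: str
    rationale: str
    answer: str


def build_prompt(question: str, pool: Optional[List[Demo]], k: int = 4, seed: int = 0) -> str:
    """Few-shot CoT prompt from a matched pool, or zero-shot CoT if there is no pool."""
    if not pool:
        # Unmatched path: no demonstrations, fall back to the general trigger.
        return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"
    # Matched path: sample up to k demonstrations; a seeded RNG keeps runs reproducible.
    rng = random.Random(seed)
    demos = rng.sample(pool, min(k, len(pool)))
    shots = [f"Q: {d.question}\nA: {d.rationale} The answer is {d.answer}." for d in demos]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


if __name__ == "__main__":
    pool = [Demo("What is 2 + 3?", "2 plus 3 equals 5.", "5")]
    print(build_prompt("What is 4 + 4?", pool))
    print(build_prompt("Is a whale a fish?", None))
```

Random sampling is only a placeholder here; the TL;DR's emphasis on diverse demonstrations suggests a selection strategy that favors coverage over chance.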

Paper Structure (48 sections, 3 equations, 12 figures, 10 tables, 1 algorithm)

Figures (12)

  • Figure 1: Comparison of conventional single-task scenarios and the setting we address: mixed-task scenarios. Mixed-task scenarios have three major characteristics: (i) the type of any incoming question is unknown; (ii) the input data comes from a set of mixed tasks; (iii) the questions arrive in an arbitrary order.
  • Figure 2: Overview of our proposed GeM-CoT mechanism. GeM-CoT first routes the input question to one of two paths (Type Matching). (i) Matched path: on a successful match, it fetches demonstrations from the demo pool (Demo Acquisition) and performs the final inference (Answer Derivation). (ii) Unmatched path: on a failed match, it derives a zero-shot answer with rationales (Answer Derivation) and then updates the data cache through density-based clustering and automatic demonstration construction (Data Cache Update).
  • Figure 3: Flow chart of our GeM-CoT mechanism.
  • Figure 4: Process of five successive streaming data batches, each of size 400, on the BBH datasets.
  • Figure 5: Distribution of similarity scores in the Type Matching module. Scores for correctly and incorrectly matched questions are shown separately.
  • ...and 7 more figures
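The Data Cache Update step in Figure 2 groups cached, unmatched questions by density before constructing demonstrations. Below is a minimal sketch under stated assumptions: questions are already embedded as vectors, a textbook DBSCAN with Euclidean distance stands in for the paper's density-based clustering, and the member closest to each cluster's centroid is taken as that cluster's representative question. The paper's exact algorithm, distance, and hyperparameters are not given here, so `eps`, `min_samples`, and the helper names `dbscan` and `representatives` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

NOISE = -1


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Includes i itself, as in standard DBSCAN.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point: keep growing the cluster
        cluster += 1
    return [l for l in labels if l is not None]


def representatives(points: List[Sequence[float]], labels: List[int]) -> Dict[int, int]:
    """For each cluster, return the index of the member closest to the cluster centroid."""
    reps: Dict[int, int] = {}
    for c in sorted({l for l in labels if l != NOISE}):
        members = [i for i, l in enumerate(labels) if l == c]
        dim = len(points[members[0]])
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        reps[c] = min(members, key=lambda i: math.dist(points[i], centroid))
    return reps


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (10.0, 0.0)]
    labels = dbscan(pts, eps=0.5, min_samples=3)
    print(labels)                        # [0, 0, 0, 1, 1, 1, -1]
    print(representatives(pts, labels))  # {0: 0, 1: 3}
```

In this reading, each representative question would then be answered with zero-shot CoT to form a new demonstration, and points labelled as noise would simply stay in the cache until more similar questions arrive.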