DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning

Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Ji-Rong Wen

TL;DR

DAWN-ICL reframes zero-shot in-context learning as a planning problem and introduces Demonstration-aware Monte Carlo Tree Search to strategically order pseudo-demonstrations. By augmenting MCTS with a demonstration-aware Q-value and applying calibration-enhanced aggregation, it improves both the quality and reliability of in-context problem solving. Empirical results on BBH and MMLU across multiple LLMs show that DAWN-ICL consistently outperforms strong ZS-ICL baselines and can even surpass ICL with human demonstrations, highlighting the practical value of planning in zero-shot contexts. The approach advances zero-shot adaptation for multi-task scenarios, offering a scalable path toward more robust in-context learning without labeled data.

Abstract

Zero-shot in-context learning (ZS-ICL) aims to conduct in-context learning (ICL) without using human-annotated demonstrations. Most ZS-ICL methods use large language models (LLMs) to generate (input, label) pairs as pseudo-demonstrations and leverage historical pseudo-demonstrations to help solve the current problem. They assume that problems are from the same task and traverse them in a random order. However, in real-world scenarios, problems usually come from diverse tasks, and only a few belong to the same task. The random traversal order may generate unreliable pseudo-demonstrations and lead to error accumulation. To address this problem, we reformulate ZS-ICL as a planning problem and propose a Demonstration-aware Monte Carlo Tree Search (MCTS) approach (DAWN-ICL), which leverages MCTS to strategically plan the problem-solving trajectories for ZS-ICL. In addition, to achieve effective and efficient Q-value estimation, we propose a novel demonstration-aware Q-value function and use it to enhance the selection phase and accelerate the expansion and simulation phases in MCTS. Extensive experiments demonstrate the effectiveness and efficiency of DAWN-ICL on in-domain and cross-domain scenarios, and it even outperforms ICL using human-annotated labels. The code is available at https://github.com/RUCAIBox/MCTS4ZSICL.
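
To make the role of traversal order concrete, the following is a minimal sketch of the sequential ZS-ICL loop described in the abstract, not the authors' implementation. Everything in it is an illustrative assumption: `zs_icl_traverse`, `toy_predict`, the "task:difficulty" problem strings, the "right"/"wrong" labels that stand in for prediction correctness, and the retrieval rule (the `k` most recent pseudo-demonstrations). A real system would call an LLM where `toy_predict` is used.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (input, pseudo-label)


def zs_icl_traverse(
    problems: List[str],
    predict: Callable[[str, List[Demo]], str],
    k: int = 2,
) -> List[Demo]:
    """Solve problems in the given order, storing each prediction
    as a pseudo-demonstration for later problems."""
    memory: List[Demo] = []
    for x in problems:
        demos = memory[-k:]  # hypothetical retrieval: the k most recent pairs
        y = predict(x, demos)
        memory.append((x, y))
    return memory


def toy_predict(x: str, demos: List[Demo]) -> str:
    """Stand-in for an LLM call (assumption): an easy problem is always solved;
    a hard one is solved only with a correct same-task demonstration."""
    task, difficulty = x.split(":")
    helped = any(
        d_in.split(":")[0] == task and d_label == "right"
        for d_in, d_label in demos
    )
    return "right" if difficulty == "easy" or helped else "wrong"


# Easy-first order: the easy problem yields a reliable pseudo-demonstration.
good = zs_icl_traverse(["math:easy", "math:hard"], toy_predict)
# Hard-first order: the hard problem is attempted with no helpful context.
bad = zs_icl_traverse(["math:hard", "math:easy"], toy_predict)
```

In this toy setting the same two problems give different outcomes depending only on the order in which they are visited, which is the effect that planning the trajectory is meant to exploit.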

Paper Structure

This paper contains 25 sections, 4 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Comparison of our method with previous methods. Although both predictions at the $i$-th step are correct, in previous work the example is randomly selected and unhelpful. In contrast, in our method the example is selected through planning and is helpful.
  • Figure 2: The overview of DAWN-ICL. (a) An illustration of the four phases in MCTS. We select nodes using our proposed DUCT (Eq. \ref{eq:selection}), perform expansion using our proposed DQ function (Eq. \ref{eq:Q-func}), accelerate simulation with an action cache supported by the DQ function, and finally back-propagate the rewards. (b) We improve the $Q$-value function with pseudo-demonstration information. We retrieve $k$ pseudo-demonstrations and add their confidence and similarity scores to form the initial value of the $Q$ function.
  • Figure 3: Accuracy on BBH with increasing numbers of iterations using random, UCT, and our proposed DUCT selection strategies.
  • Figure 4: Accuracy (%) on BBH with different demonstration selection methods.
  • Figure 5: The error accumulation phenomenon of the similarity-based demonstration selection method.
  • ...and 1 more figure
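
The Figure 2 caption describes seeding the $Q$ value from retrieved pseudo-demonstrations and then using it during selection. The sketch below illustrates that idea with a textbook UCT rule; it is not the paper's DUCT or DQ function, whose exact equations are not reproduced in this section. The names `Child`, `initial_q`, and `uct_select`, the averaging over retrieved demonstrations, the exploration constant `c`, and the `+ 1` smoothing terms are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Child:
    name: str
    q: float         # current value estimate
    visits: int = 0  # how many times this child has been visited


def initial_q(retrieved: List[Tuple[float, float]]) -> float:
    """Seed Q from (confidence, similarity) pairs of retrieved
    pseudo-demonstrations. Averaging is an assumed aggregation."""
    if not retrieved:
        return 0.0
    return sum(conf + sim for conf, sim in retrieved) / len(retrieved)


def uct_select(children: List[Child], parent_visits: int, c: float = 1.0) -> Child:
    """Standard UCT: value estimate plus an exploration bonus.
    The +1 terms avoid division by zero for unvisited children."""
    def score(ch: Child) -> float:
        bonus = c * math.sqrt(math.log(parent_visits + 1) / (ch.visits + 1))
        return ch.q + bonus
    return max(children, key=score)


# With no visits yet, the bonus is zero and the demonstration-based prior decides.
fresh = [
    Child("a", initial_q([(0.9, 0.8)])),
    Child("b", initial_q([(0.2, 0.1)])),
]
first_pick = uct_select(fresh, parent_visits=0)

# After many visits to one child, the bonus favors the unexplored one.
later = [Child("a", 1.0, visits=100), Child("b", 0.9, visits=0)]
later_pick = uct_select(later, parent_visits=100)
```

Seeding the value estimate this way gives unvisited nodes an informed starting score, so early selections need not wait for rollouts to separate promising candidates from poor ones.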