Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences

Xiangyang Liu, Junliang He, Xipeng Qiu

TL;DR

The paper addresses the challenge of improving LLM reasoning without labeled data or external feedback, particularly in zero-shot and streaming settings. It introduces RoSE, a framework that maintains a streaming experience pool of answered questions and their reasoning traces, and uses a three-stage orchestration (diversity, uncertainty-based filtering, and complexity-based filtering) to select diverse, informative demonstrations for each new question. RoSE demonstrates significant performance gains across nine reasoning tasks and two LLMs, transfers across different CoT prompting methods, and is supported by extensive ablations and stability analyses. The approach offers a practical path toward self-improving reasoning systems that adapt demonstrations on the fly, potentially reducing manual engineering and enabling robust reasoning across domains.

Abstract

Large language models (LLMs) can perform complex reasoning by generating intermediate thoughts under zero-shot or few-shot settings. However, zero-shot prompting always suffers from low performance, and the superior performance of few-shot prompting hinges on manually crafted demonstrations. In this paper, we present RoSE (Reasoning with Orchestrated Streaming Experiences), a general framework for solving reasoning tasks that can self-improve without complex external efforts. To enable RoSE, we describe an architecture that extends an LLM to store all answered questions and their thoughts in a streaming experience pool and then orchestrates helpful questions from the pool to assist in answering new questions. To set up a question-aware orchestration mechanism, RoSE first calculates the similarity of each question in the pool with a new test question. Since the solution to each answered question is not always correct, RoSE sorts the questions according to their similarity with the new question and then uniformly divides them into multiple buckets. It finally extracts one question from each bucket to make the extracted questions more diverse. To make these extracted questions help RoSE answer new questions as much as possible, we introduce two other attributes for each question: uncertainty and complexity. RoSE preferentially selects the questions with low uncertainty and high complexity from each bucket. We evaluate the versatility of RoSE in various reasoning tasks, LLMs, and CoT methods.
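
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Experience`, `cosine`, and `orchestrate` are hypothetical, cosine similarity over precomputed embeddings is an assumed similarity measure, and the `uncertainty` and `complexity` scores are assumed to be already attached to each stored question (the abstract does not say how they are computed). Within a bucket, the sketch picks the lowest-uncertainty question and breaks ties by higher complexity, which is only one possible reading of "low uncertainty and high complexity".

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    """One answered question stored in the streaming experience pool."""
    question: str
    rationale: str            # the model's generated reasoning ("thoughts")
    answer: str
    embedding: List[float]    # assumed: a precomputed question embedding
    uncertainty: float        # assumed: lower means more confident
    complexity: int           # assumed: e.g. number of reasoning steps


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def orchestrate(pool: List[Experience], query_embedding: List[float],
                k: int) -> List[Experience]:
    """Pick up to k demonstrations, one from each similarity bucket."""
    if not pool or k <= 0:
        return []
    # 1. Sort the pool by similarity to the new question (most similar first).
    ranked = sorted(pool, key=lambda e: cosine(e.embedding, query_embedding),
                    reverse=True)
    # 2. Split the ranking into k contiguous buckets of near-equal size.
    k = min(k, len(ranked))
    base, extra = divmod(len(ranked), k)
    picks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        bucket = ranked[start:start + size]
        start += size
        # 3. Prefer low uncertainty, then high complexity (assumed tie-break).
        picks.append(min(bucket, key=lambda e: (e.uncertainty, -e.complexity)))
    return picks
```

In a streaming loop, the selected experiences would be formatted as demonstrations for the new question, and the new question, its generated rationale, and its answer would then be appended to `pool` for later questions.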

Paper Structure

This paper contains 36 sections, 6 equations, 6 figures, 17 tables, 1 algorithm.

Figures (6)

  • Figure 1: The overview of RoSE
  • Figure 2: The relation between accuracy and the magnitude of the uncertainty value on the SVAMP dataset. We normalize the range of uncertainty to [0, 1].
  • Figure 3: The impact of each orchestration process.
  • Figure 4: The impact of complexity.
  • Figure 5: Results on different demonstration quantities.
  • ...and 1 more figure