Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up

Jiahao Yuan, Dehui Du, Hao Zhang, Zixiang Di, Usman Naseem

TL;DR

This work tackles the trade-off between reasoning accuracy, flexibility, and cost in large language models by introducing Reversal of Thought (RoT). RoT combines a Preference-Guided Reverse Reasoning (PGRR) warm-up with a Cognitive Preference Manager (CPM) to activate and adapt LLM cognitive preferences without retraining, expanding known knowledge boundaries and reducing hallucinations. Across eight tasks and five benchmarks, RoT outperforms strong baselines in reasoning accuracy while maintaining competitive efficiency, with ablations confirming the importance of PGRR, embedded logic, and CPM. By dynamically shaping task-specific prompts through reverse reasoning and knowledge-boundary-aware aggregation, RoT demonstrates practical potential for scalable, cost-conscious improvements in complex reasoning tasks. Future directions include integrating in-context learning strategies and teacher-student distillation to further boost robustness and generalization.

Abstract

Large language models (LLMs) have shown remarkable performance in reasoning tasks but face limitations in mathematical and complex logical reasoning. Existing methods to improve LLMs' logical capabilities either rely on traceable or verifiable logical sequences, which generate more reliable responses by constructing logical structures but increase computational costs, or introduce rigid logic template rules, which reduce flexibility. In this paper, we propose Reversal of Thought (RoT), a plug-and-play and cost-effective reasoning framework designed to enhance the logical reasoning abilities of LLMs during the warm-up phase prior to batch inference. RoT utilizes a Preference-Guided Reverse Reasoning warm-up strategy, which integrates logical symbols for pseudocode planning through meta-cognitive mechanisms and pairwise preference self-evaluation to generate task-specific prompts solely through demonstrations, aligning with LLMs' cognitive preferences shaped by RLHF. Through reverse reasoning, we utilize a Cognitive Preference Manager to assess knowledge boundaries and further expand LLMs' reasoning capabilities by aggregating solution logic for known tasks and stylistic templates for unknown tasks. Experiments across various tasks demonstrate that RoT surpasses existing baselines in both reasoning accuracy and efficiency.
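
The two stages named in the abstract, pairwise preference self-evaluation over candidate prompts and a known/unknown decision at the knowledge boundary, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `prefer` callback (standing in for an LLM's pairwise preference score), the string-similarity measure, and the `threshold` value are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def select_preferred(candidates: List[str],
                     prefer: Callable[[str, str], float]) -> str:
    """Pick the candidate prompt with the highest mean pairwise preference.

    `prefer(a, b)` is a hypothetical callback returning a score in [0, 1]
    for how strongly `a` is preferred over `b` (e.g. from an LLM judge).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def mean_score(i: int) -> float:
        # Average preference of candidate i against every other candidate.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(prefer(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=mean_score)
    return candidates[best]


def route_by_knowledge_boundary(original_prompt: str,
                                reversed_prompt: str,
                                threshold: float = 0.6) -> str:
    """Label a task 'known' or 'unknown' from prompt similarity.

    Assumption: plain string similarity stands in for whatever measure a
    real system would use; the threshold of 0.6 is arbitrary.
    """
    similarity = SequenceMatcher(None, original_prompt, reversed_prompt).ratio()
    return "known" if similarity >= threshold else "unknown"
```

In this sketch, a "known" label would lead to aggregating the solution logic of both prompts, and an "unknown" label to reusing only the stylistic template of the reversed prompt, mirroring the distinction the abstract draws.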

Paper Structure

This paper contains 47 sections, 8 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: Comparison between CoT (Yao et al., 2024; Besta et al., 2024; Yang et al., 2024) and Reversal of Thought (RoT)
  • Figure 2: Architecture of Reversal of Thought (RoT). RoT comprises two primary components: Preference-Guided Reverse Reasoning, which enhances logical reasoning by activating LLMs' cognitive preferences, and the Cognitive Preference Manager, which assesses knowledge boundaries and adapts cognitive styles for various tasks.
  • Figure 3: Inference time comparison, measured as the average duration from inference start to evaluation end, including all steps.
  • Figure 4: Prompt for Reverse Reasoning
  • Figure 5: Prompt for CPM (Known/Unknown)
  • ...and 4 more figures