SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning

Chen Li, Yinyi Luo, Anudeep Bolimera, Uzair Ahmed, Shri Kiran Srinivasan, Hrishikesh Gokhale, Marios Savvides

TL;DR

SOLAR addresses the limitation of fixed Chain-of-Thought reasoning in LLMs by enabling dynamic selection among CoT, ToT, and GoT topologies. It introduces TAG to automatically generate and annotate topology-aware data, and a hierarchical Topological-Scaling framework that merges post-training and inference-time strategies. The Multi-task Topological Reward Model (M-TRM) selects both topology and final answer in a single pass, achieving notable accuracy gains and stronger rank correlations than single-topology baselines. Empirically, SOLAR yields substantial improvements on MATH and GSM8K, including up to +10.02% accuracy and reduced output length, demonstrating a scalable path to high-precision, topology-aware reasoning in LLMs.

Abstract

Large Language Models excel in reasoning yet often rely on Chain-of-Thought prompts, limiting performance on tasks demanding more nuanced topological structures. We present SOLAR (Scalable Optimization of Large-scale Architecture for Reasoning), a framework that dynamically optimizes Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT) topologies to boost accuracy and efficiency. Our Topological-Annotation-Generation (TAG) system automates dataset creation, annotation, and difficulty segmentation, leading to stronger post-training and test-time performance. We also propose Topological-Scaling, a curriculum-learning-based approach that adaptively tailors the combination of post-training and inference scaling to each task. On MATH and GSM8K, SOLAR delivers notable gains: +5% accuracy with Topological Tuning, +9% with Topological Rewarding, and +10.02% with Hybrid Scaling, while reducing response length by over 5%, which lowers inference latency. To further enhance efficiency, we introduce a multi-task Topological Reward Model (M-TRM) that selects both the optimal reasoning topology and the final answer in a single pass, eliminating the need for multiple single-task TRMs. Remarkably, M-TRM also surpasses all single-task TRMs, improving accuracy by +10% and rank correlation by +9%. Overall, SOLAR establishes a new benchmark for scalable, high-precision LLM reasoning and introduces a fully automated, dynamic topology competition mechanism.
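
As a rough illustration of the selection step described above, the sketch below takes one candidate response per topology, each with a reward score, and returns the highest-scoring topology together with its answer. The `Candidate` type, the `select_best` helper, and the numeric scores are hypothetical placeholders introduced here for illustration; they are not the paper's M-TRM, whose scores would come from a trained reward model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    topology: str  # "CoT", "ToT", or "GoT"
    answer: str    # final answer extracted from the response
    score: float   # hypothetical reward-model score; higher is better


def select_best(candidates):
    """Return (topology, answer) of the highest-scoring candidate."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best = max(candidates, key=lambda c: c.score)
    return best.topology, best.answer


# Made-up scores standing in for reward-model outputs.
candidates = [
    Candidate("CoT", "42", 0.61),
    Candidate("ToT", "41", 0.35),
    Candidate("GoT", "42", 0.78),
]

print(select_best(candidates))  # → ('GoT', '42')
```

Because topology and answer are read off the same winning candidate, one scoring pass yields both decisions, which is the property the abstract attributes to a multi-task reward model.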

Paper Structure

This paper contains 37 sections, 6 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: SOLAR Architecture
  • Figure 2: Accuracy comparisons across existing pretrained models reveal that the less frequently generated ToT and GoT topologies perform on par with the default CoT method, indicating that neither ToT nor GoT is lagging behind in performance.
  • Figure 3: Win Rate comparisons across pretrained models demonstrate that different tasks favor different reasoning topologies, as evidenced by distinct win-rate distributions. This finding underscores the potential to enhance LLM reasoning by explicitly augmenting them with optimal topological strategies.
  • Figure 4: Overall Topological Tuning results: the topo-tuned model shows improved overall accuracy and shorter generated outputs.
  • Figure 5: Topo-wise Win Rate Comparison
  • ...and 3 more figures