SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li, Yinyi Luo, Anudeep Bolimera, Uzair Ahmed, Shri Kiran Srinivasan, Hrishikesh Gokhale, Marios Savvides
TL;DR
SOLAR addresses the limitation of fixed Chain-of-Thought reasoning in LLMs by enabling dynamic selection among CoT, ToT, and GoT topologies. It introduces TAG (Topological-Annotation-Generation) to automatically generate and annotate topology-aware data, and a hierarchical Topological-Scaling framework that combines post-training and inference-time strategies. The Multi-task Topological Reward Model (M-TRM) selects both the topology and the final answer in a single pass, achieving higher accuracy and stronger rank correlation than single-task TRMs. Empirically, SOLAR yields substantial improvements on MATH and GSM8K, including up to +10.02% accuracy and shorter outputs, demonstrating a scalable path to high-precision, topology-aware reasoning in LLMs.
Abstract
Large Language Models excel at reasoning yet often rely on Chain-of-Thought prompts, which limits their performance on tasks that demand more nuanced topological structures. We present SOLAR (Scalable Optimization of Large-scale Architecture for Reasoning), a framework that dynamically optimizes over Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT) topologies to boost accuracy and efficiency. Our Topological-Annotation-Generation (TAG) system automates dataset creation, annotation, and difficulty segmentation, leading to stronger post-training and test-time performance. We also propose Topological-Scaling, a curriculum-learning-based approach that adaptively tailors the combination of post-training and inference scaling to each task. On MATH and GSM8K, SOLAR delivers notable gains: +5% accuracy with Topological Tuning, +9% with Topological Rewarding, and +10.02% with Hybrid Scaling, while reducing response length by over 5% and thereby lowering inference latency. To further improve efficiency, we introduce a Multi-task Topological Reward Model (M-TRM) that selects both the optimal reasoning topology and the final answer in a single pass, eliminating the need for multiple single-task TRMs. Remarkably, M-TRM also surpasses all single-task TRMs, improving accuracy by +10% and rank correlation by +9%. Overall, SOLAR establishes a new benchmark for scalable, high-precision LLM reasoning and introduces a fully automated, dynamic topology competition mechanism.
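To make the idea of selecting a topology and an answer in a single pass more concrete, the following Python is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes that candidate responses have already been sampled under each topology and that a single scoring function, a hypothetical stand-in for M-TRM, assigns one scalar to each candidate. The names `Candidate`, `select_topology_and_answer`, and `score` are illustrative. The rule of choosing the topology with the highest mean score is also an assumption, because the abstract does not say how M-TRM aggregates its scores.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    topology: str   # "CoT", "ToT", or "GoT"
    response: str   # full reasoning trace
    answer: str     # extracted final answer


def select_topology_and_answer(
    candidates: List[Candidate],
    score: Callable[[Candidate], float],
) -> Tuple[str, str]:
    """Hypothetical single-pass selection (illustrative, not SOLAR's code):
    score every candidate once, then derive both the topology decision and
    the final answer from that same set of scores."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # One scoring pass over all candidates (stand-in for a multi-task reward model).
    scored = [(c, score(c)) for c in candidates]
    # Group the scored candidates by reasoning topology.
    by_topology: Dict[str, List[Tuple[Candidate, float]]] = defaultdict(list)
    for c, s in scored:
        by_topology[c.topology].append((c, s))
    # Assumed aggregation rule: the topology with the highest mean score wins.
    best_topology = max(
        by_topology, key=lambda t: mean(s for _, s in by_topology[t])
    )
    # Final answer: the top-scored candidate within the winning topology.
    best_candidate, _ = max(by_topology[best_topology], key=lambda cs: cs[1])
    return best_topology, best_candidate.answer


# Usage with toy scores (made-up numbers, not results from the paper).
toy_scores = {"a": 0.2, "b": 0.4, "c": 0.9, "d": 0.5, "e": 0.6}
cands = [
    Candidate("CoT", "a", "12"),
    Candidate("CoT", "b", "12"),
    Candidate("ToT", "c", "15"),
    Candidate("ToT", "d", "14"),
    Candidate("GoT", "e", "15"),
]
topology, answer = select_topology_and_answer(cands, lambda c: toy_scores[c.response])
print(topology, answer)  # ToT 15
```

In this sketch, both decisions come from one set of scores, so a single scorer replaces separate topology-selection and answer-selection models. A real system would replace the toy dictionary with learned reward-model outputs and might use a different aggregation rule.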
