Test-time Diverse Reasoning by Riemannian Activation Steering

Ly Tran Ho Khanh; Dongxuan Zhu; Man-Chung Yue; Viet Anh Nguyen

TL;DR

The paper tackles the problem of limited output diversity in Best-of-$N$ reasoning for language models by introducing SPREAD, a test-time activation steering method. It casts steering as a Riemannian optimization on the product of spheres to maximize the volume spanned by intervened hidden activations, leveraging a log-determinant objective and a block-coordinate descent algorithm with exponential maps. The authors prove convergence properties and provide practical initialization and hyperparameter strategies, demonstrating strong gains in diversity and solution accuracy on mathematical benchmarks (e.g., AIME24, MATH500, OlympiadBench) with scalable inference-time costs. Overall, SPREAD offers a lightweight, parameter-efficient approach to enhance reasoning diversity without fine-tuning, with potential implications for robust multi-path problem solving in LMs.

Abstract

Best-of-$N$ reasoning improves the accuracy of language models on complex tasks by sampling multiple candidate solutions and then selecting the best one according to some criterion. A critical bottleneck for this strategy is limited output diversity: the model generates similar outputs despite stochastic sampling and hence repeats the same errors. To address this lack of variance in reasoning paths, we propose a novel unsupervised activation steering strategy that simultaneously optimizes the steering vectors for multiple reasoning trajectories at test time. At each synchronization anchor along the batch generation process, we find the steering vectors that maximize the total volume spanned by all possible subsets of intervened activations. We show that these steering vectors can be determined by solving a Riemannian optimization problem over the product of spheres with a log-determinant objective function. We then use a Riemannian block-coordinate descent algorithm with a well-tuned learning rate to obtain a stationary point of the problem, and we apply these steering vectors until the generation process reaches the subsequent synchronization anchor. Empirical evaluations on popular mathematical benchmarks demonstrate that our test-time Riemannian activation steering strategy outperforms vanilla sampling techniques in terms of generative diversity and solution accuracy.
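The optimization step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the objective is taken to be $\log\det(I + AA^\top)$ with $A = H + \alpha V$, which uses the standard identity $\det(I + G) = \sum_S \det(G_S)$ (a sum of squared volumes over all row subsets), and the names `H`, `V`, `alpha`, `lr`, and `sweeps` are hypothetical. Each steering vector is constrained to the unit sphere and updated one block at a time with a projected gradient and the sphere's exponential map; because the sketch maximizes, the update is an ascent step.

```python
import numpy as np


def objective(H, V, alpha):
    """log det(I + A A^T) for the intervened activations A = H + alpha * V."""
    A = H + alpha * V
    n = A.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + A @ A.T)
    return logdet


def sphere_exp(v, xi):
    """Exponential map on the unit sphere at v along the tangent vector xi."""
    norm = np.linalg.norm(xi)
    if norm < 1e-12:
        return v
    return np.cos(norm) * v + np.sin(norm) * xi / norm


def steer(H, alpha=1.0, lr=0.1, sweeps=50, seed=0):
    """Block-coordinate Riemannian ascent over a product of unit spheres.

    H: (n, d) array, one hidden activation per trajectory (assumed layout).
    Returns V: (n, d) array of unit-norm steering vectors.
    """
    rng = np.random.default_rng(seed)
    n, d = H.shape
    # Assumed initialization: random points on the unit sphere.
    V = rng.standard_normal((n, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(sweeps):
        for i in range(n):
            A = H + alpha * V
            # Euclidean gradient w.r.t. V is 2 * alpha * (I + A A^T)^{-1} A.
            G = 2.0 * alpha * np.linalg.solve(np.eye(n) + A @ A.T, A)
            g = G[i]
            # Project onto the tangent space of the sphere at V[i].
            xi = g - (g @ V[i]) * V[i]
            # Move along the geodesic; the iterate stays on the sphere.
            V[i] = sphere_exp(V[i], lr * xi)
    return V


# Four trajectories with identical activations, i.e. no diversity at all.
H = np.tile(np.ones(16), (4, 1))
V = steer(H)
unsteered = objective(H, np.zeros_like(H), alpha=1.0)
steered = objective(H, V, alpha=1.0)
```

In this toy setup the four activations start out identical, so the unsteered Gram matrix has rank one; steering pushes the rows apart and raises the log-determinant. A fixed step size is used here for brevity, whereas the abstract refers to a well-tuned learning rate.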
