Mode-Conditioning Unlocks Superior Test-Time Scaling

Chen Henry Wu; Sachin Goyal; Aditi Raghunathan

TL;DR

The paper addresses diversity collapse in parallel test-time sampling for large language models and introduces mode-conditioning (ModC), which allocates inference compute across distinct reasoning modes. It presents two practical training instantiations, specialist models and mode-specific prefixes, and an automated mode-discovery approach via gradient clustering for data without explicit mode labels. Across controlled Countdown tasks, large-scale math reasoning (OpenThoughts, NuminaMath), and reinforcement learning, ModC yields up to a 4x efficiency gain (OpenThoughts, Qwen2.5-7B) and up to 10% gains on datasets such as NuminaMath, while also raising the maximum attainable Pass@k. The work argues that standard training underutilizes the diversity in data and offers ModC as a simple, effective remedy for better test-time scaling.

Abstract

Parallel sampling promises substantial gains in test-time scaling, but its effectiveness is sharply limited by diversity collapse, where models concentrate on a few modes and repeated samples produce the same mistakes. We propose the mode-conditioning (ModC) framework, which explicitly allocates test-time compute across reasoning modes using either specialist models or mode-specific prefixes. ModC consistently improves scaling across controlled graph-search tasks and large-scale reasoning benchmarks, spanning model families and sizes from 0.5B to 7B. On OpenThoughts, fine-tuning Qwen2.5-7B with ModC achieves a 4x efficiency gain over standard training while also improving the maximum attainable Pass@k. We further show that gradient clustering enables ModC without explicit mode labels, yielding up to 10% gains on datasets such as NuminaMath. Finally, we show that ModC improves reinforcement learning (RL) and can further boost diversity-inducing RL methods. These results demonstrate that standard training underutilizes the diversity in data, and that ModC provides a simple, effective remedy for unlocking the full benefits of diversity in test-time scaling.
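To make the diversity-collapse argument concrete, the following toy sketch compares Pass@k when all k samples come from a single collapsed model against Pass@k when the same budget is split across modes. It is an illustration under stated assumptions, not the authors' implementation: the function names, the even budget split, the independence of samples, and the per-mode success probabilities are all hypothetical.

```python
from math import prod


def pass_at_k_mixture(mode_weights, mode_success, k):
    """Pass@k when all k samples are drawn i.i.d. from one model that
    picks mode m with probability mode_weights[m] (toy assumption)."""
    per_sample = sum(w * p for w, p in zip(mode_weights, mode_success))
    return 1.0 - (1.0 - per_sample) ** k


def split_budget(k, num_modes):
    """Split k samples as evenly as possible across modes
    (an assumed allocation rule, chosen for simplicity)."""
    base, extra = divmod(k, num_modes)
    return [base + (1 if i < extra else 0) for i in range(num_modes)]


def pass_at_k_mode_conditioned(mode_success, k):
    """Pass@k when the budget is allocated explicitly per mode and
    samples within and across modes are independent (toy assumption)."""
    budget = split_budget(k, len(mode_success))
    fail_all = prod((1.0 - p) ** n for p, n in zip(mode_success, budget))
    return 1.0 - fail_all


if __name__ == "__main__":
    # Hypothetical problem: mode 0 never solves it, mode 1 solves it half
    # the time, but the collapsed model picks mode 1 only 5% of the time.
    weights = [0.95, 0.05]
    success = [0.0, 0.5]
    k = 8
    print("standard sampling:", pass_at_k_mixture(weights, success, k))
    print("mode-conditioned: ", pass_at_k_mode_conditioned(success, k))
```

With these made-up numbers, standard sampling reaches a Pass@8 of roughly 0.18, while the even split reaches 0.9375, because four of the eight samples are guaranteed to come from the mode that can solve the problem. Real gains depend on how modes are defined and how well each one is trained.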
