FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal
Hang Xu, Linjiang Huang, Feng Zhao
TL;DR
This work addresses the challenge of applying test-time scaling to next-token prediction-based image generation by introducing Filling-Based Reward (FR), which estimates plausible future content for incomplete token sequences. FR-TTS combines an efficient FR upper-bound search, a diversity trajectory penalty, and a dynamic unified reward to guide intermediate samples toward high-reward regions. Empirical results across multiple baselines and benchmarks (TIIF-Bench, GenEval, Open-Image-Pref-v1) show that FR-TTS consistently surpasses Best-of-N and scales better than prior methods, with moderate inference-time overhead. The approach demonstrates improved adherence to prompts, better color fidelity, and more reliable long-text instruction following, highlighting the practical viability of reward-driven TTS in NTP-based generation.
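The TL;DR names three components (an FR upper-bound search, a diversity penalty, and a dynamic unified reward) without detailing how they interact. The snippet below is a hypothetical, minimal sketch of one pruning step that combines ideas of that kind. It is not the authors' implementation. The callables `fill_fn` and `reward_fn`, the reading of "upper-bound search" as a max over several candidate fillings, the token-overlap `similarity` used as the diversity penalty, and the linear schedule `lam0 * (1 - progress)` are all our assumptions.

```python
import random
from typing import Callable, List, Sequence

Tokens = List[int]
FillFn = Callable[[Tokens, int], Tokens]  # hypothetical: (prefix, seed) -> completed sequence
RewardFn = Callable[[Tokens], float]      # hypothetical: completed sequence -> scalar reward


def filling_reward(prefix: Tokens, fill_fn: FillFn, reward_fn: RewardFn, num_fills: int = 4) -> float:
    """Score an incomplete prefix by completing it `num_fills` ways and keeping the best reward.

    Taking the max over candidate fillings is our assumed reading of an "upper-bound" search.
    """
    return max(reward_fn(fill_fn(prefix, seed)) for seed in range(num_fills))


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions where two prefixes agree (hypothetical diversity measure)."""
    return sum(x == y for x, y in zip(a, b)) / max(min(len(a), len(b)), 1)


def select_prefixes(prefixes: List[Tokens], progress: float, fill_fn: FillFn, reward_fn: RewardFn,
                    keep: int, num_fills: int = 4, lam0: float = 0.5) -> List[int]:
    """Greedily keep up to `keep` prefixes, trading filling reward against redundancy.

    `progress` in [0, 1] is the fraction of tokens generated so far; the diversity
    weight lam0 * (1 - progress) shrinks to zero as generation completes (assumed schedule).
    """
    lam = lam0 * (1.0 - progress)
    fr = [filling_reward(p, fill_fn, reward_fn, num_fills) for p in prefixes]
    chosen: List[int] = []
    remaining = list(range(len(prefixes)))
    while remaining and len(chosen) < keep:
        def score(i: int) -> float:
            # Penalize similarity to the closest already-kept prefix.
            penalty = max((similarity(prefixes[i], prefixes[j]) for j in chosen), default=0.0)
            return fr[i] - lam * penalty
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy stand-ins: pad the prefix with seeded random tokens; reward = scaled mean token value.
    seq_len = 8

    def toy_fill(prefix: Tokens, seed: int) -> Tokens:
        rng = random.Random(seed * 1000 + sum(prefix))
        return prefix + [rng.randint(0, 9) for _ in range(seq_len - len(prefix))]

    def toy_reward(seq: Tokens) -> float:
        return sum(seq) / (9 * seq_len)

    candidates = [[1, 2, 3], [9, 9, 1], [9, 9, 1], [5, 0, 7]]
    print(select_prefixes(candidates, progress=0.4, fill_fn=toy_fill, reward_fn=toy_reward, keep=2))
```

With `progress=1.0` the diversity weight vanishes and selection reduces to ranking by filling reward; earlier in generation the penalty steers the kept set away from near-duplicate prefixes.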
Abstract
Test-time scaling (TTS) has become a prevalent technique in image generation, substantially improving output quality by increasing the number of parallel samples and filtering them with pre-trained reward models. However, applying this methodology to the next-token prediction (NTP) paradigm remains challenging. The primary obstacle is the low correlation between the reward of an image decoded from an intermediate token sequence and the reward of the fully generated image. Because intermediate representations are inherently incomplete in scale or semantic content, they are poor indicators for deciding which samples to prune. To address this issue, we introduce the Filling-Based Reward (FR), which estimates the approximate future trajectory of an intermediate sample by finding and applying a reasonable filling scheme that completes the sequence. Both the correlation coefficient between the rewards of intermediate and final samples and multiple intrinsic signals, such as token confidence, indicate that FR is a reliable metric for evaluating the quality of intermediate samples. Building on this foundation, we propose FR-TTS, a scaling strategy that efficiently searches for good filling schemes and incorporates a diversity reward with a dynamic weighting schedule to achieve a balanced evaluation of intermediate samples. Experiments across multiple established benchmarks and various reward models validate the superiority of FR-TTS. Code is available at https://github.com/xuhang07/FR-TTS.
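The abstract's central evidence is a correlation between rewards measured at an intermediate step and rewards of the final images. A minimal way to run such a check is sketched here, assuming that for each sample one has a proxy score computed at an intermediate step (for example, the reward of the naively decoded partial image, or a filling-based score) and the reward of the finished image. The abstract does not say which coefficient is used; Pearson is our assumption, and the helper `rank_proxies` and the proxy names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rank_proxies(proxies: Dict[str, Sequence[float]], final: Sequence[float]) -> Dict[str, float]:
    """Correlate each intermediate-stage proxy with the final-image rewards, highest first.

    `proxies` maps a proxy name (e.g. "decoded", "filled") to one score per sample;
    `final[k]` is the reward of sample k's fully generated image.
    """
    scores = {name: pearson(vals, final) for name, vals in proxies.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

A proxy that ranks higher here is the better guide for pruning: keeping the top-scoring intermediate samples then tends to keep the samples whose final images score best.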
