
FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal

Hang Xu, Linjiang Huang, Feng Zhao

TL;DR

This work addresses the challenge of applying test-time scaling to next-token prediction-based image generation by introducing Filling-Based Reward (FR), which estimates plausible future content for incomplete token sequences. FR-TTS combines an efficient FR upper-bound search, a diversity trajectory penalty, and a dynamic unified reward to guide intermediate samples toward high-reward regions. Empirical results across multiple baselines and benchmarks (TIIF-Bench, GenEval, Open-Image-Pref-v1) show that FR-TTS consistently surpasses Best-of-N and scales better than prior methods, with moderate inference-time overhead. The approach demonstrates improved adherence to prompts, better color fidelity, and more reliable long-text instruction following, highlighting the practical viability of reward-driven TTS in NTP-based generation.

Abstract

Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality by expanding the number of parallel samples and filtering them using pre-trained reward models. However, applying this powerful methodology to the next-token prediction (NTP) paradigm remains challenging. The primary obstacle is the low correlation between the reward of an image decoded from an intermediate token sequence and the reward of the fully generated image. Consequently, these incomplete intermediate representations prove to be poor indicators for guiding the pruning direction, a limitation that stems from their inherent incompleteness in scale or semantic content. To effectively address this critical issue, we introduce the Filling-Based Reward (FR). This novel design estimates the approximate future trajectory of an intermediate sample by finding and applying a reasonable filling scheme to complete the sequence. Both the correlation coefficient between rewards of intermediate samples and final samples, as well as multiple intrinsic signals like token confidence, indicate that the FR provides an excellent and reliable metric for accurately evaluating the quality of intermediate samples. Building upon this foundation, we propose FR-TTS, a sophisticated scaling strategy. FR-TTS efficiently searches for good filling schemes and incorporates a diversity reward with a dynamic weighting schedule to achieve a balanced and comprehensive evaluation of intermediate samples. We experimentally validate the superiority of FR-TTS over multiple established benchmarks and various reward models. Code is available at https://github.com/xuhang07/FR-TTS.
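To make the idea of scoring an incomplete sequence through a filling scheme more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a coarse stage of block-wise random fillings with Best-of-N selection, followed by local refinement that refills one block at a time and keeps only improvements. The names `toy_reward`, `random_block`, and `filling_reward`, the uniform random token sampling, and all default parameters are hypothetical; a real system would use a pre-trained reward model on decoded images.

```python
import random


def toy_reward(tokens):
    # Hypothetical stand-in for a pre-trained reward model:
    # the fraction of even-valued tokens in the completed sequence.
    return sum(t % 2 == 0 for t in tokens) / len(tokens)


def random_block(rng, block_size, vocab_size):
    # Assumption: a block is filled with uniformly random token ids.
    return [rng.randrange(vocab_size) for _ in range(block_size)]


def filling_reward(prefix, total_len, reward_fn, rng,
                   block_size=4, vocab_size=16, n_coarse=8, n_refine=32):
    """Estimate a filling-based reward for an incomplete token sequence."""
    n_missing = total_len - len(prefix)
    assert n_missing > 0 and n_missing % block_size == 0
    n_blocks = n_missing // block_size

    def score(blocks):
        # Complete the prefix with the filling scheme, then score the whole sequence.
        filled = prefix + [t for block in blocks for t in block]
        return reward_fn(filled)

    # Coarse stage: several block-wise random fillings, keep the best one.
    candidates = [
        [random_block(rng, block_size, vocab_size) for _ in range(n_blocks)]
        for _ in range(n_coarse)
    ]
    base = max(candidates, key=score)
    best = score(base)

    # Fine stage: refill a single block and accept the change only if it improves.
    for _ in range(n_refine):
        i = rng.randrange(n_blocks)
        trial = base[:i] + [random_block(rng, block_size, vocab_size)] + base[i + 1:]
        trial_score = score(trial)
        if trial_score > best:
            base, best = trial, trial_score

    return best, base


prefix = [2, 4, 6, 8]  # tokens generated so far
fr, scheme = filling_reward(prefix, total_len=16, reward_fn=toy_reward,
                            rng=random.Random(0))
print(f"estimated filling-based reward: {fr:.3f}")
```

Because the fine stage only accepts improvements, the returned value never falls below the best coarse filling obtained with the same random seed, which is why the estimate can be read as a search for an upper bound.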

Paper Structure

This paper contains 35 sections, 15 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: FR-TTS helps the model generate high-quality images that better adhere to text prompts, as shown on the left side. FR-TTS significantly surpasses the baseline and existing scaling strategies on both the TIIF-Bench (top right) and five commonly used reward models (bottom right). To facilitate comparison, the rewards of the different strategies are scaled relative to the baseline's reward.
  • Figure 2: Comparison of the Spearman [wissler1905spearman] correlation scores between the reward of intermediate samples and the reward of the final complete samples for different paradigms. For the NTP paradigm, we use two reward calculation methods, shown in (a): Cropping the already generated part, and ZeroPadding the ungenerated part. The results in (b) show that, at every step, the NTP paradigm's correlation with the final-step reward is lower than that of other paradigms (e.g., Diffusion and VAR [tian2024visual]). We evaluate 24 steps at equal intervals (for Infinity, 13 steps are evaluated and mapped equivalently onto the 24 steps).
  • Figure 3: We illustrate four aspects to show that the Filling-based Reward (FR) is a highly effective metric for evaluating intermediate samples. We conduct experiments on Janus-Pro-7B [chen2025janus] with 7K+ prompts from Open-Image-Preferences-v1 [OpenImagePreferencesV1Blog]. (a) FR exhibits a higher correlation score, and the score continues to increase as the number of random fillings increases. (b) The filling scheme with a higher reward consistently shows lower entropy in the attention maps across different layers, which indicates that its semantics are more concentrated and less diffused [ma2025towards, chen2025go]. (c) After several steps, the endogenous signal scores of the generated tokens, introduced by ScalingAR [chen2025go], correlate highly with FR. (d) The sample with a higher FR has higher confidence on the ungenerated tokens.
  • Figure 4: Our proposed FR-TTS. Our scaling strategy is built upon three pivotal design principles that ensure an efficient and robust search: ① Efficient Search for the Upper Bound of Filling-based Reward (FR): We generate filling schemes multiple times to search for the upper bound of FR (here we take two best searches in the middle steps as an example); details are shown in Fig. \ref{fig:fr}. ② Diversity Reward: To encourage broader exploration, we assign a diversity reward to each sample, calculated as one minus its maximum similarity score relative to previously generated samples. ③ Unified Reward: Based on the increasing correlation score of the FR over time in Fig. \ref{fig:corr_score_2}, we employ a dynamic weighting coefficient schedule. We further incorporate variance-based adjustments to the FR weights, allowing for enhanced differentiation in subsequent steps. Finally, we utilize Importance Sampling based on the unified reward to obtain new parallel samples.
  • Figure 5: Coarse-to-Fine Search Strategy for Efficient FR Scaling. Our approach for efficiently finding the upper bound of the Filling-based Reward (FR) utilizes a coarse-to-fine search strategy. Initially, we perform multiple block-wise random fillings to establish a base filling scheme via BoN selection. Building upon this base, we transition to Zero-Order [ma2025scaling] optimization, where we iteratively refill a small number of its blocks. If this local refinement yields a higher reward score, the new block configuration replaces the current base filling scheme; otherwise, the replacement is rejected. The reward of the final optimized base filling scheme then serves as the final FR for the intermediate sample.
  • ...and 1 more figure
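The diversity reward, unified reward, and importance-sampling step described in the Figure 4 caption can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's implementation: "previously generated samples" is read here as samples earlier in the list, similarity is assumed to be cosine similarity over hypothetical feature vectors, the weighting schedule is assumed to be linear in the step index, the resampling weights are assumed to be a softmax with a hypothetical `temperature`, and the variance-based adjustment of the FR weight is not modeled.

```python
import math
import random


def cosine(u, v):
    # Cosine similarity between two feature vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def diversity_rewards(features):
    # One minus the maximum similarity to samples earlier in the list.
    # Assumption: the first sample has nothing to compare with and receives 1.0.
    rewards = []
    for i, f in enumerate(features):
        max_sim = max((cosine(f, g) for g in features[:i]), default=0.0)
        rewards.append(1.0 - max_sim)
    return rewards


def unified_rewards(fr, div, step, total_steps):
    # Assumed schedule: the FR weight grows linearly with generation progress,
    # so diversity dominates early and FR dominates late.
    w = (step + 1) / total_steps
    return [w * r + (1.0 - w) * d for r, d in zip(fr, div)]


def resample(samples, rewards, rng, temperature=0.1):
    # Importance sampling with softmax weights over the unified reward.
    m = max(rewards)
    weights = [math.exp((r - m) / temperature) for r in rewards]
    return rng.choices(samples, weights=weights, k=len(samples))


samples = ["a", "b", "c"]                 # placeholders for intermediate samples
features = [[1, 0], [1, 0], [0, 1]]       # hypothetical feature vectors
fr = [0.2, 0.9, 0.5]                      # hypothetical filling-based rewards

div = diversity_rewards(features)
early = unified_rewards(fr, div, step=0, total_steps=4)
late = unified_rewards(fr, div, step=3, total_steps=4)
survivors = resample(samples, late, random.Random(0))
print(div, early, late, survivors)
```

In this toy setup the second sample duplicates the first, so its diversity reward is zero and its early unified reward is low despite having the highest FR; by the last step the FR term alone determines the ranking.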