TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning
Qingtao Yu, Changlin Song, Minghao Sun, Zhengyang Yu, Vinay Kumar Verma, Soumya Roy, Sumit Negi, Hongdong Li, Dylan Campbell
TL;DR
The paper addresses the cost of exploring many noise seeds in test-time scaling for diffusion-based image generation. It introduces TTSnap, a pruning framework that uses noise-aware reward estimates on intermediate denoising steps to discard low-potential candidates early. TTSnap is combined with NARF, which uses self-distillation and curriculum training to align rewards on noisy intermediate estimates with rewards on clean images. The method yields substantial compute savings and performance gains, including improved reward growth under a fixed budget, and it is compatible with post-training and local optimization techniques. It also highlights the importance of generation diversity and global search for effective test-time scaling in diffusion models.
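The early-pruning idea can be illustrated with a toy simulation. This is a hedged sketch, not TTSnap's implementation. The reward of each candidate's fully denoised image is replaced by a hidden scalar `quality`. The noise-aware reward model is replaced by a hypothetical `proxy_reward` whose error shrinks as denoising progresses. The pruning `checkpoints` and the `keep_frac` ratio are also illustrative assumptions. The loop counts denoising steps (NFE) to show where the savings come from.

```python
# Hypothetical toy sketch of early pruning with intermediate reward estimates.
# No diffusion model is involved: "quality" stands in for the clean-image reward.
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    seed: int
    quality: float       # toy stand-in for the reward of the fully denoised image
    steps_done: int = 0  # denoising steps already spent on this candidate


def proxy_reward(c: Candidate, total_steps: int, rng: random.Random) -> float:
    """Hypothetical noise-aware reward: unbiased, but noisier at earlier steps."""
    noise_level = 1.0 - c.steps_done / total_steps
    return c.quality + rng.gauss(0.0, 0.5 * noise_level)


def prune_search(num_seeds: int, total_steps: int, checkpoints: list[int],
                 keep_frac: float, seed: int = 0) -> tuple[Candidate, int]:
    """Denoise a pool of seeds, keeping the top keep_frac at each checkpoint.

    Assumes every checkpoint is strictly less than total_steps.
    """
    rng = random.Random(seed)
    pool = [Candidate(s, rng.gauss(0.0, 1.0)) for s in range(num_seeds)]
    nfe = 0  # total denoising steps (network evaluations) spent
    for t in sorted(checkpoints) + [total_steps]:
        for c in pool:
            nfe += t - c.steps_done  # advance every surviving candidate to step t
            c.steps_done = t
        if t == total_steps:
            break
        # Rank survivors by the intermediate reward estimate; keep the best fraction.
        pool.sort(key=lambda cand: proxy_reward(cand, total_steps, rng), reverse=True)
        pool = pool[: max(1, int(len(pool) * keep_frac))]
    # At the final step the clean-image reward is available, so select on it directly.
    best = max(pool, key=lambda cand: cand.quality)
    return best, nfe


best, nfe = prune_search(num_seeds=64, total_steps=50,
                         checkpoints=[10, 20, 30], keep_frac=0.5)
print(f"best seed={best.seed}, quality={best.quality:.3f}, NFE={nfe} (vs {64 * 50})")
```

In this example, the search uses 64 seeds and 50 steps. It prunes at steps 10, 20 and 30 and keeps half the pool at each checkpoint. The loop spends 1,280 denoising steps instead of the 3,200 needed to fully denoise every seed. The trade-off is that a noisy proxy can occasionally prune the candidate that would have scored best. Reducing that risk is the purpose of a better-aligned noise-aware reward.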
Abstract
A prominent approach to test-time scaling for text-to-image diffusion models formulates the problem as a search over multiple noise seeds, selecting the one that maximizes a given image reward function. The effectiveness of this strategy depends heavily on the number and diversity of noise seeds explored. However, verifying each candidate is computationally expensive, because each must be fully denoised before a reward can be computed. This severely limits the number of samples that can be explored under a fixed budget. We propose test-time scaling with noise-aware pruning (TTSnap), a framework that prunes low-quality candidates without fully denoising them. The key challenge is that reward models are learned in the clean image domain, and the rankings of rewards predicted for intermediate estimates are often inconsistent with those predicted for clean images. To overcome this, we train noise-aware reward models via self-distillation to align the reward for intermediate estimates with that of the final clean images. To stabilize learning across different noise levels, we adopt a curriculum training strategy that progressively shifts the data domain from clean images to noisy images. In addition, we introduce a new metric that measures both reward alignment and computational budget utilization. Experiments demonstrate that our approach improves performance by over 16% compared with existing methods, enabling more efficient and effective test-time scaling. It also provides orthogonal gains when combined with post-training techniques and local test-time optimization. Code: https://github.com/TerrysLearning/TTSnap/.
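The self-distillation and curriculum ideas can be sketched as follows. This is a hedged illustration under stated assumptions, not the TTSnap training code.

- Images are scalars for brevity.
- `student` and `teacher` are hypothetical callables, with the teacher taken to be a frozen copy of the clean-image reward model.
- The linear noise-bound schedule controlled by `warmup_frac` is an assumed curriculum.
- A plain mean-squared-error objective is assumed in place of whatever loss the paper actually uses.

```python
# Hypothetical sketch of curriculum noise sampling and a self-distillation target.
# Images are scalars here; student/teacher are stand-ins for reward models.
import random
from typing import Callable, Sequence


def curriculum_max_noise(step: int, total_steps: int,
                         warmup_frac: float = 0.8, max_noise: float = 1.0) -> float:
    """Upper bound on the noise level used at a training step (assumed schedule).

    Starts at 0 (clean images only) and grows linearly to max_noise over the
    first warmup_frac of training, then stays there.
    """
    progress = min(1.0, step / (warmup_frac * total_steps))
    return max_noise * progress


def sample_noise_levels(batch_size: int, step: int, total_steps: int,
                        rng: random.Random) -> list[float]:
    """Draw per-sample noise levels uniformly below the current curriculum bound."""
    hi = curriculum_max_noise(step, total_steps)
    return [rng.uniform(0.0, hi) for _ in range(batch_size)]


def self_distillation_loss(student: Callable[[float, float], float],
                           teacher: Callable[[float], float],
                           clean: Sequence[float], noisy: Sequence[float],
                           noise_levels: Sequence[float]) -> float:
    """MSE between student rewards on noisy intermediate estimates and
    frozen-teacher rewards on the matching final clean images (the targets)."""
    targets = [teacher(x0) for x0 in clean]
    preds = [student(xt, t) for xt, t in zip(noisy, noise_levels)]
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


# Usage: at 25% of the way through warmup, noise levels are capped at 0.25.
rng = random.Random(0)
levels = sample_noise_levels(4, step=200, total_steps=1000, rng=rng)
print(levels)
```

Early in training every sampled noise level is near zero. The student therefore first learns to reproduce the teacher on nearly clean inputs before it sees heavily noised intermediate estimates. This gradual shift is the stabilizing effect the curriculum is meant to provide.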
