Projected Gradient Ascent for Efficient Reward-Guided Updates with One-Step Generative Models
Jisung Hwang, Minhyuk Sung
TL;DR
The paper tackles the practicality of reward-guided generation with one-step generative models by addressing reward hacking and inefficiency in test-time latent optimization. It replaces soft regularization with hard white Gaussian noise constraints enforced via a closed-form projection onto a carefully designed feasible set, leveraging a bijective mapping to a compact spectral domain to enable an $O(N \log N)$ projection per iteration. Empirical results on one-step text-to-image models show higher target rewards and preserved human-aligned quality across multiple reward models, with substantially reduced wall-clock time compared to regularization-based baselines. The method also clarifies connections to prior regularization approaches, arguing that the hard-constraint formulation yields tighter Gaussian statistics and more reliable optimization, thereby making reward-guided generation more practical at deployment. Overall, the work provides a principled, efficient framework for robust test-time optimization in high-dimensional latent spaces with potential for broad applicability, while acknowledging safety considerations in reward design and deployment.
Abstract
We propose a constrained latent optimization method for reward-guided generation that preserves white Gaussian noise characteristics with negligible overhead. Test-time latent optimization can unlock substantially better reward-guided generations from pretrained generative models, but it is prone to reward hacking that degrades quality, and it is too slow for practical use. In this work, we make test-time optimization both efficient and reliable by replacing soft regularization with hard white Gaussian noise constraints enforced via projected gradient ascent. Our method applies a closed-form projection after each update to keep the latent vector explicitly noise-like throughout optimization, preventing the drift that leads to unrealistic artifacts. This enforcement adds minimal cost: the projection has $O(N \log N)$ complexity, matching standard algorithms such as sorting or the FFT, and does not practically increase wall-clock time. In experiments, our approach reaches a comparable Aesthetic Score using only 30% of the wall-clock time required by the state-of-the-art regularization-based method, while preventing reward hacking.
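The update-then-project loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's projection: the feasible set here is a deliberately simplified stand-in (zero sample mean and unit sample variance, which has an exact closed-form Euclidean projection in $O(N)$), and the names `project_to_noise_like`, `projected_gradient_ascent`, and the toy linear reward are hypothetical choices made for illustration only. The paper's actual feasible set and its spectral-domain mapping are not reproduced.

```python
import numpy as np


def project_to_noise_like(z):
    """Stand-in projection (an assumption, not the paper's feasible set).

    Projects z onto {x : mean(x) = 0, mean(x**2) = 1}: subtracting the mean
    projects onto the zero-sum hyperplane, and rescaling projects onto the
    sphere of radius sqrt(N) inside that hyperplane.
    """
    centered = z - z.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        raise ValueError("projection is undefined for a constant vector")
    return centered * (np.sqrt(z.size) / norm)


def projected_gradient_ascent(reward_grad, z0, step_size=0.1, num_steps=50):
    """Gradient ascent on a reward, with a projection after every update."""
    z = project_to_noise_like(z0)
    for _ in range(num_steps):
        z = z + step_size * reward_grad(z)  # ascent step on the reward
        z = project_to_noise_like(z)        # hard constraint, enforced each step
    return z


# Toy linear reward r(z) = w . z (hypothetical). Unconstrained ascent on this
# reward would grow the latent norm without bound, a crude analogue of drift.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024)


def reward(z):
    return float(w @ z)


def reward_grad(z):
    return w


z_init = project_to_noise_like(rng.standard_normal(1024))
z_final = projected_gradient_ascent(reward_grad, z_init)
```

In this toy setting the reward increases while the latent keeps zero sample mean and unit sample variance at every iterate. With a real one-step generator, `reward_grad` would instead be obtained by backpropagating a reward model's score through the generator to the latent.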
