RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling
Itay Chachy, Guy Yariv, Sagie Benaim
TL;DR
RewardSDS introduces a reward-weighted loss for Score Distillation Sampling (SDS) in which noise samples are weighted by alignment scores from pretrained reward models. RewardVSD extends this idea to the particle-based Variational Score Distillation (VSD) framework. Across zero-shot text-to-image, text-to-3D, and image editing tasks, both methods improve alignment and generation quality over SDS and VSD baselines. These gains are measured with reward models such as CLIPScore, ImageReward, and Aesthetic Score, alongside an LLM Grader that provides a human-aligned assessment. The approach is plug-and-play and compatible with existing SDS extensions. It gives finer control over user intent in diffusion-based generation, especially in data-scarce modalities such as 3D. The results demonstrate scalable gains and characterize the runtime/quality tradeoff of reward-weighted sampling.
Abstract
Score Distillation Sampling (SDS) has emerged as an effective technique for leveraging 2D diffusion priors in tasks such as text-to-3D generation. While powerful, SDS struggles to achieve fine-grained alignment with user intent. To overcome this, we introduce RewardSDS, a novel approach that weights noise samples by their alignment scores from a reward model, producing a weighted SDS loss. This loss prioritizes gradients from noise samples that yield well-aligned, high-reward outputs. Our approach is broadly applicable and can be used to extend other SDS-based methods. In particular, we demonstrate its applicability to Variational Score Distillation (VSD) by introducing RewardVSD. We evaluate RewardSDS and RewardVSD on text-to-image, 2D editing, and text-to-3D generation tasks. Both show significant improvements over SDS and VSD on a diverse set of metrics measuring generation quality and alignment with the target reward models, achieving state-of-the-art performance. The project page is available at https://itaychachy.github.io/reward-sds/.
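To make the weighting idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes N noise samples drawn at a single timestep, a per-sample noise prediction from the diffusion model, and one scalar reward per sample (for example, a reward model scoring each sample's denoised estimate). The softmax weighting with a `temperature` parameter is an assumption chosen for illustration, and the paper's actual weighting scheme may differ. The function names `reward_weights` and `weighted_sds_grad` are hypothetical.

```python
import numpy as np


def reward_weights(rewards, temperature=1.0):
    """Hypothetical softmax weighting of per-noise-sample rewards.

    Assumption: higher reward -> larger weight; weights sum to 1.
    """
    r = np.asarray(rewards, dtype=float) / temperature
    r = r - r.max()  # subtract the max for numerical stability
    w = np.exp(r)
    return w / w.sum()


def weighted_sds_grad(eps_pred, eps, rewards, timestep_weight=1.0, temperature=1.0):
    """Reward-weighted combination of per-sample SDS residuals.

    eps_pred, eps: arrays of shape (N, ...) holding the predicted and the
    injected noise for N noise samples at one timestep.
    Returns the weighted gradient with respect to the rendered image x; the
    caller is assumed to back-propagate it through dx/dtheta.
    """
    eps_pred = np.asarray(eps_pred, dtype=float)
    eps = np.asarray(eps, dtype=float)
    w = reward_weights(rewards, temperature)
    # Standard SDS direction per sample: w(t) * (eps_pred - eps).
    residual = timestep_weight * (eps_pred - eps)
    # Sum over the N samples, each scaled by its reward weight.
    return np.tensordot(w, residual, axes=1)
```

With equal rewards the weights are uniform, so the result reduces to the usual Monte Carlo average of N SDS gradients. Lowering `temperature` concentrates the update on the highest-reward noise samples.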
