RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling

Itay Chachy, Guy Yariv, Sagie Benaim

TL;DR

RewardSDS introduces a reward-weighted SDS loss that weights noise samples by their alignment scores under pretrained reward models, and RewardVSD extends this idea to the particle-based Variational Score Distillation (VSD) framework. The approach improves alignment and generation quality on zero-shot text-to-image generation, text-to-3D generation, and image editing. It is validated against SDS and VSD baselines using reward models such as CLIPScore, ImageReward, and Aesthetic Score, with an LLM grader providing a human-aligned assessment. It is plug-and-play and compatible with existing SDS extensions, giving finer control over user intent in diffusion-based generation, especially in data-scarce modalities such as 3D. The results demonstrate scalable gains and characterize the time/quality trade-offs of reward-weighted sampling.

Abstract

Score Distillation Sampling (SDS) has emerged as an effective technique for leveraging 2D diffusion priors in tasks such as text-to-3D generation. While powerful, SDS struggles to achieve fine-grained alignment with user intent. To overcome this, we introduce RewardSDS, a novel approach that weights noise samples based on alignment scores from a reward model, producing a weighted SDS loss. This loss prioritizes gradients from noise samples that yield aligned, high-reward outputs. Our approach is broadly applicable and can be used to extend SDS-based methods. In particular, we demonstrate its applicability to Variational Score Distillation (VSD) by introducing RewardVSD. We evaluate RewardSDS and RewardVSD on text-to-image generation, 2D editing, and text-to-3D generation, showing significant improvements over SDS and VSD on a diverse set of metrics that measure generation quality and alignment with the desired reward models, and achieving state-of-the-art performance. The project page is available at https://itaychachy.github.io/reward-sds/.
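To make the idea of a "weighted SDS loss" concrete, the block below writes the standard SDS gradient next to one plausible reward-weighted form. This is a sketch under stated assumptions, not the paper's exact formulation: $x = g(\theta)$ is the rendered image, $\hat\epsilon_\phi$ the diffusion model's noise prediction, $\omega(t)$ the usual timestep weighting, and $R$ a reward model. The one-step estimate $\hat{x}_0^{(i)}$ used for scoring and the temperature-softmax map from rewards $r_i$ to weights $\lambda_i$ (with a hypothetical temperature $\tau$) are assumptions; the paper's scoring and score-to-weight mapping may differ.

```latex
% Standard SDS gradient (x = g(\theta), x_t = \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon)
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[\omega(t)\,\big(\hat\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta}\right]

% Reward-weighted sketch with N noise samples \epsilon_1,\dots,\epsilon_N at one timestep t
x_t^{(i)} = \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon_i, \qquad
\hat{x}_0^{(i)} = \frac{x_t^{(i)} - \sqrt{1-\bar\alpha_t}\,\hat\epsilon_\phi(x_t^{(i)}; y, t)}{\sqrt{\bar\alpha_t}}

r_i = R\big(\hat{x}_0^{(i)}, y\big), \qquad
\lambda_i = \frac{\exp(r_i/\tau)}{\sum_{j=1}^{N}\exp(r_j/\tau)} \quad \text{(assumed mapping)}

\nabla_\theta \mathcal{L}_{\mathrm{RewardSDS}}
  \approx \omega(t)\sum_{i=1}^{N}\lambda_i\,\big(\hat\epsilon_\phi(x_t^{(i)}; y, t) - \epsilon_i\big)\,\frac{\partial x}{\partial \theta}
```

With uniform weights $\lambda_i = 1/N$ this reduces to a Monte Carlo estimate of the ordinary SDS gradient, so the reward only changes how much each noise sample contributes. For RewardVSD, the same weights would presumably multiply VSD's residual $\epsilon_{\mathrm{pretrain}}(x_t^{(i)}; y, t) - \epsilon_{\phi}(x_t^{(i)}; y, t, c)$ instead of $\hat\epsilon_\phi - \epsilon_i$; this, too, is an assumption.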

Paper Structure

This paper contains 16 sections, 9 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: RewardSDS is a plug-and-play score distillation approach that enables reward-aligned generation. It can be applied to various tasks and can extend a diverse set of distillation approaches, boosting their performance and alignment. Here, we demonstrate it by replacing the standard SDS in the state-of-the-art MVDream shi2024mvdreammultiviewdiffusion3d approach with RewardSDS for text-to-3D generation.
  • Figure 2: RewardSDS illustration. An image is first rendered from a given view, and $N$ random noise samples are added to it (at a given timestep). Each noisy image is then scored by denoising it and applying a reward model to the output. These scores are mapped to corresponding weights, which determine the contribution of each noisy sample to score distillation.
  • Figure 3: Qualitative comparison of generated outputs using different reward models for RewardSDS and the SDS baseline.
  • Figure 4: Qualitative comparison of zero-shot text-to-image generation using SDS, RewardSDS (ours), VSD, and RewardVSD (ours).
  • Figure 5: Qualitative comparison of text-to-3D generation based on (a) NeRF and (b) 3DGS, compared with MVDream.
  • ...and 8 more figures
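The single-step pipeline in the Figure 2 caption (add $N$ noises, denoise, score with a reward model, map scores to weights, weight each sample's contribution) can be sketched in NumPy as below. This is an illustrative sketch, not the authors' code. The function names `reward_weighted_sds_grad` and `softmax_weights`, the one-step $\hat{x}_0$ estimate, and the temperature-softmax weighting are all hypothetical choices. `denoiser` and `reward_fn` stand in for a real diffusion model's noise predictor and a reward model such as ImageReward. The returned gradient is taken with respect to the rendered image, so backpropagating through the renderer is left to the caller.

```python
import numpy as np


def softmax_weights(scores, temperature=1.0):
    """Map reward scores to weights summing to 1 (assumed mapping, not from the paper)."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reward_weighted_sds_grad(x, t, denoiser, reward_fn, alphas_cumprod,
                             n_samples=4, temperature=1.0, rng=None):
    """Hypothetical reward-weighted SDS gradient w.r.t. the rendered image x.

    denoiser(x_t, t) -> predicted noise with the same shape as x_t
    reward_fn(image) -> scalar reward (higher is better)
    """
    rng = np.random.default_rng() if rng is None else rng
    a_bar = alphas_cumprod[t]

    # Draw N independent noises and diffuse the same rendering with each of them.
    eps = rng.standard_normal((n_samples,) + x.shape)
    x_t = np.sqrt(a_bar) * x[None] + np.sqrt(1.0 - a_bar) * eps

    # Predict the noise for every noisy sample.
    eps_hat = np.stack([denoiser(x_t[i], t) for i in range(n_samples)])

    # One-step estimate of the clean image, used only for scoring (assumption).
    x0_hat = (x_t - np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(a_bar)

    # Score each denoised estimate and turn the scores into weights.
    scores = np.array([reward_fn(x0_hat[i]) for i in range(n_samples)], dtype=float)
    weights = softmax_weights(scores, temperature)

    # Weighted sum of per-sample SDS residuals (timestep weighting omega(t) = 1).
    residual = eps_hat - eps
    grad = np.tensordot(weights, residual, axes=1)
    return grad, weights, scores


if __name__ == "__main__":
    # Toy setup: a zero noise predictor and a reward preferring images near a target.
    target = np.ones((2, 3))
    alphas = np.linspace(0.99, 0.01, 1000)
    g, w, s = reward_weighted_sds_grad(
        np.zeros((2, 3)), 500,
        denoiser=lambda xt, t: np.zeros_like(xt),
        reward_fn=lambda img: -np.mean((img - target) ** 2),
        alphas_cumprod=alphas, n_samples=8, rng=np.random.default_rng(0),
    )
    print("weights:", np.round(w, 3))
    print("gradient shape:", g.shape)
```

Setting a very large `temperature` pushes the weights toward uniform, which recovers an ordinary multi-sample SDS estimate. A small temperature concentrates the update on the highest-reward noise samples. The extra cost relative to plain SDS is roughly $N$ noise predictions and $N$ reward evaluations per step, which is one source of the time/quality trade-off mentioned above.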