$\Psi$-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung

TL;DR

$\Psi$-Sampler addresses the inefficiency of inference-time reward alignment in score-based generative models by initializing particles from the reward-aware posterior $\tilde{p}_1^*(\boldsymbol{x}_1)$ rather than from the Gaussian prior $p_1$. It draws samples from this posterior with the preconditioned Crank–Nicolson Langevin (pCNL) sampler, then feeds them into Sequential Monte Carlo (SMC), which uses the approximately optimal transition kernel $\tilde{p}_{\theta}^*$ to target $p_0^*$. The kernel is derived with Tweedie’s formula within a stochastic optimal control framework. In experiments on layout-to-image, quantity-aware, and aesthetic-preference generation, the method consistently improves both seen and held-out rewards over Gaussian-prior baselines and other posterior initializations. The results indicate that reward-informed initialization yields better exploration and sample quality under a fixed compute budget in high-dimensional score-based models.
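The SMC stage mentioned above repeatedly reweights and resamples particles. The following is a minimal sketch of a generic effective-sample-size (ESS) triggered multinomial resampling step, not the paper's exact procedure: the function name `resample_if_needed`, the `ess_threshold` parameter, and the choice of multinomial resampling are assumptions made for illustration, and the paper's actual weights depend on its transition kernel.

```python
import numpy as np

def resample_if_needed(particles, log_weights, rng, ess_threshold=0.5):
    """Generic SMC resampling step (illustrative, not the paper's exact rule).

    particles:   array of shape (n, d)
    log_weights: unnormalized log importance weights, shape (n,)
    Returns (particles, normalized log weights, ESS before resampling).
    """
    n = particles.shape[0]
    # Normalize in log space for numerical stability.
    shifted = log_weights - np.max(log_weights)
    weights = np.exp(shifted)
    weights /= weights.sum()
    # Effective sample size: n for uniform weights, 1 if one particle dominates.
    ess = 1.0 / np.sum(weights ** 2)
    if ess < ess_threshold * n:
        # Multinomial resampling: draw indices in proportion to the weights,
        # then reset the weights to uniform.
        idx = rng.choice(n, size=n, p=weights)
        return particles[idx], np.full(n, -np.log(n)), ess
    normalized_log_w = shifted - np.log(np.sum(np.exp(shifted)))
    return particles, normalized_log_w, ess

# Hypothetical usage: four 2-D particles, one of which carries all the weight.
rng = np.random.default_rng(0)
particles = np.arange(8.0).reshape(4, 2)
log_w = np.array([0.0, -1000.0, -1000.0, -1000.0])
new_particles, new_log_w, ess = resample_if_needed(particles, log_w, rng)
```

Resampling only when the ESS drops below a threshold avoids discarding particle diversity when the weights are still close to uniform.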

Abstract

We introduce $\Psi$-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of Sequential Monte Carlo (SMC) to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the preconditioned Crank-Nicolson Langevin (pCNL) algorithm, which combines dimension-robust proposals with gradient-informed dynamics. This approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation, as demonstrated in our experiments. Project Webpage: https://psi-sampler.github.io/
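To make the idea of a pCNL step concrete, here is a minimal sketch on a toy problem. It assumes a standard Gaussian prior $N(0, I)$ and a target proportional to $\exp(-\Phi(x))\,N(x; 0, I)$, and it uses one common finite-dimensional form of the semi-implicit (Crank-Nicolson) Langevin proposal with an explicit Metropolis-Hastings correction. The paper's exact parameterization may differ. The names `pcnl_sample`, `delta`, and `temperature`, as well as the quadratic toy reward applied directly to `x`, are assumptions for illustration. In the paper the reward is evaluated on a denoised estimate, not on the noisy latent itself.

```python
import numpy as np

def pcnl_sample(phi, grad_phi, x0, delta, n_steps, rng):
    """Vectorized pCNL chains targeting pi(x) ∝ exp(-phi(x)) * N(x; 0, I).

    x0 has shape (n_chains, d); returns final states and mean acceptance rate.
    """
    a = (2.0 - delta) / (2.0 + delta)      # contraction of the current state
    b = 2.0 * delta / (2.0 + delta)        # step along the gradient of phi
    s2 = 8.0 * delta / (2.0 + delta) ** 2  # noise variance; a**2 + s2 == 1

    def log_target(x):
        return -phi(x) - 0.5 * np.sum(x ** 2, axis=-1)

    def proposal_mean(x):
        return a * x - b * grad_phi(x)

    def log_q(x_to, x_from):
        # Log density (up to a constant) of proposing x_to from x_from.
        diff = x_to - proposal_mean(x_from)
        return -0.5 * np.sum(diff ** 2, axis=-1) / s2

    x = x0.copy()
    acc_total = 0.0
    for _ in range(n_steps):
        prop = proposal_mean(x) + np.sqrt(s2) * rng.standard_normal(x.shape)
        # Metropolis-Hastings log acceptance ratio.
        log_alpha = (log_target(prop) + log_q(x, prop)) \
                    - (log_target(x) + log_q(prop, x))
        accept = np.log(rng.uniform(size=x.shape[0])) < log_alpha
        x = np.where(accept[:, None], prop, x)
        acc_total += accept.mean()
    return x, acc_total / n_steps

# Toy reward r(x) = -||x - mu||^2 / 2, so phi(x) = -r(x) / temperature.
temperature = 1.0                      # hypothetical reward temperature
mu = np.array([2.0, -1.0])
phi = lambda x: 0.5 * np.sum((x - mu) ** 2, axis=-1) / temperature
grad_phi = lambda x: (x - mu) / temperature

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2000, 2))    # start every chain from the prior
samples, acc_rate = pcnl_sample(phi, grad_phi, x0, delta=0.5, n_steps=200, rng=rng)
# For this toy target the exact posterior is Gaussian with
# mean mu / (1 + temperature) and variance temperature / (1 + temperature).
```

With the gradient term removed, the proposal leaves $N(0, I)$ invariant for any `delta`, which is the property that makes this family of proposals robust to dimension. The gradient term then steers the chains toward high-reward regions.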

Paper Structure

This paper contains 43 sections, 43 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Toy sampling-method comparison. Each panel visualizes both the initial samples (blue) and their corresponding clean data samples (red). From left to right: (A) samples from the original score-based generative model; (B) the target distribution at $t=0$ (defined in the paper's problem-definition section); (C) results from SMC; (D) results from MALA+SMC; and (E) results from our proposed $\Psi$-Sampler.
  • Figure 2: Qualitative results for each application demonstrate that $\Psi$-Sampler consistently generates images aligned with the given conditions. Detailed analysis of each case is provided in the qualitative-results part of the experiments section.
  • Figure 3: Performance comparison of MALA and pCNL across different evaluation metrics with varying step sizes. Conducted on layout-to-image generation application.
  • Figure 4: Performance comparison of MALA and pCNL across different evaluation metrics under two generation settings: quantity-aware generation (top row) and aesthetic-preference generation (bottom row). Each graph illustrates the performance trend with varying step sizes.
  • Figure 5: Qualitative results for each application on SANA-Sprint (Chen et al., 2025).
  • ...and 4 more figures

Theorems & Definitions (3)

  • Remark 1
  • proof
  • Remark 2