Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage

Peiyu Yu; Suraj Kothawade; Sirui Xie; Ying Nian Wu; Hongliang Fei

TL;DR

This work introduces instance-level rescheduling for frozen text-to-image samplers by learning a single-shot, prompt- and seed-conditioned scheduling policy implemented as a Dirichlet distribution. To stabilize high-variance policy gradients in this high-dimensional setting, it derives a principled James-Stein shrinkage baseline that adaptively combines per-context and cross-context information, yielding lower estimator variance and better learning efficiency. Empirically, learned schedules improve text–image alignment, text rendering, and fine-grained control across multiple backbones and budgets, including competitive performance at only 5 steps without distillation. The approach offers a model-agnostic post-training lever that unlocks additional generative potential without altering the pretrained backbone, with demonstrated impact on few-step generation and downstream alignment tasks.
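To make the idea of a Dirichlet scheduling policy concrete, here is a minimal sketch of how one sample from a Dirichlet distribution could be turned into a sampling schedule. The summary above does not spell out the mapping, so several things here are assumptions: that the Dirichlet sample is read as per-step time increments, that time runs from 1 (noise) to 0 (data), and the function name `schedule_from_dirichlet`. In the described method the concentration parameters would come from a policy conditioned on the prompt and seed; a fixed vector stands in for that here.

```python
import numpy as np

def schedule_from_dirichlet(alpha, rng):
    """Sample one schedule from a Dirichlet distribution (illustrative only).

    alpha: concentration parameters, one per sampling step. In the described
           method these would be predicted per prompt and seed; here they
           are simply passed in (assumption).
    Returns the step sizes and the resulting timesteps from 1.0 down to 0.0.
    """
    alpha = np.asarray(alpha, dtype=float)
    # A Dirichlet sample is a vector of positive numbers that sums to 1,
    # so it can be read as a split of the unit time interval into steps.
    increments = rng.dirichlet(alpha)
    # Accumulate the step sizes and count down from t = 1 to t = 0.
    timesteps = 1.0 - np.concatenate(([0.0], np.cumsum(increments)))
    timesteps[-1] = 0.0  # remove floating-point drift at the endpoint
    return increments, timesteps

rng = np.random.default_rng(0)
increments, timesteps = schedule_from_dirichlet(np.ones(5), rng)
```

With five concentration parameters this yields a 5-step schedule with six timesteps. Larger values of `alpha` concentrate the samples around near-uniform step sizes, while smaller values allow more uneven ones.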

Abstract

Most post-training methods for text-to-image samplers focus on model weights: either fine-tuning the backbone for alignment or distilling it for few-step efficiency. We take a different route: rescheduling the sampling timeline of a frozen sampler. Instead of a fixed, global schedule, we learn instance-level (prompt- and noise-conditioned) schedules through a single-pass Dirichlet policy. To ensure accurate gradient estimates in high-dimensional policy learning, we introduce a novel reward baseline based on a principled James-Stein estimator; it provably achieves lower estimation error than commonly used variants and leads to superior performance. Our rescheduled samplers consistently improve text-image alignment, including text rendering and compositional control, across modern Stable Diffusion and Flux model families. Additionally, a 5-step Flux-Dev sampler with our schedules can attain generation quality comparable to deliberately distilled samplers like Flux-Schnell. We thus position our scheduling framework as an emerging model-agnostic post-training lever that unlocks additional generative potential in pretrained samplers.
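The abstract does not give the form of the James-Stein baseline, so the sketch below shows only the textbook positive-part James-Stein shrinkage of per-context mean rewards toward the grand mean. It is not the paper's estimator. The function name `james_stein_baseline`, the `(contexts, samples)` reward layout, and the pooled variance estimate are all assumptions made for illustration.

```python
import numpy as np

def james_stein_baseline(rewards):
    """Textbook positive-part James-Stein shrinkage (illustrative only).

    rewards: array of shape (K, n), i.e. K contexts (for example prompt/seed
             pairs) with n sampled schedules each. This layout is an assumption.
    Returns one baseline per context, shape (K,).
    """
    rewards = np.asarray(rewards, dtype=float)
    K, n = rewards.shape
    if K < 4 or n < 2:
        raise ValueError("need at least 4 contexts and 2 samples per context")
    context_mean = rewards.mean(axis=1)   # per-context information
    grand_mean = context_mean.mean()      # cross-context information
    # Pooled within-context variance, scaled to the variance of a context mean.
    noise_var = rewards.var(axis=1, ddof=1).mean() / n
    spread = np.sum((context_mean - grand_mean) ** 2)
    if spread == 0.0:
        keep = 0.0  # all context means coincide: fall back to the grand mean
    else:
        # Fraction of each context's deviation from the grand mean to keep.
        keep = max(0.0, 1.0 - (K - 3) * noise_var / spread)
    return grand_mean + keep * (context_mean - grand_mean)

rng = np.random.default_rng(0)
true_means = np.linspace(0.0, 1.0, 8)[:, None]
rewards = rng.normal(loc=true_means, scale=1.0, size=(8, 4))
baseline = james_stein_baseline(rewards)
# Centered rewards that would weight the score-function (REINFORCE) gradient.
advantages = rewards - baseline[:, None]
```

When the per-context means are noisy relative to how much they differ from each other, `keep` moves toward 0 and the baseline leans on the cross-context mean. When the contexts differ clearly, `keep` moves toward 1 and each context relies mostly on its own mean.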
