Beyond Fine-Tuning: A Systematic Study of Sampling Techniques in Personalized Image Generation

Vera Soboleva, Maksim Nakhodnov, Aibek Alanov

TL;DR

This work systematically analyzes sampling techniques for personalized image generation beyond traditional fine-tuning. It introduces Mixed, Switching, Masked, and other sampling strategies that combine concept and superclass trajectories, and provides a framework for evaluating their trade-offs in concept fidelity, prompt alignment, and efficiency. Through extensive experiments on DreamBooth and multiple diffusion backbones, the study demonstrates that superclass-informed sampling can substantially improve context adherence while preserving concept identity, with varying computational costs. The findings offer practical guidance for decoupling sampling from fine-tuning and inform method selection for diverse generative tasks, while also acknowledging limitations and ethical considerations for real-world deployment.

Abstract

Personalized text-to-image generation aims to create images tailored to user-defined concepts and textual descriptions. Balancing fidelity to the learned concept against the ability to generate it in varied contexts is a significant challenge. Existing methods often address this through diverse fine-tuning parameterizations and improved sampling strategies that integrate superclass trajectories during the diffusion process. While improved sampling offers a cost-effective, training-free way to enhance fine-tuned models, systematic analyses of these methods remain limited. Current approaches typically tie sampling strategies to fixed fine-tuning configurations, making it difficult to isolate their impact on generation outcomes. To address this issue, we systematically analyze sampling strategies beyond fine-tuning, exploring the impact of concept and superclass trajectories on the results. Building on this analysis, we propose a decision framework that weighs text alignment, computational constraints, and fidelity objectives to guide strategy selection. The framework integrates with diverse architectures and training approaches, systematically optimizing concept preservation, prompt adherence, and resource efficiency. The source code can be found at https://github.com/ControlGenAI/PersonGenSampler.

Paper Structure

This paper contains 21 sections, 11 equations, 25 figures, 1 table.

Figures (25)

  • Figure 1: Visualization of Different Sampling Strategies. (a) Usual sampling with concept reproduces the concept but does not align closely with the text prompt. (b) Generation with superclass effectively captures the context obtained from the prompt but produces a random superclass representative (e.g., dog). (c-d) Mixed and Switching sampling strategies improve context preservation while maintaining the concept's identity.
  • Figure 2: Effects of Superclass Influence on Different Sampling Methods. For Mixed Sampling, the influence is adjusted by varying the superclass guidance scale $\omega_s$ with $\omega_c = 7.0 - \omega_s$. For Switching Sampling, we vary the switching step $t_{sw}$. For Masked Sampling, the mask is modified by altering the concept mask thresholding quantile $q$.
  • Figure 3: Pareto Frontier curves for Mixed, Switching and Multi-stage Sampling methods. Each Multi-stage sampling curve is generated by fixing the switching step while varying the superclass guidance scale $\omega_s = [1.0, 3.0, 5.0]$.
  • Figure 4: Pareto frontier curves for Masked sampling. Each Masked sampling curve is derived by varying the quantile $q = [0.3, 0.5, 0.7, 0.9]$, which controls the mask binarization threshold; $t_{sw} = 3, \omega_s = 3.5$ are fixed.
  • Figure 5: Mixed sampling Pareto frontiers for different fine-tuning methods.
  • ...and 20 more figures
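
The controls named in the Figure 2 and Figure 4 captions can be illustrated with a small sketch. This is not the authors' implementation. Only the relation $\omega_c = 7.0 - \omega_s$, the switching step $t_{sw}$, and the thresholding quantile $q$ come from the captions. The classifier-free-guidance form of the combination, the function names, the choice to use the superclass prompt before the switching step, and the mask polarity are all assumptions made for illustration.

```python
import numpy as np

# From the Figure 2 caption: omega_c = 7.0 - omega_s.
TOTAL_SCALE = 7.0


def mixed_guidance(eps_uncond, eps_concept, eps_super, omega_s, total=TOTAL_SCALE):
    """Hypothetical Mixed sampling step: blend concept and superclass guidance.

    Assumes a classifier-free-guidance style sum; the exact formula is an
    assumption, not taken from the source.
    """
    omega_c = total - omega_s
    return (
        eps_uncond
        + omega_c * (eps_concept - eps_uncond)
        + omega_s * (eps_super - eps_uncond)
    )


def switching_guidance(step, t_sw, eps_uncond, eps_concept, eps_super, scale=TOTAL_SCALE):
    """Hypothetical Switching sampling step.

    Assumes the superclass prompt guides the steps before t_sw and the
    concept prompt guides the remaining steps.
    """
    eps_cond = eps_super if step < t_sw else eps_concept
    return eps_uncond + scale * (eps_cond - eps_uncond)


def quantile_mask(score_map, q):
    """Binarize a score map at its q-th quantile (1.0 marks assumed concept pixels)."""
    threshold = np.quantile(score_map, q)
    return (score_map > threshold).astype(np.float32)
```

Under these assumptions, setting `omega_s = 0` in `mixed_guidance` reduces to ordinary guidance on the concept prompt at scale 7.0, and raising `q` in `quantile_mask` shrinks the region attributed to the concept.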