Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
Chaehun Shin, Jooyoung Choi, Johan Barthelemy, Jungbeom Lee, Sungroh Yoon
TL;DR
This paper tackles the challenge of preserving fine-grained subject details in zero-shot subject-driven text-to-image generation. It introduces Subject Fidelity Optimization (SFO), a comparison-based fine-tuning framework that uses synthetic negative targets generated via Condition-Degradation Negative Sampling (CDNS) and reweights training toward intermediate diffusion timesteps to sharpen subject fidelity while maintaining text alignment. The method is built on a Bradley-Terry-style objective that compares positives against negatives relative to a reference, and it includes a theoretical rationale linking this objective to mutual information through a flow-matching surrogate. Empirical results on DreamBench show that SFO with CDNS outperforms strong baselines in subject fidelity and achieves competitive text alignment. Ablations validate the contributions of CDNS, timestep reweighting, and the degradation strategies used to produce informative negative targets.
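To make the comparison objective more concrete, here is a minimal sketch of a Bradley-Terry-style pairwise loss in the spirit of preference-optimization objectives for diffusion models. It is an assumption about the general form, not the paper's exact loss: the function names, the `beta` temperature, and the use of scalar per-sample denoising errors are all hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(err_pos, err_neg, ref_err_pos, ref_err_neg, beta=1.0):
    """Bradley-Terry-style loss on denoising errors (hypothetical form).

    err_pos / err_neg: fine-tuned model's error on the positive / negative target.
    ref_err_pos / ref_err_neg: frozen reference model's error on the same targets.
    """
    # How much the model improves on the reference for each target
    # (lower error is better, so gain = reference error - model error).
    pos_gain = ref_err_pos - err_pos
    neg_gain = ref_err_neg - err_neg
    # The model is rewarded for improving more on the positive than on the negative.
    margin = beta * (pos_gain - neg_gain)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


# With no change relative to the reference, the margin is zero.
baseline = pairwise_preference_loss(1.0, 1.0, 1.0, 1.0)
# Lowering the error on the positive target reduces the loss.
better = pairwise_preference_loss(0.5, 1.0, 1.0, 1.0)
# Lowering the error on the negative target increases the loss.
worse = pairwise_preference_loss(1.0, 0.5, 1.0, 1.0)
```

In this sketch the loss depends only on the difference between the two gains, so fitting the positive target better is not enough if the model fits the synthetic negative equally well.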
Abstract
We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods, which rely only on positive targets and use the diffusion loss as in the pre-training stage, often fail to capture fine-grained subject details. To address this, SFO introduces additional synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation by introducing controlled degradations that emphasize subject fidelity and text alignment without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms recent strong baselines in terms of both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
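The timestep reweighting can be illustrated with a sampler that draws training timesteps more often from the middle of the trajectory. The sketch below uses logit-normal sampling, a common choice in flow-matching training. Whether SFO uses this particular distribution is an assumption, and `sample_mid_timesteps`, `mean`, and `std` are hypothetical names.

```python
import math
import random


def sample_mid_timesteps(n, mean=0.0, std=1.0, seed=None):
    """Draw n timesteps in (0, 1) concentrated around the middle.

    Each timestep is sigmoid(z) with z ~ Normal(mean, std), i.e. a
    logit-normal sample. This is an illustrative stand-in for the
    paper's reweighting, not its confirmed scheme.
    """
    rng = random.Random(seed)
    return [1.0 / (1.0 + math.exp(-rng.gauss(mean, std))) for _ in range(n)]


ts = sample_mid_timesteps(10_000, seed=0)
# Fraction of sampled timesteps that fall in the middle half of the range.
mid_fraction = sum(0.25 <= t <= 0.75 for t in ts) / len(ts)
```

A uniform sampler would place about half of its draws in [0.25, 0.75]. With the defaults above, the logit-normal sampler places roughly 70% there, so more gradient updates land on the intermediate steps.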
