CFG-EC: Error Correction Classifier-Free Guidance
Nakkyu Yang, Yechan Lee, SooJean Han
TL;DR
CFG-EC tackles the training-sampling mismatch in classifier-free guidance by correcting the unconditional noise prediction. It uses Gram-Schmidt orthogonalization to make the unconditional error orthogonal to the conditional error, which eliminates the inner-product term that degrades sampling quality and tightens the error bound. Experiments with SDXL and SD1.5 on MSCOCO show improved FID and CLIP scores, especially at low guidance scales, with a dynamic variant offering the best balance between fidelity and prompt alignment. The approach is versatile and can augment other CFG-based methods, toward higher-fidelity, better text-aligned image generation.
Abstract
Classifier-Free Guidance (CFG) has become a mainstream approach for simultaneously improving prompt fidelity and generation quality in conditional generative models. During training, CFG stochastically alternates between conditional and null prompts to enable both conditional and unconditional generation. During sampling, however, CFG evaluates the null and conditional prompts simultaneously and combines their noise predictions, which leads to inconsistent noise estimates between the training and sampling processes. To reduce this error, we propose CFG-EC, a versatile correction scheme that can augment any CFG-based method by refining the unconditional noise predictions. CFG-EC actively realigns the unconditional noise error component to be orthogonal to the conditional error component. This correction prevents interference between the two guidance components, thereby constraining the upper bound of the sampling error and establishing more reliable guidance trajectories for high-fidelity image generation. Our numerical experiments show that CFG-EC handles the unconditional component more effectively than CFG and CFG++, delivering a marked performance increase in the low-guidance sampling regime and consistently higher prompt alignment across the board.
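The orthogonalization idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard CFG combination, eps_u + w * (eps_c - eps_u), so that the guided error is (1 - w) * e_u + w * e_c, and it assumes both error tensors are available as arrays, whereas the text here does not say how they are estimated during sampling. The function names `orthogonalize` and `guided_error_sq_norm` are hypothetical.

```python
import numpy as np


def orthogonalize(err_uncond, err_cond, eps=1e-12):
    """One Gram-Schmidt step: remove from err_uncond its projection onto err_cond."""
    u = np.asarray(err_uncond, dtype=np.float64).ravel()
    c = np.asarray(err_cond, dtype=np.float64).ravel()
    # Projection coefficient <u, c> / <c, c>; eps guards against a zero vector.
    coeff = np.dot(u, c) / (np.dot(c, c) + eps)
    return (u - coeff * c).reshape(np.shape(err_uncond))


def guided_error_sq_norm(err_uncond, err_cond, w):
    """Squared norm of (1 - w) * e_u + w * e_c (assumed CFG error combination)."""
    e = (1.0 - w) * np.ravel(err_uncond) + w * np.ravel(err_cond)
    return float(np.dot(e, e))


# Toy data: an unconditional error deliberately correlated with the conditional one.
rng = np.random.default_rng(0)
e_c = rng.normal(size=(4, 8, 8))
e_u = 0.5 * e_c + rng.normal(size=(4, 8, 8))

e_u_perp = orthogonalize(e_u, e_c)
w = 1.5  # illustrative guidance scale, not a value taken from the paper

# After orthogonalization the cross term 2 * w * (1 - w) * <e_u, e_c> vanishes,
# so the squared error splits into two independent contributions.
cross_term = float(np.dot(e_u_perp.ravel(), e_c.ravel()))
split_norm = (1.0 - w) ** 2 * float(np.sum(e_u_perp ** 2)) + w ** 2 * float(np.sum(e_c ** 2))
```

In this sketch `cross_term` is zero up to floating-point error, and `guided_error_sq_norm(e_u_perp, e_c, w)` matches `split_norm`, which is the sense in which removing the inner-product term decouples the two error components.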
