Imbalance in Balance: Online Concept Balancing in Generation Models
Yukai Shi, Jiarong Ou, Rui Chen, Haotian Yang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Kun Gai
TL;DR
The paper tackles unstable concept composition in text-to-image generation by analyzing its causal factors and proposing an online, concept-wise balancing method. It defines the IMBA distance to quantify data distribution and the IMBA loss to dynamically reweight concept regions during training, without offline dataset pruning and with minimal code changes. A new benchmark, Inert-CompBench, targets inert concepts to stress-test compositional ability and is used alongside existing benchmarks. Empirical results show significant improvements in concept composition across multiple benchmarks, and ablation analyses support the method's efficiency, scalability, and compatibility with diffusion models. The work argues that data distribution, rather than sheer data scale or model size, is a primary determinant of composition quality at large scale, and it offers a practical, plug-and-play solution for robust concept synthesis in open-world generation tasks.
Abstract
In visual generation tasks, the responses to and combinations of complex concepts are often unstable and error-prone, an issue that remains under-explored. In this paper, we explore the causal factors behind poor concept responses through carefully designed experiments. We also design a concept-wise equalization loss function (IMBA loss) to address this issue. Our proposed method is online, eliminating the need for offline dataset processing, and requires minimal code changes. On our newly proposed complex-concept benchmark Inert-CompBench and two other public test sets, our method significantly enhances the concept response capability of baseline models and yields highly competitive results with only a few lines of code. Code is released at https://github.com/KwaiVGI/IMBA-Loss.
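To make the idea of online, concept-wise reweighting concrete, the following is a minimal sketch of a reweighted denoising loss. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `reweighted_loss`, the per-region squared errors, and the per-region imbalance scores are all hypothetical, and the way the scores are turned into weights (normalizing them so they average to 1) is a stand-in choice. The exact definitions of the IMBA distance and IMBA loss are not given in this section.

```python
def reweighted_loss(sq_errors, imbalance_scores, eps=1e-8):
    """Weighted mean of per-region squared errors (illustrative only).

    sq_errors: per-region squared denoising errors (hypothetical input).
    imbalance_scores: non-negative per-region scores; a larger score is
        assumed to mark a more under-represented concept region.
    """
    if len(sq_errors) != len(imbalance_scores):
        raise ValueError("inputs must have the same length")
    n = len(sq_errors)
    total = sum(imbalance_scores)
    if total <= eps:
        # No usable signal: fall back to a plain, unweighted mean.
        weights = [1.0] * n
    else:
        # Normalize so the weights average to 1, which keeps the loss
        # on a scale comparable to the unweighted mean squared error.
        weights = [s * n / total for s in imbalance_scores]
    return sum(w * e for w, e in zip(weights, sq_errors)) / n


# Equal scores reproduce the plain mean; unequal scores shift the
# emphasis toward the region with the larger score.
uniform = reweighted_loss([1.0, 3.0], [1.0, 1.0])
skewed = reweighted_loss([1.0, 3.0], [1.0, 3.0])
```

Because the weights are computed from quantities available at each training step, a scheme of this shape needs no offline pass over the dataset, which is the property the abstract emphasizes.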
