Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang
TL;DR
The paper tackles action customization in text-to-image generation by learning action-specific identifiers that disentangle actions from appearance. It introduces Action-Disentangled Identifier (ADI), which expands the semantic conditioning space with layer-wise identifier tokens and uses gradient masking across context-different and action-different pairs to block action-agnostic features from leaking into the learned action representations. A new benchmark, ActionBench, is proposed to evaluate action fidelity and subject consistency across diverse actions and unseen subjects, including animals. Quantitative and qualitative results show that ADI outperforms existing baselines in action accuracy while preserving subject appearance.
Abstract
This study focuses on a novel task in text-to-image (T2I) generation, namely action customization. The objective of this task is to learn the action shared across a limited set of exemplar images and generalize it to unseen humans or even animals. Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle to decouple actions from context features, including appearance. To overcome the preference for low-level features and the entanglement of high-level features, we propose an inversion-based method, Action-Disentangled Identifier (ADI), to learn action-specific identifiers from the exemplar images. ADI first expands the semantic conditioning space by introducing layer-wise identifier tokens, thereby increasing the representational richness while distributing the inversion across different features. Then, to block the inversion of action-agnostic features, ADI extracts the gradient invariance from the constructed sample triples and masks the updates of irrelevant channels. To comprehensively evaluate the task, we present ActionBench, a benchmark that includes a variety of actions, each accompanied by meticulously selected samples. Both quantitative and qualitative results show that ADI outperforms existing baselines in action-customized T2I generation. Our project page is at https://adi-t2i.github.io/ADI.
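The channel-masking idea can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the exact masking rule, so the sketch assumes that channels whose gradients differ least between two samples sharing the same action (but differing in context) are the action-relevant ones, and that a fixed fraction of them is kept. The function names, `keep_ratio`, and the toy gradients are all hypothetical, and only one pair of the sample triple is used.

```python
import numpy as np


def invariance_mask(grad_anchor, grad_same_action, keep_ratio=0.5):
    """Return a 0/1 mask over channels of an identifier-token gradient.

    Assumption (not taken from the paper): channels whose gradients change
    least between two same-action, different-context samples are treated as
    action-relevant and kept; all other channels are masked out.
    """
    diff = np.abs(grad_anchor - grad_same_action)
    k = max(1, int(round(keep_ratio * diff.size)))
    # Indices of the k channels with the smallest gradient difference.
    keep = np.argsort(diff, kind="stable")[:k]
    mask = np.zeros_like(grad_anchor, dtype=float)
    mask[keep] = 1.0
    return mask


def masked_update(token, grad_anchor, grad_same_action, lr=0.1, keep_ratio=0.5):
    """One gradient-descent step in which masked channels receive no update."""
    mask = invariance_mask(grad_anchor, grad_same_action, keep_ratio)
    return token - lr * mask * grad_anchor


# Toy 4-channel example with made-up gradients.
token = np.zeros(4)
g_anchor = np.array([1.0, -2.0, 0.5, 3.0])
g_pair = np.array([1.1, 2.0, 0.5, -1.0])  # same action, different context

mask = invariance_mask(g_anchor, g_pair, keep_ratio=0.5)
updated = masked_update(token, g_anchor, g_pair, lr=0.1, keep_ratio=0.5)
print(mask)
print(updated)
```

In this toy case channels 0 and 2 have nearly identical gradients across the pair, so only they are updated. An action-different pair with shared context could be handled analogously by masking the channels that stay invariant, again as an assumption rather than a statement of the paper's procedure.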
