Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning
Yang Liu, Xinshuo Wang, Jiale Du, Xinbo Gao, Jungong Han
TL;DR
The paper tackles Compositional Zero-Shot Learning (CZSL) by addressing complex attribute–object interactions and long-tail data with a Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network. It introduces Attribute-Driven Data Synthesis (ADDS) to diversify training attribute combinations and Subclass-Driven Discriminative Embedding (SDDE) to capture fine-grained subclass structure in embeddings, optimized via a combined objective $L_{total} = \alpha L_{base} + \beta L_{emd}$ and a joint feasibility score $C(a,o)$ in a shared space. The approach achieves state-of-the-art results on UT-Zappos, MIT-States, and C-GQA under both closed-world and open-world CZSL settings, with ablations confirming the contributions of ADDS and SDDE and their synergy. The work demonstrates improved generalization to unseen attribute–object compositions and robustness to data imbalance, supported by extensive experiments and qualitative retrieval analyses. Overall, HDA-OE offers a practical, scalable framework for reliable CZSL in diverse, real-world scenarios.
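The combined objective mentioned above is a weighted sum of a base loss and an embedding loss, $L_{total} = \alpha L_{base} + \beta L_{emd}$. As a minimal sketch, assuming both terms have already been reduced to scalars, it could be computed as follows. The function name and the default weights are illustrative placeholders; this section does not state the values the authors use.

```python
def total_loss(l_base: float, l_emd: float,
               alpha: float = 1.0, beta: float = 0.5) -> float:
    """Weighted sum L_total = alpha * L_base + beta * L_emd.

    The default weights are hypothetical placeholders, not values from the paper.
    """
    return alpha * l_base + beta * l_emd
```

Setting `beta` to zero recovers training with the base loss alone, which is a convenient way to check the contribution of the embedding term.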
Abstract
Compositional Zero-Shot Learning (CZSL) recognizes new attribute-object combinations by learning from known attribute-object pairs. The main challenge of this task lies in the complex interactions between attributes and object visual representations, which cause large variation in image appearance. In addition, the long-tail label distribution found in real-world data further complicates recognition. To address these problems, we propose a novel method, the Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network. To increase the variability of the training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module. ADDS generates new samples with diverse attribute labels by combining multiple attributes of the same object. Expanding the attribute space of the dataset in this way encourages the model to learn and distinguish subtle differences between attributes. To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module, which embeds subclass information in a fine-grained manner so that the encoding better separates subclasses and captures the complex dependencies between attributes and object visual features. The proposed model is evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
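To make the ADDS idea concrete, the following is a minimal sketch of one way to combine multiple attributes of the same object into a new training sample. It is not the authors' implementation: the abstract does not say how samples are combined, so the convex mixing of feature vectors, the set-valued attribute label, and the name `synthesize_samples` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def synthesize_samples(samples, num_new, seed=0):
    """Create synthetic samples by mixing two samples of the same object.

    samples: list of (feature_vector, attribute, object) tuples.
    Returns a list of (mixed_feature, attribute_set, object) tuples.
    Assumption: features are mixed by a random convex combination.
    """
    rng = random.Random(seed)

    # Group the samples by object so that only same-object pairs are mixed.
    by_object = defaultdict(list)
    for feat, attr, obj in samples:
        by_object[obj].append((feat, attr))

    # Only objects seen with at least two distinct attributes can be mixed.
    eligible = [obj for obj, items in by_object.items()
                if len({attr for _, attr in items}) >= 2]

    synthetic = []
    if not eligible:
        return synthetic

    for _ in range(num_new):
        obj = rng.choice(eligible)
        items = by_object[obj]
        feat_a, attr_a = rng.choice(items)
        # Pick a partner with a different attribute (guaranteed to exist).
        partners = [(f, a) for f, a in items if a != attr_a]
        feat_b, attr_b = rng.choice(partners)

        lam = rng.random()  # mixing weight in [0, 1)
        mixed = [lam * x + (1.0 - lam) * y for x, y in zip(feat_a, feat_b)]
        synthetic.append((mixed, frozenset({attr_a, attr_b}), obj))
    return synthetic
```

For example, given two "dog" samples labeled "wet" and "old", each synthetic sample keeps the object "dog" and carries the attribute set {"wet", "old"}. Objects observed with a single attribute are skipped because there is nothing to combine.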
