Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning
Ans Munir, Faisal Z. Qureshi, Muhammad Haris Khan, Mohsen Ali
TL;DR
The paper tackles open-world compositional zero-shot learning by predicting attribute and object primitives independently while explicitly modeling their interactions with self-attention. It projects image and textual representations into separate attribute and object spaces and scores compositions via a multiplicative cosine-based fusion, augmented by ConceptNet-based feasibility to prune infeasible pairs. Empirical results on MIT-States, UT-Zappos, and CGQA demonstrate competitive or state-of-the-art performance, with notable gains on CGQA and robust open-world generalization. The work highlights the value of primitive-level attention and knowledge-guided feasibility for scalable, contextualized visual reasoning in zero-shot settings.
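The scoring step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the rescaling of cosine similarities to [0, 1] before multiplying, and the precomputed boolean `feasible` mask (standing in for the ConceptNet-based feasibility step) are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def score_compositions(img_attr, img_obj, txt_attr, txt_obj, feasible):
    """Score every (attribute, object) pair for one image.

    img_attr: (d,)   image feature projected into the attribute space
    img_obj:  (d,)   image feature projected into the object space
    txt_attr: (A, d) text embeddings of the A attributes
    txt_obj:  (O, d) text embeddings of the O objects
    feasible: (A, O) boolean mask; False marks pairs pruned as infeasible
              (hypothetical stand-in for a ConceptNet-derived filter)
    """
    # Cosine similarity of the image to each primitive, computed independently.
    attr_sim = l2_normalize(txt_attr) @ l2_normalize(img_attr)  # (A,)
    obj_sim = l2_normalize(txt_obj) @ l2_normalize(img_obj)     # (O,)

    # Assumption: rescale cosines from [-1, 1] to [0, 1] so that the product
    # of two negative similarities cannot produce a high score.
    attr_score = (attr_sim + 1.0) / 2.0
    obj_score = (obj_sim + 1.0) / 2.0

    # Multiplicative fusion: one score per (attribute, object) pair.
    scores = np.outer(attr_score, obj_score)                    # (A, O)

    # Infeasible pairs can never be selected by argmax.
    return np.where(feasible, scores, -np.inf)


# Toy usage with two attributes and two objects in a 2-D embedding space.
txt_attr = np.eye(2)
txt_obj = np.eye(2)
feasible = np.ones((2, 2), dtype=bool)
scores = score_compositions(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                            txt_attr, txt_obj, feasible)
best_attr, best_obj = np.unravel_index(np.argmax(scores), scores.shape)
```

Keeping the attribute and object similarities separate until the final outer product means the score table for all A × O pairs costs only A + O similarity computations, which is one reason primitive-level prediction is attractive when the open-world label space is large.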
Abstract
Compositional Zero-Shot Learning (CZSL) aims to recognize compositions of attribute-object pairs that were not seen during training, which is a challenging task. In this study we address Open-World Compositional Zero-Shot Learning (OW-CZSL), where the test space encompasses all possible combinations of attributes and objects. Our approach applies a self-attention mechanism between attributes and objects, which helps the model identify the relationships between them and generalize better from seen to unseen compositions. At inference, predictions are generated from the similarity between the self-attended textual and visual features. Because attributes and objects are paired without restriction, the test space may contain implausible combinations. To mitigate this issue, we leverage external knowledge from ConceptNet to restrict the test space to realistic compositions. Our proposed model, Attention-based Simple Primitives (ASP), achieves performance competitive with the state of the art.
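To make the idea of self-attention between an attribute and an object concrete, the following sketch applies single-head scaled dot-product attention to a two-token sequence consisting of one attribute embedding and one object embedding. The abstract does not specify the architecture, so the single head, the projection matrices `w_q`, `w_k`, `w_v`, and the function name `self_attend_pair` are hypothetical choices for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attend_pair(attr_emb, obj_emb, w_q, w_k, w_v):
    """Contextualize an attribute and an object embedding with each other.

    attr_emb, obj_emb: (d,) primitive embeddings
    w_q, w_k, w_v:     (d, d_k) projection matrices (hypothetical parameters)
    Returns the attended attribute vector, the attended object vector,
    and the (2, 2) attention weight matrix.
    """
    tokens = np.stack([attr_emb, obj_emb])            # (2, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v

    # Scaled dot-product attention over the two tokens.
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (2, 2), rows sum to 1
    attended = weights @ v                             # (2, d_k)
    return attended[0], attended[1], weights


# Toy usage with random embeddings and random projections.
rng = np.random.default_rng(0)
d, d_k = 8, 4
attr_emb, obj_emb = rng.normal(size=d), rng.normal(size=d)
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
attr_ctx, obj_ctx, weights = self_attend_pair(attr_emb, obj_emb, w_q, w_k, w_v)
```

Each output vector is a weighted mix of both primitives, so the attended attribute representation depends on which object it is paired with, and vice versa. This is the sense in which attention lets the representation of one primitive be conditioned on the other.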
