
MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Sibei Yang, Yu Wu

TL;DR

This work introduces the Multi-Attribute Composition (MAC) dataset, a benchmark for multi-attribute compositional zero-shot learning (CZSL), and proposes the Multi-attribute Visual-Primitive Integrator (MVP-Integrator), a robust baseline for multi-attribute CZSL that disentangles semantic primitives and performs effective visual-primitive association.

Abstract

Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Their narrow attribute scope and single-attribute labeling introduce annotation biases, which mislead attribute learning and cause inaccurate evaluation. To address these issues, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 22,838 images and 17,627 compositions with comprehensive and representative attribute annotations. MAC exhibits complex relationships between attributes and objects: each attribute type is linked to an average of 82.2 object types, and each object type is associated with an average of 31.4 attribute types. Based on MAC, we propose multi-attribute compositional zero-shot learning, which requires deeper semantic understanding and advanced attribute association, establishing a more realistic and challenging benchmark for CZSL. We also propose the Multi-attribute Visual-Primitive Integrator (MVP-Integrator), a robust baseline for multi-attribute CZSL that disentangles semantic primitives and performs effective visual-primitive association. Experimental results demonstrate that MVP-Integrator significantly outperforms existing CZSL methods on MAC while also improving inference efficiency.
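
To make two notions from the abstract concrete, here is a minimal Python sketch over a toy, hypothetical annotation format in which each image carries one object label and a set of attribute labels (an assumption; MAC's actual file format is not described here). `binding_stats` computes the average number of distinct objects per attribute and distinct attributes per object, the kind of quantity behind the 82.2 and 31.4 figures. `is_unseen_composition` checks the zero-shot condition: the composition never appears in training, yet every primitive does. Treating a composition as an attribute set paired with an object is likewise an illustrative assumption, and all names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy annotations: (image_id, object, set of attributes).
# Illustrative format only; not MAC's actual release format.
ANNOTATIONS = [
    ("img0", "apple", {"red", "round", "shiny"}),
    ("img1", "apple", {"green", "round"}),
    ("img2", "ball", {"red", "round"}),
    ("img3", "car", {"red", "shiny", "metallic"}),
]


def binding_stats(annotations):
    """Average number of distinct objects per attribute and of distinct
    attributes per object, pooled over all images."""
    attr_to_objs = defaultdict(set)
    obj_to_attrs = defaultdict(set)
    for _, obj, attrs in annotations:
        for attr in attrs:
            attr_to_objs[attr].add(obj)
            obj_to_attrs[obj].add(attr)
    avg_objs_per_attr = mean(len(objs) for objs in attr_to_objs.values())
    avg_attrs_per_obj = mean(len(attrs) for attrs in obj_to_attrs.values())
    return avg_objs_per_attr, avg_attrs_per_obj


def is_unseen_composition(test_obj, test_attrs, train_annotations):
    """True if the (attribute set, object) composition never occurs in
    training while the object and every attribute do (zero-shot case)."""
    seen_comps = {(frozenset(a), o) for _, o, a in train_annotations}
    seen_objs = {o for _, o, _ in train_annotations}
    seen_attrs = set().union(*(a for _, _, a in train_annotations))
    comp = (frozenset(test_attrs), test_obj)
    primitives_seen = test_obj in seen_objs and set(test_attrs) <= seen_attrs
    return comp not in seen_comps and primitives_seen


if __name__ == "__main__":
    print(binding_stats(ANNOTATIONS))
    print(is_unseen_composition("ball", {"red", "shiny"}, ANNOTATIONS))
```

Because both averages are taken over the same set of distinct attribute-object pairs, under this counting the average objects per attribute times the number of attributes equals the average attributes per object times the number of objects (9 in the toy data). Whether the paper counts pairs exactly this way is not stated in this summary.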

Paper Structure

This paper contains 19 sections, 6 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: (a) Comparison with previous datasets: MAC provides comprehensive and representative attributes for objects. (b) Multi-attribute compositional zero-shot learning.
  • Figure 2: Dataset statistics. (a) shows the proportion of images with different numbers of attribute labels; (b) shows the co-occurrence of the top 15 attributes with their most frequently associated attributes; (c) illustrates the binding relationships between attributes and objects. The top section displays the distribution of attributes across varying numbers of associated objects, while the bottom section presents the reverse; (d) displays the number of images per primitive for MAC.
  • Figure 3: Examples of MAC. Our dataset provides comprehensive and representative attribute annotations for images.
  • Figure 4: (a) Step-by-step diagram of dataset construction. (b) Sample comparison across datasets. From top left to bottom right, the images are from UT-Zappos yu2014fine, MIT-States isola2015discovering, C-GQA naeem2021learning, and MAC.
  • Figure 5: Complementarity analysis. The darker the color, the greater the proportion of all predictions.
  • ...and 4 more figures
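
Figure 2 (a) and (b) summarize per-image label counts and attribute co-occurrence. The sketch below shows one plausible way to compute such statistics, reusing the same hypothetical (image_id, object, attribute set) format; the toy annotations and function names are illustrative assumptions, not the authors' tooling, and the paper's exact counting procedure is not given here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: (image_id, object, set of attributes).
# Illustrative format only; not MAC's actual release format.
ANNOTATIONS = [
    ("img0", "apple", {"red", "round", "shiny"}),
    ("img1", "apple", {"green", "round"}),
    ("img2", "ball", {"red", "round"}),
    ("img3", "car", {"red", "shiny", "metallic"}),
]


def label_count_distribution(annotations):
    """Fraction of images that carry exactly k attribute labels, keyed by k."""
    counts = Counter(len(attrs) for _, _, attrs in annotations)
    total = len(annotations)
    return {k: n / total for k, n in sorted(counts.items())}


def attribute_cooccurrence(annotations):
    """Count, for each unordered attribute pair, the images where both appear."""
    pairs = Counter()
    for _, _, attrs in annotations:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(attrs), 2):
            pairs[(a, b)] += 1
    return pairs


def top_partners(cooccurrence, attr, k=3):
    """Return the k attributes that most often co-occur with `attr`."""
    partners = Counter()
    for (a, b), n in cooccurrence.items():
        if a == attr:
            partners[b] += n
        elif b == attr:
            partners[a] += n
    return partners.most_common(k)


if __name__ == "__main__":
    print(label_count_distribution(ANNOTATIONS))
    cooc = attribute_cooccurrence(ANNOTATIONS)
    print(top_partners(cooc, "red", k=2))
```

Keying pairs by sorted attribute names keeps ("red", "round") and ("round", "red") from being counted separately, so the co-occurrence table is symmetric by construction.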