Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning

Yun Li, Zhe Liu, Hang Chen, Lina Yao

TL;DR

This work tackles Compositional Zero-Shot Learning (CZSL) by addressing two problems: attribute specificity and the size of the open-world search space. It introduces CDS-CZSL, a three-branch framework that fuses composition-wise context with primitive-wise attribute and object representations, enhanced by a context-based and diversity-driven specificity learner that prioritizes informative, specific attribute descriptions. A denoising mechanism, batch-wise clustering, and a specificity-based penalty refine attribute predictions and enable composition filtering in Open-World CZSL. The approach achieves state-of-the-art results on three benchmarks in both Closed-World and Open-World settings, with improved generalization to unseen compositions and more discriminative attribute-object predictions. Overall, CDS-CZSL offers a principled way to incorporate information-theoretic specificity and contextual nuance into CZSL, with practical benefits for open-world visual reasoning and labeling.

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object pairs based on a limited set of observed examples. Current CZSL methodologies, despite their advancements, tend to neglect the distinct specificity levels present in attributes. For instance, given images of sliced strawberries, they may fail to prioritize 'Sliced-Strawberry' over a generic 'Red-Strawberry', despite the former being more informative. They also suffer from a ballooning search space when shifting from Closed-World (CW) to Open-World (OW) CZSL. To address these issues, we introduce the Context-based and Diversity-driven Specificity learning framework for CZSL (CDS-CZSL). Our framework evaluates the specificity of attributes by considering the diversity of objects they apply to and their related context. This novel approach allows for more accurate predictions by emphasizing specific attribute-object pairs and improves composition filtering in OW-CZSL. We conduct experiments in both CW and OW scenarios, and our model achieves state-of-the-art results across three datasets.
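
To make the search-space issue concrete, the following is a minimal sketch of how the candidate set grows when moving from CW to OW. The attribute and object vocabularies and the set of known pairs are toy values invented for illustration; they are not taken from the paper's benchmarks.

```python
from itertools import product

# Toy vocabularies (hypothetical, for illustration only).
attributes = ["sliced", "red", "wet", "broken"]
objects = ["strawberry", "pen", "dog"]

# Closed-World: the test-time label space is a known subset of compositions.
closed_world = {
    ("sliced", "strawberry"),
    ("red", "strawberry"),
    ("red", "pen"),
    ("wet", "dog"),
}

# Open-World: every attribute-object pairing is a candidate label,
# including implausible ones such as ("sliced", "dog").
open_world = set(product(attributes, objects))

print(len(closed_world), "closed-world candidates")
print(len(open_world), "open-world candidates")
```

The open-world set always has size |attributes| x |objects|, so it grows multiplicatively with both vocabularies, which is why some form of composition filtering becomes necessary in OW-CZSL.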

Paper Structure

This paper contains 16 sections, 17 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Specificity in CZSL. (a) For strawberries, Sliced is more specific than Red. Conversely, for a pen, Red is more specific than Writing, as Writing is its inherent function. (b) When images are clustered by their object features, Red spans multiple object clusters. In contrast, Sliced, though applicable to several objects, links only to the food cluster, indicating its greater specificity.
  • Figure 2: CDS-CZSL Overview and process of the context-based and diversity-driven specificity learning.
  • Figure 3: Qualitative Results of varying network structure (first row), changing filtering method (second row), and failure cases (third row).
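
The diversity intuition in Figure 1(b) can be sketched with a simple stand-in measure: the entropy of the object clusters in which an attribute appears. This is an illustrative assumption, not the paper's specificity learner; the function names, cluster labels, and sample data below are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def cluster_entropy(cluster_ids):
    """Shannon entropy (in nats) of the empirical cluster distribution."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def attribute_diversity(samples):
    """Map each attribute to the entropy of the object clusters it occurs in.

    samples: iterable of (attribute, object_cluster) pairs.
    Higher entropy means the attribute spreads over more object clusters,
    which this sketch treats as lower specificity.
    """
    clusters_by_attr = defaultdict(list)
    for attr, cluster in samples:
        clusters_by_attr[attr].append(cluster)
    return {attr: cluster_entropy(ids) for attr, ids in clusters_by_attr.items()}

# Toy data mirroring the caption: Red spans several clusters, Sliced only food.
samples = [
    ("red", "food"), ("red", "stationery"), ("red", "vehicle"),
    ("sliced", "food"), ("sliced", "food"), ("sliced", "food"),
]
diversity = attribute_diversity(samples)
for attr, value in sorted(diversity.items(), key=lambda kv: kv[1]):
    print(f"{attr}: diversity = {value:.3f}")
```

Under this stand-in measure, Sliced has lower diversity than Red and would therefore be treated as the more specific attribute. The caption for Figure 1(a) also indicates that specificity depends on the object in context, which a purely attribute-level score like this one does not capture.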