Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability

Jianyang Zhang, Qianli Luo, Guowu Yang, Wenjing Yang, Weide Liu, Guosheng Lin, Fengmao Lv

TL;DR

This paper tackles spurious cue inference and poor unseen-class generalization in Language Bottleneck Models by introducing ALBM, which uses an Attribute-formed Class-specific Concept Space (ACCS) built on a unified attribute set. It pairs ACCS with Visual Attribute Prompt Learning (VAPL) to capture fine-grained attribute features and a Description, Summary, and Supplement (DSS) strategy to automatically generate high-quality attribute sets via large language models. The approach yields improved interpretability, transferability, and competitive performance across nine fine-grained benchmarks, validated through extensive ablations and zero-shot/base-to-novel evaluations. Collectively, ALBM advances scalable, explainable image recognition by aligning concepts with causally essential attributes and leveraging cross-class correlations for better generalization.

Abstract

Language Bottleneck Models (LBMs) are proposed to achieve interpretable image recognition by classifying images based on textual concept bottlenecks. However, current LBMs simply list all concepts together as the bottleneck layer, which leads to the spurious cue inference problem and prevents generalization to unseen classes. To address these limitations, we propose the Attribute-formed Language Bottleneck Model (ALBM). ALBM organizes concepts in an attribute-formed class-specific space, where each concept describes a specific attribute of a specific class. In this way, ALBM avoids the spurious cue inference problem by classifying images solely based on the essential concepts of each class. In addition, the cross-class unified attribute set ensures that the concept spaces of different classes are strongly correlated; as a result, the learned concept classifier can be easily generalized to unseen classes. Moreover, to further improve interpretability, we propose Visual Attribute Prompt Learning (VAPL) to extract visual features on fine-grained attributes. Furthermore, to avoid labor-intensive concept annotation, we propose the Description, Summary, and Supplement (DSS) strategy to automatically generate high-quality concept sets with a complete and precise attribute set. Extensive experiments on 9 widely used few-shot benchmarks demonstrate the interpretability, transferability, and performance of our approach. The code and collected concept sets are available at https://github.com/tiggers23/ALBM.
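The abstract's two structural claims (each class is scored only through its own concepts, and a shared attribute set lets the classifier transfer to unseen classes) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each image comes with one feature vector per attribute, that every class has one text-encoded concept per attribute, and that the learned classifier is a single weight per attribute shared by all classes. The names `class_scores` and `cosine` and the toy 2-D vectors are illustrative and are not taken from the ALBM code or paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_scores(
    image_attr_feats: Dict[str, Vector],          # attribute -> visual feature of this image (assumed input)
    concept_feats: Dict[str, Dict[str, Vector]],  # class -> attribute -> text feature of that class's concept
    attr_weights: Dict[str, float],               # hypothetical classifier: one weight per shared attribute
) -> Dict[str, float]:
    """Score every class using only its own concepts, one per attribute."""
    return {
        cls: sum(
            w * cosine(image_attr_feats[attr], concepts[attr])
            for attr, w in attr_weights.items()
        )
        for cls, concepts in concept_feats.items()
    }


# Toy example with two attributes and 2-D "embeddings" (purely illustrative).
weights = {"color": 0.7, "beak": 0.3}
image = {"color": [1.0, 0.0], "beak": [0.0, 1.0]}
concepts = {
    "cardinal": {"color": [1.0, 0.0], "beak": [0.0, 1.0]},
    "blue_jay": {"color": [0.0, 1.0], "beak": [0.0, 1.0]},
}
scores = class_scores(image, concepts, weights)
print(max(scores, key=scores.get))  # cardinal

# An unseen class only needs its per-attribute concept features; the weights are reused unchanged.
concepts["crow"] = {"color": [0.0, 1.0], "beak": [1.0, 0.0]}
scores = class_scores(image, concepts, weights)
print(max(scores, key=scores.get))  # cardinal
```

Because the weights in this sketch are indexed by attribute rather than by class, adding a class introduces no new classifier parameters, and no concept belonging to another class can contribute to its score.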

Paper Structure

This paper contains 22 sections, 14 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Illustration of concept classification scenarios. (a) Existing Language Bottleneck Models DBLP:conf/cvpr/YangPZJCY23, DBLP:conf/iccv/0003WZDHLWSM23. (b) Our Attribute-formed Language Bottleneck Model. Existing LBMs suffer from spurious cue inference because they may make decisions based on non-essential or background concepts. Their cross-class scalability is also limited, since the concept space may need to be expanded for unseen classes. In contrast, our approach predicts each class solely from its corresponding concepts, which avoids the spurious cue problem, and keeps the concept space consistent across categories by sharing a unified attribute set, allowing transfer to unseen classes.
  • Figure 2: (a) Illustration of Visual Attribute Prompt Learning (VAPL). VAPL trains visual prompts representing the semantics of each attribute by aligning the output features of these prompts with the textual features of the corresponding concepts. (b) Illustration of the Description, Summary, and Supplement (DSS) strategy. DSS first prompts the LLM to generate concepts for each class, then summarizes the corresponding attribute for each concept, and finally supplements missing attribute descriptions for each class. (c) The overall architecture of the Attribute-formed Language Bottleneck Model, where $\otimes$ indicates matrix multiplication and $\odot$ indicates element-wise multiplication.
  • Figure 3: Case study of bottlenecks constructed by ALBM and LaBo, where red text marks spurious cues and scores denote concept activations. The three highest-weighted concepts for each category are shown. Categories and datasets are selected randomly.
  • Figure A1: Illustration of the summary step in DSS strategy. Specifically, we summarize the attributes of class concepts through the following steps: first, iteratively summarize the attributes by category; next, remove duplicate attributes from the attribute set; then, eliminate non-visual attributes; and finally, remove sparse attributes.
  • Figure A2: Few-shot performance comparison between our ALBM, LP-CLIP DBLP:conf/icml/RadfordKHRGASAM21, and LaBo DBLP:conf/cvpr/YangPZJCY23 on Food101, CUB, Aircraft, and Flowers102 datasets.
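Figure 2(a) describes VAPL as training one visual prompt per attribute and aligning each prompt's output feature with the text feature of the corresponding concept. The caption does not state the loss, so the following Python sketch assumes one plausible choice: for every attribute, a softmax over classes of temperature-scaled cosine similarities, with cross-entropy on the ground-truth class's concept, averaged over attributes. The function name `attribute_alignment_loss`, the temperature of 0.07, and the toy vectors are hypothetical, and only the loss value is computed, not the prompt training itself.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attribute_alignment_loss(
    prompt_feats: Dict[str, Vector],              # attribute -> output feature of that attribute's visual prompt
    concept_feats: Dict[str, Dict[str, Vector]],  # class -> attribute -> concept text feature
    true_class: str,
    temperature: float = 0.07,                    # hypothetical value
) -> float:
    """Mean over attributes of the cross-entropy that selects the true class's concept."""
    losses = []
    for attr, feat in prompt_feats.items():
        logits = {
            cls: cosine(feat, feats[attr]) / temperature
            for cls, feats in concept_feats.items()
        }
        peak = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits.values()))
        losses.append(log_norm - logits[true_class])
    return sum(losses) / len(losses)


concepts = {
    "cardinal": {"color": [1.0, 0.0], "beak": [0.0, 1.0]},
    "blue_jay": {"color": [0.0, 1.0], "beak": [0.0, 1.0]},
}
prompts = {"color": [0.9, 0.1], "beak": [0.1, 0.9]}
print(attribute_alignment_loss(prompts, concepts, "cardinal") <
      attribute_alignment_loss(prompts, concepts, "blue_jay"))  # True
```

In this toy case the two classes share the same beak concept, so the beak term contributes log 2 to either loss and only the color attribute separates them.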
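The DSS strategy in Figure 2(b), together with the summary steps listed for Figure A1, can be sketched in Python as list processing around the language-model calls. In this sketch the LLM is replaced by two hypothetical callables, `concept_to_attribute` (names the attribute a concept describes) and `is_visual` (judges whether an attribute is visual); duplicates are assumed to be exact matches after lowercasing and trimming, and "sparse" is assumed to mean described for fewer than `min_classes` classes. These names, the threshold, and the toy data are illustrative assumptions, not the paper's prompts or criteria.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def attributes_per_class(
    class_concepts: Dict[str, List[str]],        # class -> concepts from the Description step
    concept_to_attribute: Callable[[str], str],  # hypothetical stand-in for the LLM summary call
) -> Dict[str, List[str]]:
    """Summary, part 1: name the attribute behind each concept, class by class (normalized, order kept)."""
    return {
        cls: list(dict.fromkeys(concept_to_attribute(c).strip().lower() for c in concepts))
        for cls, concepts in class_concepts.items()
    }


def unified_attribute_set(
    per_class: Dict[str, List[str]],
    is_visual: Callable[[str], bool],  # hypothetical visual/non-visual judge
    min_classes: int = 2,              # hypothetical sparsity threshold
) -> List[str]:
    """Summary, part 2: deduplicate, drop non-visual attributes, drop sparse attributes."""
    merged = list(dict.fromkeys(a for attrs in per_class.values() for a in attrs))
    visual = [a for a in merged if is_visual(a)]
    support = Counter(a for attrs in per_class.values() for a in attrs)  # number of classes per attribute
    return [a for a in visual if support[a] >= min_classes]


def missing_descriptions(
    per_class: Dict[str, List[str]], attributes: List[str]
) -> List[Tuple[str, str]]:
    """Supplement: (class, attribute) pairs that would still need an LLM-written concept."""
    return [(cls, a) for cls, attrs in per_class.items() for a in attributes if a not in attrs]


toy_labels = {
    "red plumage": "Plumage color",
    "short conical beak": "beak shape",
    "found in woodlands": "habitat",
    "blue plumage": "plumage color ",
    "loud call": "call",
    "black plumage": "plumage color",
    "stout black beak": "beak shape",
    "glossy feathers": "feather texture",
}
class_concepts = {
    "cardinal": ["red plumage", "short conical beak", "found in woodlands"],
    "blue_jay": ["blue plumage", "loud call"],
    "crow": ["black plumage", "stout black beak", "glossy feathers"],
}
per_class = attributes_per_class(class_concepts, lambda c: toy_labels[c])
attributes = unified_attribute_set(per_class, lambda a: a not in {"habitat", "call"})
print(attributes)                                   # ['plumage color', 'beak shape']
print(missing_descriptions(per_class, attributes))  # [('blue_jay', 'beak shape')]
```

Here "habitat" and "call" are dropped as non-visual, "feather texture" is dropped as sparse, and the supplement step would then ask the LLM only for the blue jay's beak shape.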