Object Centric Concept Bottlenecks

David Steinmann, Wolfgang Stammer, Antonia Wüst, Kristian Kersting

TL;DR

OCB addresses the limitations of holistic image encodings in concept bottleneck models by introducing object-centric concept bottlenecks that fuse object proposals with concept discovery and a linear predictor. It extends CBMs to multi-label and logic-based single-label reasoning using the COCOLogic benchmark, demonstrating improved accuracy and interpretability over traditional CBMs. The work provides thorough ablative analyses of aggregation methods, object proposals, and the necessity of combining global and object-level features, offering practical guidance and a new challenging dataset for structured visual reasoning. Together, these contributions advance interpretable, object-aware visual reasoning with scalable, pretrained components.

Abstract

Developing high-performing, yet interpretable models remains a critical challenge in modern AI. Concept-based models (CBMs) attempt to address this by extracting human-understandable concepts from a global encoding (e.g., image encoding) and then applying a linear classifier on the resulting concept activations, enabling transparent decision-making. However, their reliance on holistic image encodings limits their expressiveness in object-centric real-world settings and thus hinders their ability to solve complex vision tasks beyond single-label classification. To tackle these challenges, we introduce Object-Centric Concept Bottlenecks (OCB), a framework that combines the strengths of CBMs and pre-trained object-centric foundation models, boosting performance and interpretability. We evaluate OCB on complex image datasets and conduct a comprehensive ablation study to analyze key components of the framework, such as strategies for aggregating object-concept encodings. The results show that OCB outperforms traditional CBMs and allows one to make interpretable decisions for complex visual tasks.

Paper Structure

This paper contains 21 sections, 2 equations, 9 figures, 10 tables, 1 algorithm.

Figures (9)

  • Figure 1: Reasoning and explaining on the object-level requires object representations.
  • Figure 2: Object-Centric Concept Bottlenecks combine object-centric representations with concept-based modeling in a three-stage pipeline: (I) An object proposal module identifies and refines object candidates within an image. (II) A concept discovery module encodes the entire image and its object crops into human-understandable concept activations. (III) These activations are aggregated and passed to a simple, interpretable predictor to generate the final output. This architecture enables interpretable, object-aware reasoning for complex visual tasks.
  • Figure 3: Two examples from the COCOLogic dataset with relevant objects for class decision.
  • Figure 4: Different datasets benefit from different aggregation methods. The performance comparison of aggregation strategies with OCB (RCNN) and $k = 7$ shows no one-size-fits-all choice, but max and sum are always solid choices.
  • Figure 5: Adding more object proposals improves performance on object-based tasks. However, for complex tasks like COCOLogic, adding too many (noisy) object proposals can reduce performance. OCB (RCNN) performance for different values of $k$.
  • ...and 4 more figures
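
Stage (III) of the pipeline in Figure 2, together with the max and sum aggregation strategies named in Figure 4, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `aggregate_concepts` and `linear_predict`, the array shapes, the toy values, and the choice to treat the whole-image activations as one extra row alongside the object crops are all assumptions made for illustration.

```python
import numpy as np

def aggregate_concepts(image_acts, object_acts, method="max"):
    """Combine whole-image and per-object concept activations into one vector.

    image_acts:  shape (C,)   concept activations of the full image
    object_acts: shape (k, C) concept activations of the k object crops
    """
    # Assumption: the whole image is stacked as one more row next to the crops.
    stacked = np.vstack([image_acts[None, :], object_acts])
    if method == "max":
        # Keeps the strongest evidence for each concept across all regions.
        return stacked.max(axis=0)
    if method == "sum":
        # Accumulates evidence, so repeated objects raise the activation.
        return stacked.sum(axis=0)
    raise ValueError(f"unknown aggregation method: {method}")

def linear_predict(concepts, weights, bias):
    """Linear predictor: one inspectable weight per (class, concept) pair."""
    return weights @ concepts + bias

# Toy example (hypothetical values): 3 concepts, k = 2 object crops, 2 classes.
image_acts = np.array([0.2, 0.0, 0.5])
object_acts = np.array([[0.9, 0.0, 0.1],
                        [0.0, 0.7, 0.1]])
weights = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 1.0]])
bias = np.zeros(2)

z_max = aggregate_concepts(image_acts, object_acts, "max")
z_sum = aggregate_concepts(image_acts, object_acts, "sum")
logits = linear_predict(z_max, weights, bias)
```

Because the predictor is linear, each class score decomposes into per-concept contributions `weights[c] * z`, which is what makes the decision inspectable. Max aggregation records whether a concept appears anywhere in the image, while sum also reflects how often it appears, which is one plausible reason different datasets favor different strategies.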