Object-Centric Concept Bottlenecks
David Steinmann, Wolfgang Stammer, Antonia Wüst, Kristian Kersting
TL;DR
Concept bottleneck models (CBMs) that rely on a single holistic image encoding are limited in object-centric settings. OCB addresses this with object-centric concept bottlenecks that fuse object proposals with concept discovery and a linear predictor. It extends CBMs to multi-label and logic-based single-label reasoning, evaluated with the COCOLogic benchmark, and improves accuracy and interpretability over traditional CBMs. Ablations examine the aggregation method, the object proposals, and the need to combine global and object-level features. The work also offers practical guidance and a new, challenging dataset for structured visual reasoning, and the framework is built from scalable, pretrained components.
Abstract
Developing high-performing yet interpretable models remains a critical challenge in modern AI. Concept-based models (CBMs) attempt to address this by extracting human-understandable concepts from a global encoding (e.g., an image encoding) and then applying a linear classifier to the resulting concept activations, enabling transparent decision-making. However, their reliance on holistic image encodings limits their expressiveness in object-centric real-world settings and thus hinders their ability to solve complex vision tasks beyond single-label classification. To tackle these challenges, we introduce Object-Centric Concept Bottlenecks (OCB), a framework that combines the strengths of CBMs and pre-trained object-centric foundation models, boosting performance and interpretability. We evaluate OCB on complex image datasets and conduct a comprehensive ablation study to analyze key components of the framework, such as strategies for aggregating object-concept encodings. The results show that OCB outperforms traditional CBMs and allows one to make interpretable decisions for complex visual tasks.
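
The pipeline described in the abstract (concepts from a global encoding, concepts from object encodings, aggregation, linear classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Every name here (`concept_activations`, `aggregate`, `ocb_forward`, `concept_bank`) is hypothetical, the embeddings are assumed to come from some pretrained encoder applied to the image and to each object proposal, and concatenating the global and aggregated object concepts is an illustrative choice rather than the method confirmed by the text.

```python
import numpy as np

def concept_activations(embeddings, concept_bank):
    """Project embeddings (n, d) onto concept directions (k, d), giving (n, k)."""
    return embeddings @ concept_bank.T

def aggregate(object_concepts, how="max"):
    """Collapse per-object concept activations (n_objects, k) into one (k,) vector."""
    if object_concepts.shape[0] == 0:
        # No object proposals: contribute an all-zero object-level vector.
        return np.zeros(object_concepts.shape[1])
    if how == "max":
        return object_concepts.max(axis=0)
    if how == "sum":
        return object_concepts.sum(axis=0)
    if how == "mean":
        return object_concepts.mean(axis=0)
    raise ValueError(f"unknown aggregation: {how}")

def ocb_forward(image_emb, object_embs, concept_bank, W, b, how="max"):
    """Return the concept bottleneck vector and the linear class logits.

    Assumption: global and object-level concepts are concatenated.
    """
    global_c = concept_activations(image_emb[None, :], concept_bank)[0]
    object_c = aggregate(concept_activations(object_embs, concept_bank), how)
    bottleneck = np.concatenate([global_c, object_c])
    logits = W @ bottleneck + b  # linear, hence directly inspectable
    return bottleneck, logits

# Toy example: 3 concepts in a 3-dimensional embedding space, 2 objects, 2 classes.
concept_bank = np.eye(3)
image_emb = np.array([0.0, 0.0, 3.0])
object_embs = np.array([[1.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0]])
W = np.ones((2, 6))
b = np.zeros(2)

bottleneck, logits = ocb_forward(image_emb, object_embs, concept_bank, W, b)
```

Because the classifier is linear, each logit decomposes into per-concept contributions `W[c] * bottleneck`, which is what makes the decision inspectable. Swapping `how` between `"max"`, `"sum"`, and `"mean"` mimics the kind of aggregation comparison that the ablation study refers to.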
