A Self-explaining Neural Architecture for Generalizable Concept Learning

Sanchit Sinha, Guangzhi Xiong, Aidong Zhang

TL;DR

This work tackles explainability in deep concept learning by addressing two core deficiencies: low concept fidelity across similar classes and limited concept interoperability across domains. It introduces a self-explaining framework, Representative Concept Extraction (RCE), augmented with self-supervised Contrastive Concept Learning (CCL) for domain invariance and Prototype-based Concept Grounding (PCG) to align concepts across domains. Trained end to end with a composite loss that combines reconstruction, classification, and contrastive regularization, the approach improves domain adaptation performance and concept fidelity over current concept learning approaches on four real-world datasets, with qualitative demonstrations of domain-aligned prototypes. The findings suggest that domain-invariant, human-interpretable concepts can be both accurate and transferable, offering practical benefits for trustworthy AI in cross-domain settings.

Abstract

With the wide proliferation of Deep Neural Networks (DNNs) in high-stakes applications, there is a growing demand for explainability behind their decision-making process. Concept learning models attempt to learn high-level 'concepts', abstract entities that align with human understanding, and thus provide interpretability to DNN architectures. However, in this paper, we demonstrate that present SOTA concept learning approaches suffer from two major problems: a lack of concept fidelity, wherein the models fail to learn consistent concepts among similar classes, and limited concept interoperability, wherein the models fail to generalize learned concepts to new domains for the same task. Keeping these in mind, we propose a novel self-explaining architecture for concept learning across domains which i) incorporates a new concept saliency network for representative concept selection, ii) utilizes contrastive learning to capture representative domain-invariant concepts, and iii) uses a novel prototype-based concept grounding regularization to improve concept alignment across domains. We demonstrate the efficacy of our proposed approach over current SOTA concept learning approaches on four widely used real-world datasets. Empirical results show that our method improves both concept fidelity, measured through concept overlap, and concept interoperability, measured through domain adaptation performance.
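
To make the contrastive component concrete, the following is a minimal sketch of an InfoNCE-style loss computed over concept vectors. It is an illustration under stated assumptions, not the paper's loss: the function names (`cosine`, `contrastive_concept_loss`), the use of cosine similarity, and the `temperature` value are all hypothetical choices, since the exact formulation is not given in this summary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two concept vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_concept_loss(anchor, positives, negatives, temperature=0.5):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    The loss is small when the anchor's concept vector is close to the
    positives and far from the negatives.
    """
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average the negative log-probability of each positive pair.
    return -sum(math.log(p / denom) for p in pos) / len(pos)


# Toy concept vectors: the positive is nearly aligned with the anchor,
# the negative points the opposite way.
anchor = [1.0, 0.0]
aligned = [[0.9, 0.1]]
opposed = [[-1.0, 0.0]]

loss_good = contrastive_concept_loss(anchor, aligned, opposed)
loss_bad = contrastive_concept_loss(anchor, opposed, aligned)
```

In this toy setup `loss_good` is lower than `loss_bad`, which is the behavior a contrastive regularizer is meant to reward: views of the same class agree in concept space while views of other classes are pushed apart.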

Paper Structure

This paper contains 27 sections, 9 equations, 11 figures, 7 tables, 1 algorithm.

Figures (11)

  • Figure 1: The proposed Representative Concept Extraction (RCE) framework. The networks $\mathbf{F}$ and $\mathbf{H}$ extract concepts and their associated relevance scores, respectively, and $\mathbf{A}$ aggregates them. Network $\mathbf{G}$ reconstructs the original input from the concepts, while $\mathbf{T}$ selects the concepts most representative of the prediction.
  • Figure 2: Self-supervised contrastive concept learning. Images are sampled from a set of positive samples $X^+$ and a set of negative samples $X^-$ associated with an anchor image $x$. Green arrows depict the direction of maximizing similarity; red arrows depict the direction of minimizing similarity.
  • Figure 3: Prototype-based concept grounding (PCG). Concept grounding ensures the concept representations learned from both source and target domains are grounded to a representative concept representation prototype (Green).
  • Figure 4: Schematic overview of the proposed SimCLR transformations for the OfficeHome dataset from the Product (P) domain. Green arrows depict maximizing similarity, while red arrows depict minimizing similarity in concept space. Transformation sets $T_1^+$ and $T_2^+$ comprise images transformed from the chair class, while $T_1^-$ and $T_2^-$ consist of images transformed from non-chair classes.
  • Figure 5: Top-5 most important prototypes associated with randomly chosen concepts for a model trained using our methodology on the VisDA [TOP] and OfficeHome [BOTTOM] datasets, for the 3D $\rightarrow$ Real and Art (A) $\rightarrow$ Real (R) domains respectively. The prototypes on the left are chosen from the training set of the source domain and the ones on the right are chosen from the target domain. In the VisDA dataset, Concept #6 captures samples with wings, namely airplanes and oddly shaped cars, while in OfficeHome, Concept #44 captures training samples with rounded faces in both domains, including alarm clocks, rotary telephones, etc. Similarly, Concept #29 captures flat screens such as TVs and monitors.
  • ...and 6 more figures
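
The data flow described in the Figure 1 caption can be sketched as below. This is a toy illustration only: the stand-in functions `toy_F`, `toy_H`, `toy_A`, `toy_G`, and `toy_T` are hypothetical, and both the weighted-sum aggregation and the top-k selection rule are assumptions made for the example rather than the paper's actual networks.

```python
def rce_forward(x, F, H, A, G, T, k):
    """Wire the five components together as in the Figure 1 caption."""
    concepts = F(x)                      # F: extract concepts
    relevance = H(x)                     # H: extract relevance scores
    prediction = A(concepts, relevance)  # A: aggregate both into a prediction
    reconstruction = G(concepts)         # G: reconstruct the input from concepts
    selected = T(concepts, relevance, k) # T: pick representative concepts
    return prediction, reconstruction, selected


# Hypothetical stand-ins for the learned networks (assumptions, not the paper's).
def toy_F(x):
    return [2.0 * v for v in x]


def toy_H(x):
    return [0.5, 1.0, -0.2]  # fixed relevance scores for the toy example


def toy_A(concepts, relevance):
    # Assumed aggregation: relevance-weighted sum of concept activations.
    return sum(c * r for c, r in zip(concepts, relevance))


def toy_G(concepts):
    return [c / 2.0 for c in concepts]  # exact inverse of toy_F


def toy_T(concepts, relevance, k):
    # Assumed selection rule: indices of the k largest |concept * relevance|.
    scores = [abs(c * r) for c, r in zip(concepts, relevance)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


x = [1.0, -3.0, 0.5]
prediction, reconstruction, selected = rce_forward(
    x, toy_F, toy_H, toy_A, toy_G, toy_T, k=2
)
```

Here `reconstruction` recovers `x` because `toy_G` inverts `toy_F`, and `selected` lists the two concept indices that contribute most to `prediction`. In the actual framework each component is a trained network and the reconstruction is only approximate.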