Enhancing the Comprehensibility of Text Explanations via Unsupervised Concept Discovery

Yifan Sun, Danding Wang, Qiang Sheng, Juan Cao, Jintao Li

TL;DR

The paper tackles the lack of human-aligned interpretability in text explanations by introducing ECO-Concept, an intrinsically interpretable framework that automatically discovers comprehensible concepts without concept annotations using a slot-attention-based extractor. It integrates LLM-based comprehensibility evaluation as a feedback signal to refine concept representations, balancing task discriminativity with human interpretability. Empirical results across seven datasets show ECO-Concept achieves competitive or superior performance relative to supervised and unsupervised baselines while delivering more comprehensible concepts, as supported by both quantitative metrics and human studies. This approach offers a practical path toward trustworthy, explanation-rich NLP models without the need for extensive concept annotations.

Abstract

Concept-based explainable approaches have emerged as a promising method in explainable AI because they can interpret models in a way that aligns with human reasoning. However, their adoption in the text domain remains limited. Most existing methods rely on predefined concept annotations and cannot discover unseen concepts, while other methods that extract concepts without supervision often produce explanations that are not intuitively comprehensible to humans, potentially diminishing user trust. These methods fall short of discovering comprehensible concepts automatically. To address this issue, we propose ECO-Concept, an intrinsically interpretable framework to discover comprehensible concepts with no concept annotations. ECO-Concept first utilizes an object-centric architecture to extract semantic concepts automatically. Then the comprehensibility of the extracted concepts is evaluated by large language models. Finally, the evaluation result guides the subsequent model fine-tuning to obtain more understandable explanations. Experiments show that our method achieves superior performance across diverse tasks. Further concept evaluations validate that the concepts learned by ECO-Concept surpass current counterparts in comprehensibility.

Paper Structure

This paper contains 31 sections, 10 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Comparison of explanations between our proposed ➂ ECO-Concept and existing typical ➀ supervised and ➁ unsupervised concept-based methods. Supervised methods explain based on predefined concepts, while ECO-Concept and unsupervised methods explain via concept-related highlighted text. ECO-Concept eliminates the need for concept annotations and can discover unseen concepts with improved comprehensibility.
  • Figure 2: (a) Illustration of the proposed framework ECO-Concept. ECO-Concept consists of a concept extractor, a classifier, and a concept evaluator. (b) The concept extractor takes the encoded text $\bm{X}$ as input and interacts with the concept prototypes $\bm{C}$ to obtain a slot attention matrix $\bm{A}$ and concept features $\bm{U}$. The concept prototypes are optimized using consistency and distinctiveness loss. (c) The concept evaluator utilizes exemplars with the highest concept attention values to construct two sets, $\mathcal{D}_{sum}$ and $\mathcal{D}_{high}$. Using the corresponding slot attention matrices $\bm{A}_{sum}$ and $\bm{A}_{high}$, the evaluator highlights these exemplars to perform concept summarization and highlighting, thus getting the comprehensibility loss to guide model fine-tuning.
  • Figure 3: Human subjective ratings on concept quality
  • Figure 4: Classification performance and concept metrics comparison of the model before concept enhancement (the Base model), ECO-Concept, and ECO-Concept (w/o $\mathcal{L}_{com}$).
  • Figure 5: Human ratings of provided explanations (Und. indicating understandability, Plaus. indicating plausibility, and Help. indicating helpfulness)
  • ...and 4 more figures
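
The data flow described in the Figure 2(b) caption (encoded text $\bm{X}$ interacting with concept prototypes $\bm{C}$ to yield a slot attention matrix $\bm{A}$ and concept features $\bm{U}$) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single, non-iterative attention step with scaled dot-product scores and a softmax over concepts, and the function name `extract_concepts`, the shapes, and the random inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def extract_concepts(X, C, eps=1e-8):
    """Single-step slot-attention-style concept extraction (illustrative only).

    X: encoded text, shape (n_tokens, d)
    C: concept prototypes, shape (n_concepts, d)
    Returns:
      A: attention matrix, shape (n_concepts, n_tokens)
      U: concept features, shape (n_concepts, d)
    """
    d = X.shape[1]
    # Scaled dot-product similarity between each prototype and each token.
    logits = C @ X.T / np.sqrt(d)
    # Assumption: concepts compete for each token (softmax over the concept axis).
    A = softmax(logits, axis=0)
    # Normalise each concept's weights over tokens, then pool token encodings.
    W = A / (A.sum(axis=1, keepdims=True) + eps)
    U = W @ X
    return A, U


# Toy inputs standing in for an encoder output and learned prototypes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))   # 6 tokens, hidden size 8
C = rng.normal(size=(3, 8))   # 3 concept prototypes
A, U = extract_concepts(X, C)
```

In this sketch each column of `A` sums to 1, so every token distributes its attention across the concepts, and the tokens with the largest values in a given row are the ones that would be highlighted for that concept. The consistency, distinctiveness, and comprehensibility losses mentioned in the caption are not modelled here.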