Towards Achieving Concept Completeness for Textual Concept Bottleneck Models

Milan Bhan, Yann Choho, Pierre Moreau, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot

TL;DR

The paper tackles interpretability in NLP by addressing the need for complete, reliable concept bases in textual concept bottleneck models (TCBMs). It introduces CT-CBM, a four-step framework that builds micro and macro concept banks in an unsupervised manner using a small language model, scores concepts with concept activation vectors and identifiability measures, initializes a diverse, high-coverage concept bottleneck layer (CBL), and trains simple and residual TCBMs with a stopping criterion designed to ensure concept completeness. Across both general and technical domains, CT-CBM matches the downstream performance of strong baselines while using substantially fewer concepts and detecting concepts much more accurately. The work also demonstrates practical uses, including concept-level intervention, analysis of adversarial and counterfactual examples, and global interpretability, pointing to a scalable, reproducible path to faithful NLP explanations. Overall, CT-CBM offers a principled, unsupervised route to complete, interpretable NLP classifiers that are more reliable and transparent.
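The concept activation vectors (CAVs) mentioned above can be illustrated with a TCAV-style sketch. It assumes access to hidden activations of the classifier for texts that express a concept and for random texts, plus gradients of a class logit with respect to those activations. The linear probe is a plain gradient-descent logistic regression so the example stays self-contained; the names `fit_cav` and `tcav_score` are hypothetical, and CT-CBM's actual importance and identifiability scores may be computed differently.

```python
import numpy as np

def fit_cav(concept_acts: np.ndarray, random_acts: np.ndarray,
            lr: float = 0.1, steps: int = 500) -> np.ndarray:
    """Return a unit concept activation vector (CAV): the normal of a logistic
    regression that separates concept activations from random activations."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept | activation)
        err = p - y                              # log-loss gradient w.r.t. the logit
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w / np.linalg.norm(w)

def tcav_score(logit_grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of examples whose class-logit gradient has a positive
    directional derivative along the CAV (the classic TCAV score)."""
    return float(np.mean(logit_grads @ cav > 0))

# Toy usage: a hypothetical concept shifts activations along dimension 0.
rng = np.random.default_rng(0)
random_acts = rng.normal(size=(200, 5))
concept_acts = rng.normal(size=(200, 5)) + np.array([3.0, 0.0, 0.0, 0.0, 0.0])
cav = fit_cav(concept_acts, random_acts)

# One gradient points along the concept direction, one against it.
grads = np.array([[1.0, 0.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0, 0.0]])
print(cav.round(2), tcav_score(grads, cav))
```

The learned CAV should point mostly along dimension 0, so exactly one of the two toy gradients has a positive directional derivative.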

Abstract

Textual Concept Bottleneck Models (TCBMs) are interpretable-by-design models for text classification that predict a set of salient concepts before making the final prediction. This paper proposes the Complete Textual Concept Bottleneck Model (CT-CBM), a novel TCBM generator that builds concept labels in a fully unsupervised manner using a small language model, eliminating the need for both predefined human-labeled concepts and LLM annotations. CT-CBM iteratively targets important, identifiable concepts and adds them to the bottleneck layer to create a complete concept basis. CT-CBM performs strongly against competitors in terms of concept basis completeness and concept detection accuracy, offering a promising way to reliably enhance the interpretability of NLP classifiers.
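The iterative construction described in the abstract can be pictured as a greedy loop. This is a minimal sketch under stated assumptions: concepts with low identifiability are filtered out, the rest are ranked by a precomputed importance score, they are added in small batches, and the loop ends when a caller-supplied completeness test passes. The threshold `min_ident`, the batch size, the concept names, and `is_complete` are all hypothetical placeholders, not CT-CBM's actual selection rule.

```python
from typing import Callable, Dict, List

def grow_bottleneck(importance: Dict[str, float],
                    identifiability: Dict[str, float],
                    is_complete: Callable[[List[str]], bool],
                    min_ident: float = 0.5,
                    batch: int = 2) -> List[str]:
    """Greedily add the most important identifiable concepts until complete."""
    # Drop concepts that are hard to detect reliably (hypothetical threshold).
    candidates = sorted(
        (c for c in importance if identifiability.get(c, 0.0) >= min_ident),
        key=lambda c: importance[c],
        reverse=True,
    )
    selected: List[str] = []
    # Add concepts in batches until the completeness test passes or candidates run out.
    while candidates and not is_complete(selected):
        selected.extend(candidates[:batch])
        candidates = candidates[batch:]
    return selected

# Toy scores for hypothetical concepts.
importance = {"sports": 0.9, "finance": 0.8, "politics": 0.6,
              "weather": 0.4, "typos": 0.95}
identifiability = {"sports": 0.9, "finance": 0.8, "politics": 0.7,
                   "weather": 0.9, "typos": 0.1}

# Toy completeness test: pretend three concepts are enough.
print(grow_bottleneck(importance, identifiability, lambda sel: len(sel) >= 3))
```

Here "typos" is skipped despite its high importance because it is poorly identifiable, and batching means the loop can overshoot the minimum by up to `batch - 1` concepts.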

Paper Structure

This paper contains 63 sections, 7 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: CT-CBM overview, illustrated with film synopsis classification. CT-CBM is a 4-step approach to build a TCBM from a black-box NLP classifier $f$. (1) A concept bank is created from the text corpus of interest. (2) Concepts are scored by their importance for explaining the predictions of $f$ and by their identifiability, and the CBL is initialized. (3) The TCBM is trained through 3 layers: $\Phi^{\text{C}}$, $\Phi^{\text{cls}}$ and $\Phi^{\text{r}}$. (4) Training stops when the TCBM without $\Phi^{\text{r}}$ reaches the performance of the TCBM with it; $\Phi^{\text{r}}$ is then removed.
  • Figure 2: Example of an adversarial attack (left, $x_{adv}$) and a counterfactual explanation (right, $x_{cf}$) obtained from CT-CBM on AGnews. The TCBM makes it possible to explain the label change in terms of concept changes.
  • Figure 3: Cloud of micro concepts composing the macro concept "Postponements or interruptions" from the AGnews dataset.
  • Figure 4: Cloud of micro concepts composing the macro concept "Instances of accountability or public discourse" from the AGnews dataset.
  • Figure 5: Cloud of micro concepts composing the macro concept "Acronyms and initials" from the AGnews dataset.
  • ...and 4 more figures
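Steps (3) and (4) of Figure 1 can be sketched with linear layers in NumPy. The sketch assumes that $\Phi^{\text{C}}$ maps a text embedding $h$ to sigmoid concept activations, that $\Phi^{\text{cls}}$ maps those activations to class logits, and that $\Phi^{\text{r}}$ adds a residual term computed directly from $h$; it also assumes "performance" means label accuracy. The weight names, `tcbm_logits`, and `bottleneck_is_complete` are hypothetical, and the paper's exact residual formulation and stopping metric may differ.

```python
import numpy as np

def tcbm_logits(h, W_c, W_cls, W_r=None):
    """Hypothetical TCBM head with an optional residual path."""
    c = 1.0 / (1.0 + np.exp(-(h @ W_c.T)))  # Phi^C: concept activations in (0, 1)
    logits = c @ W_cls.T                    # Phi^cls: concepts -> class logits
    if W_r is not None:
        logits = logits + h @ W_r.T         # Phi^r: residual path bypassing the bottleneck
    return logits

def accuracy(logits, y):
    return float(np.mean(logits.argmax(axis=1) == y))

def bottleneck_is_complete(h, y, W_c, W_cls, W_r, tol=0.0):
    """Stopping check: the TCBM without Phi^r matches the TCBM with it (within tol)."""
    acc_with = accuracy(tcbm_logits(h, W_c, W_cls, W_r), y)
    acc_without = accuracy(tcbm_logits(h, W_c, W_cls), y)
    return acc_without >= acc_with - tol

# Toy data: class 1 iff the second embedding coordinate dominates.
h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
y = np.array([0, 1, 0, 1])
W_c = 5.0 * np.eye(2)                        # two concepts, one per coordinate
W_r = np.array([[1.0, -1.0], [-1.0, 1.0]])   # the residual alone solves the task

W_cls_empty = np.zeros((2, 2))               # bottleneck not yet useful
W_cls_good = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(bottleneck_is_complete(h, y, W_c, W_cls_empty, W_r))  # False: residual still needed
print(bottleneck_is_complete(h, y, W_c, W_cls_good, W_r))   # True: concepts suffice
```

Once the check returns True, the residual weights can be dropped and only the interpretable path $\Phi^{\text{cls}} \circ \Phi^{\text{C}}$ is kept, mirroring the removal of $\Phi^{\text{r}}$ in step (4).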