Concept Based Continuous Prompts for Interpretable Text Classification

Qian Chen, Dongyang Li, Xiaofeng He

TL;DR

This work tackles the interpretability gap of continuous prompts for text classification by introducing Concept Decomposition (CD), which factorizes the prompt matrix $\mathbf{P}$ into a concept embedding matrix $\mathbf{C}$ and a coefficient matrix $\mathbf{Q}$ so that $\mathbf{C}\mathbf{Q}$ approximates $\mathbf{P}$, with a feasibility guarantee that $\|\mathbf{C}\mathbf{Q}-\mathbf{P}\|_F^2 \leq \epsilon$ can be satisfied. It combines GPT-4o-based concept generation with a novel monotone submodular objective that optimizes diversity and coverage to select a compact, discriminative concept set. $\mathbf{C}$ and $\mathbf{Q}$ are then learned under a fidelity-performance loss $\mathcal{L} = \mu\mathcal{L}_f + \mathcal{L}_l$ to retain predictive power. The framework enables local, input-specific explanations by ranking concepts via a per-concept key derived from $\mathbf{Q}$. It is demonstrated on SST-2, IMDB, and AGNews using BERT-Large and GPT-2-Medium, where it achieves accuracy competitive with P-tuning and discrete-word baselines while producing more plausible concept-based explanations. CD exhibits concept-level noise at higher concept counts, which suggests future work on more robust concept curation and better alignment between prompts and human-interpretable semantics.
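
The feasibility condition above can be illustrated with a small numerical sketch. It is not the authors' implementation: the shapes ($\mathbf{P}$ of size $d \times n$, $\mathbf{C}$ of size $d \times m$, $\mathbf{Q}$ of size $m \times n$), the random stand-in matrices, and the helper names `fit_coefficients` and `fidelity_gap` are all assumptions made for illustration. For a fixed $\mathbf{C}$, the best $\mathbf{Q}$ in the Frobenius sense is a least-squares solution, and the gap shrinks to numerical zero once the columns of $\mathbf{C}$ span the whole space.

```python
import numpy as np

def fit_coefficients(C, P):
    """Return the Q that minimizes ||C Q - P||_F^2 for a fixed C (least squares)."""
    Q, *_ = np.linalg.lstsq(C, P, rcond=None)
    return Q

def fidelity_gap(C, Q, P):
    """Squared Frobenius norm of the reconstruction error C Q - P."""
    return float(np.linalg.norm(C @ Q - P, ord="fro") ** 2)

rng = np.random.default_rng(0)
d, n = 16, 4                      # toy hidden size and prompt length (assumed)
P = rng.normal(size=(d, n))       # random stand-in for a trained prompt matrix

# Fewer concepts than dimensions: a fixed C generally leaves a nonzero gap.
C_small = rng.normal(size=(d, 6))
gap_small = fidelity_gap(C_small, fit_coefficients(C_small, P), P)

# As many linearly independent concepts as dimensions: the gap vanishes.
C_full = rng.normal(size=(d, d))
gap_full = fidelity_gap(C_full, fit_coefficients(C_full, P), P)

print(f"gap with 6 concepts:  {gap_small:.4f}")
print(f"gap with {d} concepts: {gap_full:.2e}")
```

In this toy setting $\mathbf{C}$ is held fixed; in the framework described here both $\mathbf{C}$ and $\mathbf{Q}$ are tuned, which gives the optimization more freedom to close the gap with few concepts.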

Abstract

Continuous prompts have become widely adopted for augmenting performance across a wide range of natural language tasks. However, the underlying mechanism of this enhancement remains obscure. Previous studies rely on individual words to interpret continuous prompts, an approach that lacks comprehensive semantic understanding. Drawing inspiration from Concept Bottleneck Models, we propose a framework for interpreting continuous prompts by decomposing them into human-readable concepts. Specifically, to ensure the feasibility of the decomposition, we demonstrate that a corresponding concept embedding matrix and a coefficient matrix can always be found to replace the prompt embedding matrix. Then, we employ GPT-4o to generate a concept pool and choose candidate concepts that are discriminative and representative using a novel submodular optimization algorithm. Experiments demonstrate that our framework achieves results similar to those of the original P-tuning and word-based approaches using only a few concepts, while providing more plausible results. Our code is available at https://github.com/qq31415926/CD.
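
The concept selection step can be sketched with a standard greedy routine. The paper's own objective is not reproduced in this section, so the sketch below swaps in a facility-location coverage function, a well-known monotone submodular function, purely as a stand-in. The random embeddings, the clipped cosine similarity, and the names `facility_location` and `greedy_select` are illustrative assumptions. For monotone submodular objectives under a cardinality constraint, greedy selection is known to achieve a $(1 - 1/e)$ approximation of the optimum.

```python
import numpy as np

def facility_location(sim, selected):
    """Coverage score: each candidate is credited with its best match in the selection."""
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())

def greedy_select(sim, k):
    """Greedily add the candidate with the largest marginal gain, k times."""
    selected = []
    for _ in range(min(k, sim.shape[1])):
        current = facility_location(sim, selected)
        best_gain, best_j = -1.0, None
        for j in range(sim.shape[1]):
            if j in selected:
                continue
            gain = facility_location(sim, selected + [j]) - current
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
emb = rng.normal(size=(12, 8))                     # stand-in concept embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize each row
sim = np.clip(emb @ emb.T, 0.0, None)              # non-negative cosine similarity

chosen = greedy_select(sim, k=4)
print("selected concept indices:", chosen)
```

Clipping the similarities at zero keeps every marginal gain non-negative, which is what makes the stand-in objective monotone.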

Paper Structure

This paper contains 23 sections, 15 equations, 4 figures, 19 tables.

Figures (4)

  • Figure 1: An explanation example from work3. From left to right three sub-figures represent the SST-2, IMDB, and AGNews dataset results. The color bar represents values.
  • Figure 2: Left: A bad case from work3, in which the continuous prompt $\boldsymbol{p}$ does not lie in the span space $\boldsymbol{V}$. Because $\boldsymbol{p}_{1}$ exists, the fitting gap cannot be eliminated. Right: The accuracy results from work3, where x denotes the vocabulary size. As the vocabulary size increases, performance does not change significantly.
  • Figure 3: Overview of our framework CD. Our framework prompts GPT-4o to generate candidate concepts for each class (Section \ref{step1}). We then use submodular optimization to select ones that maximize diversity and coverage (Section \ref{step2}). Next, we initialize the concept embedding $\boldsymbol{C}$ by feeding the concepts into a text encoder (BERT-Large or GPT-2-Medium) and initialize the coefficient embedding $\boldsymbol{Q}$. Finally, we tune the embeddings using stochastic gradient descent (Section \ref{step3}). $\tilde{\boldsymbol{X}}$ denotes the prompt-augmented input. $\tilde{\boldsymbol{P}}=\boldsymbol{CQ}$.
  • Figure 4: Comparison results before and after attack on SST-2, IMDB and AGNews datasets on BERT-Large.
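
The tuning step named in the Figure 3 caption can be sketched for the fidelity part of the objective only. This is a simplification under stated assumptions, not the authors' training loop: it uses full-batch gradient descent rather than stochastic gradient descent, it takes the fidelity term to be $\|\boldsymbol{CQ}-\boldsymbol{P}\|_F^2$, and it omits the second loss term because that term requires the language model. The shapes, the random stand-in for $\boldsymbol{P}$, the initialization scales, and the names `fidelity_loss` and `fidelity_step` are hypothetical.

```python
import numpy as np

def fidelity_loss(C, Q, P):
    """Squared Frobenius norm of the residual C Q - P."""
    R = C @ Q - P
    return float(np.sum(R * R))

def fidelity_step(C, Q, P, lr):
    """One gradient-descent step on the fidelity loss with respect to C and Q."""
    R = C @ Q - P
    grad_C = 2.0 * R @ Q.T     # derivative of ||C Q - P||_F^2 with respect to C
    grad_Q = 2.0 * C.T @ R     # derivative of ||C Q - P||_F^2 with respect to Q
    return C - lr * grad_C, Q - lr * grad_Q

rng = np.random.default_rng(0)
d, m, n = 8, 4, 3                      # toy hidden size, concepts, prompt length
P = rng.normal(size=(d, n))            # random stand-in for a trained prompt
C = 0.3 * rng.normal(size=(d, m))      # would come from a text encoder in CD
Q = 0.1 * rng.normal(size=(m, n))      # coefficient matrix, small random init

initial_loss = fidelity_loss(C, Q, P)
for _ in range(300):
    C, Q = fidelity_step(C, Q, P, lr=0.01)
final_loss = fidelity_loss(C, Q, P)

print(f"fidelity loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

In a full setup the gradient of the second loss term would be added to both updates, and the weight $\mu$ would trade off fidelity to the original prompt against task performance.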

Theorems & Definitions (1)

  • proof