Concept Based Continuous Prompts for Interpretable Text Classification
Qian Chen, Dongyang Li, Xiaofeng He
TL;DR
This work tackles the interpretability gap of continuous prompts for text classification by introducing Concept Decomposition (CD), which factorizes the prompt matrix $\mathbf{P}$ into a concept embedding matrix $\mathbf{C}$ and a coefficient matrix $\mathbf{Q}$ such that $\mathbf{C}\mathbf{Q}$ approximates $\mathbf{P}$, and shows that a decomposition satisfying $\lVert\mathbf{C}\mathbf{Q}-\mathbf{P}\rVert_F^2 \leq \epsilon$ always exists. CD combines GPT-4o-based concept generation with a novel monotone submodular objective that balances diversity and coverage to select a compact, discriminative concept set. It then learns $\mathbf{C}$ and $\mathbf{Q}$ under a fidelity-performance loss $\mathcal{L} = \mu\mathcal{L}_f + \mathcal{L}_l$ to retain predictive power. The framework enables local, input-specific explanations by ranking concepts via a per-concept key derived from $\mathbf{Q}$. It is evaluated on SST-2, IMDB, and AGNews using BERT-Large and GPT-2-Medium, where it achieves accuracy competitive with P-tuning and discrete-word baselines while producing more plausible concept-based explanations. While effective, CD exhibits concept-level noise at higher concept counts, suggesting future work on more robust concept curation and improved alignment between prompts and human-interpretable semantics.
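One way to see why such a decomposition can be feasible is sketched below. The sketch assumes shapes that this summary does not state, namely $\mathbf{P} \in \mathbb{R}^{d \times m}$, $\mathbf{C} \in \mathbb{R}^{d \times k}$, and $\mathbf{Q} \in \mathbb{R}^{k \times m}$, and it is not necessarily the authors' own argument. For a fixed $\mathbf{C}$, the least-squares coefficients and the resulting residual are

$$
\mathbf{Q}^{*} = \mathbf{C}^{+}\mathbf{P}, \qquad \lVert\mathbf{C}\mathbf{Q}^{*}-\mathbf{P}\rVert_F^2 = \lVert(\mathbf{C}\mathbf{C}^{+}-\mathbf{I}_d)\,\mathbf{P}\rVert_F^2,
$$

where $\mathbf{C}^{+}$ is the Moore-Penrose pseudoinverse. Because $\mathbf{C}\mathbf{C}^{+}$ is the orthogonal projector onto the column space of $\mathbf{C}$, the residual is zero whenever $\operatorname{rank}(\mathbf{C}) = d$ (which requires $k \geq d$), so under that assumption the bound holds for every $\epsilon \geq 0$.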
Abstract
Continuous prompts have become widely adopted for improving performance across a wide range of natural language tasks. However, the mechanism underlying this improvement remains obscure. Previous studies rely on individual words to interpret continuous prompts, an approach that lacks comprehensive semantic understanding. Drawing inspiration from Concept Bottleneck Models, we propose a framework for interpreting continuous prompts by decomposing them into human-readable concepts. Specifically, to ensure the feasibility of the decomposition, we demonstrate that a corresponding concept embedding matrix and a coefficient matrix can always be found to replace the prompt embedding matrix. Then, we employ GPT-4o to generate a concept pool and choose candidate concepts that are discriminative and representative using a novel submodular optimization algorithm. Experiments demonstrate that our framework can achieve results similar to those of the original P-tuning and word-based approaches using only a few concepts while providing more plausible results. Our code is available at https://github.com/qq31415926/CD.
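To make the concept-selection step more concrete, here is a minimal Python sketch of greedy selection under a monotone submodular objective. The paper's actual objective is not given in this summary, so the sketch swaps in a standard facility-location coverage function over cosine similarities as a stand-in. The function names (`cosine`, `coverage`, `greedy_select`) and the toy embeddings are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def coverage(selected: List[int], sim: List[List[float]]) -> float:
    """Facility-location value of a selection (assumed stand-in objective).

    Each candidate is credited with its best similarity to any selected
    concept; clipping at 0 keeps the function monotone and submodular.
    """
    if not selected:
        return 0.0
    return sum(max(max(row[j] for j in selected), 0.0) for row in sim)


def greedy_select(embeddings: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k concept indices by greedy marginal-gain maximization."""
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    selected: List[int] = []
    for _ in range(min(k, n)):
        current = coverage(selected, sim)
        best_gain, best_idx = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding candidate j to the current selection.
            gain = coverage(selected + [j], sim) - current
            if gain > best_gain:
                best_gain, best_idx = gain, j
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    # Toy 2-D "concept embeddings": two near-duplicate pairs (hypothetical).
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(greedy_select(toy, 2))
```

For a monotone submodular objective under a cardinality constraint, this greedy rule is known to reach at least a $(1 - 1/e)$ fraction of the optimal value, which is a common reason for framing selection this way. A coverage term of this kind favors concepts that represent many candidates and gives little gain to near-duplicates of concepts already chosen.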
