Scaling up Discovery of Latent Concepts in Deep NLP Models

Majd Hawasly, Fahim Dalvi, Nadir Durrani

TL;DR

This paper tackles the interpretability bottleneck in deep NLP by scaling latent concept discovery through a comparative study of clustering methods applied to layer-wise contextualized representations. It introduces a two-dimensional quality metric (alignment and coverage) to evaluate how well discovered concepts match and cover human-defined linguistic ontologies, and finds that K-Means scales substantially better than Agglomerative clustering while yielding concepts of comparable quality. Scaling experiments on BERT, RoBERTa, XLM-RoBERTa and Llama-2 demonstrate that larger datasets improve concept discovery, enabling phrasal-level and LLM-oriented analyses that were previously impractical. The results suggest that K-Means is a practical, scalable tool for latent concept discovery in large models, with demonstrated utility in phrasal interpretability and exploratory work on LLMs, and the paper outlines avenues for future improvement and broader validation.

Abstract

Despite the revolution caused by deep NLP models, they remain black boxes, necessitating research to understand their decision-making processes. A recent work by Dalvi et al. (2022) carried out representation analysis through the lens of clustering latent spaces within pre-trained models (PLMs), but that approach is limited to small scale due to the high cost of running Agglomerative hierarchical clustering. This paper studies clustering algorithms in order to scale the discovery of encoded concepts in PLM representations to larger datasets and models. We propose metrics for assessing the quality of discovered latent concepts and use them to compare the studied clustering algorithms. We found that K-Means-based concept discovery significantly enhances efficiency while maintaining the quality of the obtained concepts. Furthermore, we demonstrate the practicality of this newfound efficiency by scaling latent concept discovery to LLMs and phrasal concepts.
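
To make the two quality dimensions concrete, here is a minimal sketch of how alignment and coverage could be computed once tokens have been assigned to clusters (for example by K-Means). It rests on assumptions not stated in this summary: a cluster counts as aligned with a human tag when at least a fixed fraction of its tokens carry that tag, alignment is the share of clusters that are aligned, and coverage is the share of human tags matched by at least one aligned cluster. The function name `alignment_and_coverage`, the 0.9 threshold, and the toy data are hypothetical, and the paper's exact definitions may differ.

```python
from collections import Counter, defaultdict


def alignment_and_coverage(cluster_ids, tags, threshold=0.9):
    """Return (alignment, coverage, aligned) for one clustering.

    cluster_ids[i] is the cluster of token i and tags[i] its human label.
    A cluster counts as aligned with a tag when at least `threshold` of
    its tokens carry that tag (the threshold value is an assumption).
    """
    if len(cluster_ids) != len(tags):
        raise ValueError("cluster_ids and tags must have the same length")

    # Count how often each tag occurs inside each cluster.
    tag_counts = defaultdict(Counter)
    for cid, tag in zip(cluster_ids, tags):
        tag_counts[cid][tag] += 1

    # Keep clusters whose majority tag reaches the threshold.
    aligned = {}  # cluster id -> tag it aligns with
    for cid, counts in tag_counts.items():
        tag, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= threshold:
            aligned[cid] = tag

    all_tags = set(tags)
    # Alignment: fraction of discovered clusters that match some tag.
    alignment = len(aligned) / len(tag_counts) if tag_counts else 0.0
    # Coverage: fraction of human tags recovered by at least one cluster.
    coverage = len(set(aligned.values())) / len(all_tags) if all_tags else 0.0
    return alignment, coverage, aligned


# Toy data: cluster 0 is pure NOUN, cluster 1 is pure VERB, cluster 2 is mixed.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
tags = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "ADJ", "NOUN", "ADV"]
alignment, coverage, aligned = alignment_and_coverage(cluster_ids, tags)
print(alignment, coverage, aligned)
```

In this toy case two of the three clusters are aligned and two of the four tags are covered. The two numbers are reported separately because a clustering can score well on one and poorly on the other, for instance many pure clusters that all map to the same tag.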

Paper Structure (25 sections, 1 equation, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Discovery of encoded concepts within a PLM using clustering of contextualized embeddings, and evaluation of discovered concepts through alignment and coverage metrics with respect to human ontologies.
  • Figure 2: Examples of encoded concepts in BERT aligned with human-defined ontologies.
  • Figure 3: Alignment (percentage of discovered encoded concepts) of K-Means for POS (left) and CCG (right) in the base BERT model versus the corresponding fine-tuned models. The number of aligned concepts increases significantly in the higher layers of the fine-tuned model in both cases.
  • Figure 4: Histogram of cluster sizes for Agglomerative hierarchical clustering and K-Means on the same data. K-Means shows a heavier distribution (median 319 words per cluster), while Agglomerative clustering produces more small clusters (median 275) and a longer tail.
  • Figure 5: Number of aligned concepts for selected POS tags across different models. More results across various models and clusterings can be found in the appendix on comparing architectures.
  • ...and 4 more figures