Scaling up Discovery of Latent Concepts in Deep NLP Models
Majd Hawasly, Fahim Dalvi, Nadir Durrani
TL;DR
This paper tackles the interpretability bottleneck in deep NLP by scaling latent concept discovery through a comparative study of clustering methods applied to layer-wise contextualized representations. It introduces a two-dimensional quality metric, alignment and coverage, to evaluate how well discovered concepts align with and cover human-defined linguistic ontologies, and finds that K-Means scales substantially better than Agglomerative clustering while yielding concepts of comparable quality. Scaling experiments on BERT, RoBERTa, XLM-RoBERTa and Llama-2 demonstrate that larger datasets improve concept discovery, enabling phrasal-level and LLM-oriented analyses that were previously impractical. The results suggest that K-Means is a practical, scalable tool for latent concept discovery in large models, with demonstrated utility in phrasal interpretability and exploratory work on LLMs, and the paper outlines avenues for future improvement and broader validation.
Abstract
Despite the revolution caused by deep NLP models, they remain black boxes, necessitating research to understand their decision-making processes. Recent work by Dalvi et al. (2022) carried out representation analysis through the lens of clustering latent spaces within pre-trained language models (PLMs), but that approach is limited to small scale due to the high cost of running Agglomerative hierarchical clustering. This paper studies clustering algorithms in order to scale the discovery of encoded concepts in PLM representations to larger datasets and models. We propose metrics for assessing the quality of discovered latent concepts and use them to compare the studied clustering algorithms. We find that K-Means-based concept discovery significantly enhances efficiency while maintaining the quality of the obtained concepts. Furthermore, we demonstrate the practicality of this newfound efficiency by scaling latent concept discovery to LLMs and phrasal concepts.
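To make the clustering step concrete, the following is a minimal sketch of grouping token representations into latent concepts with either K-Means or Agglomerative clustering. It is not the authors' implementation: the use of scikit-learn, the Ward linkage setting, the helper names discover_concepts and group_tokens, and the synthetic vectors standing in for layer-wise PLM activations are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans


def discover_concepts(reps, n_concepts, method="kmeans", seed=0):
    """Cluster representation vectors; each cluster is one candidate concept.

    Hypothetical helper: `reps` is an (n_tokens, hidden_dim) array.
    """
    if method == "kmeans":
        model = KMeans(n_clusters=n_concepts, n_init=10, random_state=seed)
    elif method == "agglomerative":
        # Ward linkage is an assumption, not a setting taken from the paper.
        model = AgglomerativeClustering(n_clusters=n_concepts, linkage="ward")
    else:
        raise ValueError(f"unknown method: {method}")
    return model.fit_predict(reps)


def group_tokens(tokens, labels):
    """Map each cluster id to the list of tokens assigned to it."""
    concepts = {}
    for token, label in zip(tokens, labels):
        concepts.setdefault(int(label), []).append(token)
    return concepts


# Synthetic stand-in for contextualized representations:
# three well-separated groups of 20 vectors each, 16 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 16))
reps = np.vstack([c + rng.normal(scale=0.1, size=(20, 16)) for c in centers])
tokens = [f"tok{i}" for i in range(len(reps))]

kmeans_labels = discover_concepts(reps, 3, method="kmeans")
agglo_labels = discover_concepts(reps, 3, method="agglomerative")
concepts = group_tokens(tokens, kmeans_labels)
```

In a real setting, reps would hold activations extracted from one layer of a model such as BERT, and the resulting clusters would be compared against human-defined linguistic categories. On toy data like this, both methods recover the same grouping; the difference the paper is concerned with is cost as the number of tokens grows.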
