Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt
TL;DR
The paper introduces predicate-conditioned distributions $p(x|\vec{\phi},\mathbf{w}) \propto e^{\mathbf{w}^\top \llbracket \vec{\phi} \rrbracket(x)}$, where natural language predicates $\phi$ provide interpretable features via the denotation $\llbracket \phi \rrbracket(x) \in \{0,1\}$. It then develops a model-agnostic learning pipeline that optimizes a continuous relaxation $\tilde{\phi}$, discretizes it by prompting language models, and iterates to refine the predicates and weights. Three exemplar models (clustering, time series, and multiclass classification) are instantiated and evaluated on multiple text datasets, showing that relaxation and refinement improve performance and that the approach can match specialized explainable clustering methods. The framework is also applied to open-ended problems in text and vision, producing explanations of subareas, temporal dynamics, and cross-model comparisons, albeit with acknowledged computational costs and a dependency on LLMs. Overall, the work provides a flexible, interpretable, language-grounded approach to analyzing complex datasets and extracting human-understandable patterns.
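To make the formula above concrete, the following is a minimal sketch rather than the paper's implementation. It assumes each predicate's denotation can be stood in for by a hypothetical keyword check (in the paper the denotation is judged by a language model), and it normalizes over a small finite list of texts purely for illustration. The names `PREDICATES`, `denotation`, `score`, and `normalize` are illustrative, not from the source.

```python
import math

# Hypothetical stand-ins for LM-judged denotations: each predicate maps a text to 0 or 1.
PREDICATES = {
    "discusses COVID": lambda x: int("covid" in x.lower()),
    "asks a question": lambda x: int(x.strip().endswith("?")),
}


def denotation(x):
    """Binary feature vector [[phi]](x), one entry per predicate."""
    return [fn(x) for fn in PREDICATES.values()]


def score(x, w):
    """Unnormalized log-probability w^T [[phi]](x)."""
    return sum(wi * fi for wi, fi in zip(w, denotation(x)))


def normalize(texts, w):
    """Softmax of the scores over a finite candidate set (an assumption for illustration)."""
    scores = [score(x, w) for x in texts]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


texts = ["COVID cases are rising.", "Is it raining?", "I like tea."]
w = [2.0, -1.0]  # illustrative weights, one per predicate
probs = normalize(texts, w)
```

With these illustrative weights, a text satisfying "discusses COVID" receives more probability mass than one satisfying neither predicate, which in turn receives more than one satisfying only "asks a question".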
Abstract
To make sense of massive data, we often fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster. However, these parameters are often high-dimensional and hard to interpret. To make model parameters directly interpretable, we introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates. For example, a cluster of text about COVID could be parameterized by the predicate "discusses COVID". To learn these statistical models effectively, we develop a model-agnostic algorithm that optimizes continuous relaxations of predicate parameters with gradient descent and discretizes them by prompting language models (LMs). Finally, we apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other, clustering math problems based on subareas, and explaining visual features in memorable images. Our framework is highly versatile, applicable to both textual and visual domains, can be easily steered to focus on specific properties (e.g. subareas), and explains sophisticated concepts that classical methods (e.g. n-gram analysis) struggle to produce.
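The discretization step mentioned in the abstract can be illustrated with a small sketch under stated assumptions. Here the relaxed predicate is assumed to have already been optimized and to yield a real-valued score per sample; the candidate predicates, which the framework would obtain by prompting a language model, are replaced by hypothetical keyword checks; and the selection rule (highest Pearson correlation with the relaxed scores) is one plausible choice, not necessarily the authors' exact criterion. The gradient descent step itself is omitted.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def discretize(texts, relaxed_scores, candidates):
    """Pick the candidate predicate whose 0/1 denotation best tracks the relaxed scores."""
    best_name, best_corr = None, -math.inf
    for name, fn in candidates.items():
        denot = [fn(t) for t in texts]
        corr = pearson(denot, relaxed_scores)
        if corr > best_corr:
            best_name, best_corr = name, corr
    return best_name, best_corr


texts = [
    "COVID vaccine rollout begins",
    "New COVID variant detected",
    "Stock markets rally",
    "Local team wins final",
]
# Assumed output of an already-optimized continuous relaxation (illustrative values).
relaxed_scores = [0.9, 0.8, 0.1, 0.2]
# Hypothetical candidates; in the framework these would be proposed by an LM.
candidates = {
    "discusses COVID": lambda t: int("covid" in t.lower()),
    "discusses sports": lambda t: int("team" in t.lower()),
    "is written in English": lambda t: 1,
}
best, corr = discretize(texts, relaxed_scores, candidates)
```

In this toy setup the selected predicate is "discusses COVID", since its denotation is the one most correlated with the relaxed scores; a constant predicate such as "is written in English" carries no signal and scores zero.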
