Crafting Interpretable Embeddings by Asking LLMs Questions
Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, Jianfeng Gao
TL;DR
QA-Emb is an interpretable embedding built by querying a pre-trained LLM with a set of yes/no questions. Each answer becomes one feature of a binary vector that researchers can inspect directly. In fMRI encoding of natural-language stimuli, QA-Emb improves by 26% over an established interpretable baseline, performs competitively with black-box models, and still performs strongly with as few as 29 questions. The approach extends to NLP tasks, as demonstrated on information retrieval and zero-shot clustering, and it can be accelerated by distilling it into a single model that produces all answers in one forward pass using many output heads. These results suggest a practical path toward interpretable, domain-grounded embeddings in neuroscience and NLP, though faithfulness and computational cost remain important considerations.
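The core mechanism, one binary feature per yes/no answer, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `qa_embed`, `toy_answer`, and the three example questions are hypothetical, and `toy_answer` is a keyword matcher standing in for an actual LLM prompt so that the sketch runs without a model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature for an LLM call: (question, text) -> True if the LLM answers "yes".
AnswerFn = Callable[[str, str], bool]


def qa_embed(text: str, questions: Sequence[str], answer: AnswerFn) -> List[int]:
    """Binary QA-Emb-style vector: feature i is 1 iff the answer to questions[i] is yes."""
    return [1 if answer(question, text) else 0 for question in questions]


# Illustrative questions only (hypothetical); they are not the paper's question set.
QUESTIONS: List[str] = [
    "Does the input mention a number?",
    "Does the input involve food?",
    "Does the input describe movement?",
]

# Toy keyword lists standing in for the LLM so the sketch runs offline (an assumption).
_TOY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    QUESTIONS[0]: ("one", "two", "three"),
    QUESTIONS[1]: ("eat", "dinner", "bread"),
    QUESTIONS[2]: ("walk", "run", "drive"),
}


def toy_answer(question: str, text: str) -> bool:
    """Hypothetical stand-in for prompting an LLM: 'yes' if any keyword appears as a word."""
    words = text.lower().split()
    return any(keyword in words for keyword in _TOY_KEYWORDS.get(question, ()))


if __name__ == "__main__":
    print(qa_embed("We walk home after dinner", QUESTIONS, toy_answer))  # [0, 1, 1]
```

Replacing `toy_answer` with a function that prompts an LLM and parses its yes/no reply would give the setting described above. Under this framing, the only thing to choose is the list of questions; there are no embedding weights to learn.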
Abstract
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.
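To show how binary question answers can feed an fMRI encoding model, the sketch below fits a linear ridge regression from answer vectors to voxel responses, a common choice for encoding models of this kind. It is a simplified illustration under stated assumptions, not the paper's pipeline: `solve`, `fit_ridge`, and `predict` are hypothetical helpers in plain Python, the intercept is omitted (features and responses are assumed centered), and details such as temporal delays for the hemodynamic response and the search over `alpha` are left out.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(x: Matrix, y: Matrix, alpha: float = 1.0) -> Matrix:
    """Fit one ridge weight vector per voxel: w_v = (X^T X + alpha I)^-1 X^T y_v.

    x: n_samples x n_questions binary answers; y: n_samples x n_voxels responses.
    alpha > 0 keeps X^T X + alpha I positive definite, so solve() has nonzero pivots.
    """
    n_feat = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (alpha if i == j else 0.0)
            for j in range(n_feat)] for i in range(n_feat)]
    weights = []
    for v in range(len(y[0])):
        xty = [sum(x[s][i] * y[s][v] for s in range(len(x))) for i in range(n_feat)]
        weights.append(solve(xtx, xty))
    return weights  # weights[v][i]: effect of a "yes" to question i on voxel v


def predict(x: Matrix, weights: Matrix) -> Matrix:
    """Predicted response of every voxel for every sample (no intercept)."""
    return [[sum(f * w for f, w in zip(row, wv)) for wv in weights] for row in x]
```

Because each input dimension corresponds to one question, `weights[v][i]` in this sketch can be read as how much a "yes" to question i shifts the predicted response of voxel v, which is what makes such a model inspectable.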
