Crafting Interpretable Embeddings by Asking LLMs Questions

Vinamra Benara; Chandan Singh; John X. Morris; Richard Antonello; Ion Stoica; Alexander G. Huth; Jianfeng Gao

TL;DR

QA-Emb builds an interpretable embedding by querying a pre-trained LLM with a set of yes/no questions, producing a binary feature vector that researchers can inspect. When applied to fMRI encoding of natural-language stimuli, QA-Emb achieves a 26% improvement over a traditional interpretable baseline and is competitive with black-box models, with strong performance from as few as 29 questions. The approach extends to NLP tasks, demonstrated via information retrieval and zero-shot clustering, and can be accelerated by distillation into a single forward pass with many heads. These results suggest a practical pathway to interpretable, domain-grounded embeddings for neuroscience and NLP, subject to important considerations around faithfulness and computational cost.

Abstract

Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.
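
The core idea in the abstract, one embedding dimension per yes/no question, can be illustrated with a minimal sketch. This is not the authors' implementation: the questions, the cue words, and the `toy_answer` function are hypothetical stand-ins, and `toy_answer` replaces the LLM call with a simple keyword lookup so the example runs without any model.

```python
from typing import Callable, Dict, List

# Hypothetical questions and cue words (assumptions for illustration only).
# A real system would send each question plus the text to an LLM and
# parse its yes/no answer instead of matching keywords.
TOY_CUES: Dict[str, List[str]] = {
    "Does the text mention a location?": ["kitchen", "park", "city"],
    "Does the text mention time?": ["yesterday", "morning", "hour"],
    "Does the text involve a person speaking?": ["said", "asked", "told"],
}


def toy_answer(text: str, question: str) -> bool:
    """Stand-in for an LLM call: answer yes if any cue word appears."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return any(cue in words for cue in TOY_CUES[question])


def qa_embed(
    text: str,
    questions: List[str],
    answer_fn: Callable[[str, str], bool],
) -> List[int]:
    """Build a binary embedding: entry i is 1 if question i is answered yes."""
    return [int(answer_fn(text, q)) for q in questions]


QUESTIONS = list(TOY_CUES)
embedding = qa_embed("She said we would meet in the park.", QUESTIONS, toy_answer)

# Each dimension is readable: pair it with the question that produced it.
for question, value in zip(QUESTIONS, embedding):
    print(f"{value}  {question}")
```

Because `qa_embed` takes the answering function as a parameter, swapping the keyword lookup for a prompted LLM changes only `answer_fn`; selecting which questions go into `QUESTIONS` plays the role that weight learning plays in a conventional embedding model.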

Paper Structure (41 sections, 1 equation, 5 figures, 7 tables)

Introduction
Methods
Learning a set of yes/no questions
Post-hoc pruning of $Q$.
Limitations: computational cost and LLM inaccuracies.
Hyperparameter settings
Related work
Text embeddings
Interpreting representations
Natural language representations in fMRI
Main results: fMRI interpretation
fMRI experimental setup
Dataset
Regression modeling
Generating QA-Emb questions
...and 26 more sections

Figures (5)

Figure 1: QA-Emb produces an embedding for an input text by prompting an LLM with a series of yes/no questions. This embedding can then be used in downstream tasks such as fMRI response prediction or information retrieval.
Figure 2: Predictive performance for QA-Emb compared to baselines. (A) Test correlation for QA-Emb outperforms the interpretable Eng1000 baseline, is on par with the black-box BERT baseline, and is worse than the best-performing LLaMA model. (B) Test correlation for QA-Emb grows quickly as a function of the number of included questions. (C) Test correlation per voxel for QA-Emb. (D) Difference in the test correlation per voxel between QA-Emb and BERT. Error bars for (A) and (B) (standard error of the mean) are within the points (all are below 0.001). (B), (C), and (D) show results for subject S03.
Figure 3: Learned feature weights for 3 example questions capture known selectivity and are consistent across subjects. All feature weights are jointly rescaled to the range (-1, 1) for visualization. Abbreviations: Pr = precuneus, pTemp = posterior temporal cortex, PFC = prefrontal cortex, IPS = intraparietal sulcus, RSC = retrosplenial complex, OPA = occipital place area, PPA = parahippocampal place area, Broca = Broca's area, sPMv = superior premotor ventral speech area, AC = auditory cortex.
Figure 4: Performance of question-answering for underlying LLMs on the D3 collection of binary classification datasets. Each point shows an individual dataset and error bars show the 95% confidence interval.
Figure A1: Predictive performance for QA-Emb (top row) and the difference between QA-Emb and BERT (bottom row).
