How do Language Models Bind Entities in Context?
Jiahai Feng, Jacob Steinhardt
TL;DR
Language models must bind entities to their contextual attributes. The authors propose the binding ID mechanism as a general internal solution to this problem and validate it with causal mediation analyses. They show that binding is additive: binding ID vectors, which form a continuous subspace, are attached to the representations of entities and attributes. These IDs generalize across tasks and model scales, which enables transfer and robust in-context reasoning. The authors also identify a separate direct-binding mechanism in multiple-choice question (MCQ) tasks, which shows that the binding ID mechanism is not universal. The work advances the interpretability of in-context reasoning and suggests that scalable, transferable symbolic representations emerge in large LMs.
Abstract
To correctly use in-context information, language models (LMs) must bind entities to their attributes. For example, given a context describing a "green square" and a "blue circle", LMs must bind the shapes to their respective colors. We analyze LM representations and identify the binding ID mechanism: a general mechanism for solving the binding problem, which we observe in every sufficiently large model from the Pythia and LLaMA families. Using causal interventions, we show that LMs' internal activations represent binding information by attaching binding ID vectors to corresponding entities and attributes. We further show that binding ID vectors form a continuous subspace, in which distances between binding ID vectors reflect their discernability. Overall, our results uncover interpretable strategies in LMs for representing symbolic knowledge in-context, providing a step towards understanding general in-context reasoning in large-scale LMs.
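The additive picture described in the abstract can be illustrated with a toy sketch. This is not the authors' code and it does not touch a real LM. It assumes that an activation is simply a content embedding plus a binding ID vector, and that the binding slot can be read back by subtracting the known content embedding. All names (`content`, `b_entity`, `b_attr`, `represent`, `slot_of`, `lookup`) are hypothetical and exist only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Hypothetical content embeddings for two entities and two attributes.
content = {name: rng.normal(size=dim)
           for name in ["square", "circle", "green", "blue"]}

# Hypothetical binding ID vectors, one per binding slot k (assumption:
# separate vectors for the entity side and the attribute side).
b_entity = [rng.normal(size=dim) for _ in range(2)]
b_attr = [rng.normal(size=dim) for _ in range(2)]


def represent(name, binding_vec):
    """Toy activation: content embedding plus an additive binding ID vector."""
    return content[name] + binding_vec


def slot_of(activation, name, binding_vecs):
    """Recover the binding slot by removing the content and matching the residual."""
    residual = activation - content[name]
    dists = [np.linalg.norm(residual - b) for b in binding_vecs]
    return int(np.argmin(dists))


def lookup(entity_name, entity_acts, attr_acts):
    """Return the attribute whose binding slot matches the entity's slot."""
    k = slot_of(entity_acts[entity_name], entity_name, b_entity)
    for attr_name, act in attr_acts.items():
        if slot_of(act, attr_name, b_attr) == k:
            return attr_name
    return None


# Context: "green square" (slot 0) and "blue circle" (slot 1).
entity_acts = {"square": represent("square", b_entity[0]),
               "circle": represent("circle", b_entity[1])}
attr_acts = {"green": represent("green", b_attr[0]),
             "blue": represent("blue", b_attr[1])}

before = lookup("square", entity_acts, attr_acts)

# Intervention: shift the attribute activations by the difference of the two
# attribute binding vectors, which swaps their binding slots.
delta = b_attr[1] - b_attr[0]
swapped_attrs = {"green": attr_acts["green"] + delta,
                 "blue": attr_acts["blue"] - delta}

after = lookup("square", entity_acts, swapped_attrs)

print(before, after)
```

In this toy setup, the square is paired with green before the intervention and with blue afterwards, even though the content embeddings are untouched. In a real model the readout is performed by the network itself rather than by an explicit nearest-vector match, so this sketch only conveys the intuition behind swapping binding IDs.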
