Monotonic Representation of Numeric Properties in Language Models
Benjamin Heinzerling, Kentaro Inui
TL;DR
This work investigates how language models encode numeric properties such as birth years. It identifies low-dimensional subspaces of the activation space in which representations vary monotonically with the expressed quantity. Partial least squares regression on the activations of entity prompts yields property-encoding directions, and activation patching along these directions tests whether they are causal: the model's outputs change monotonically, though with notable side effects. Across multiple models and six numeric properties, the authors show that most numeric attributes are predictable from subspaces of 2–6 dimensions, and that perturbations along these directions causally shift the model outputs in a monotonic fashion. The findings suggest that monotonic representations of numeric properties emerge during pretraining, and they provide a framework for interpretable and controllable interventions in LM behavior, with implications for interpretability and alignment.
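To make the direction-finding step concrete, here is a minimal sketch of the first component of a single-target partial least squares fit, which is the unit vector proportional to the covariance between centered activations and the centered numeric target. It is not the authors' implementation: the synthetic "activations", the dimensionality, and every function name are illustrative assumptions, and a real experiment would use activations extracted from an LM and typically several components.

```python
import math
import random


def first_pls_direction(X, y):
    """Return (w, x_mean): the unit-norm first PLS weight vector and the feature means.

    For a single target, the first PLS weight vector is proportional to
    Xc^T yc, where Xc and yc are the mean-centered inputs and target.
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(d)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w], x_mean


def project(h, w, x_mean):
    """Score of one activation vector along the direction w."""
    return sum((hj - mj) * wj for hj, mj, wj in zip(h, x_mean, w))


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in for LM activations (an assumption, not real model data):
# dimension 3 carries a linear birth-year signal, all dimensions carry noise.
random.seed(0)
d = 16
signal_dim = 3
years = [random.uniform(1800, 2000) for _ in range(200)]
X = [
    [
        (0.01 * (yr - 1900) if j == signal_dim else 0.0) + random.gauss(0, 0.1)
        for j in range(d)
    ]
    for yr in years
]

w, x_mean = first_pls_direction(X, years)
scores = [project(h, w, x_mean) for h in X]
print("weight on signal dimension:", round(w[signal_dim], 3))
print("correlation of scores with year:", round(pearson(scores, years), 3))
```

On this toy data the recovered direction should concentrate on the dimension that carries the signal, and the projection scores should track the year closely. Further components would be obtained by deflating X and repeating the same step.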
Abstract
Language models (LMs) can express factual knowledge involving numeric properties, such as "Karl Popper was born in 1902". However, how this information is encoded in the model's internal representations is not well understood. Here, we introduce a simple method for finding and editing representations of numeric properties such as an entity's birth year. Empirically, we find low-dimensional subspaces that encode numeric properties monotonically, in an interpretable and editable fashion. When representations are edited along directions in these subspaces, the LM output changes accordingly. For example, by patching activations along a "birthyear" direction we can make the LM express an increasingly late birth year: "Karl Popper was born in 1929", "Karl Popper was born in 1957", "Karl Popper was born in 1968". Property-encoding directions exist across several numeric properties in all models under consideration, suggesting that monotonic representation of numeric properties may consistently emerge during LM pretraining. Code: https://github.com/bheinzerling/numeric-property-repr
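The editing step can be illustrated with a toy example: shift a hidden state by a scalar multiple of a unit direction and observe how a downstream readout responds. Everything below is a hypothetical stand-in. The four-dimensional hidden state, the direction, and the linear readout are made-up values, and a real LM's downstream computation is nonlinear, so monotonic output changes there are an empirical finding rather than a guarantee.

```python
import math


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def patch_along_direction(h, direction, alpha):
    """Return h shifted by alpha times the (unit) direction."""
    return [hj + alpha * dj for hj, dj in zip(h, direction)]


def toy_readout(h, weights, bias):
    """Hypothetical linear readout standing in for the rest of the model."""
    return sum(hj * wj for hj, wj in zip(h, weights)) + bias


# All values below are illustrative assumptions, not taken from any model.
direction = unit([0.5, -1.0, 2.0, 0.0])   # assumed property-encoding direction
readout_weights = [10.0, -5.0, 20.0, 3.0]  # assumed downstream weights
hidden = [0.2, 0.1, -0.3, 0.7]             # assumed hidden state for one entity

alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
outputs = [
    toy_readout(patch_along_direction(hidden, direction, a), readout_weights, 1900.0)
    for a in alphas
]
for a, out in zip(alphas, outputs):
    print(f"alpha={a:+.1f} -> readout={out:.2f}")
```

Because the assumed readout weights have a positive inner product with the direction, the readout increases with alpha in this toy setting. In an actual model, the patched activation would be written back into the forward pass at a chosen layer and the generated text inspected instead.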
