Characterizing the Role of Similarity in the Property Inferences of Language Models
Juan Diego Rodriguez, Aaron Mueller, Kanishka Misra
TL;DR
The paper investigates whether taxonomy or similarity drives property inheritance in language models (LMs), and whether these factors interact rather than act independently. It combines behavioral experiments, which use nonce properties and two similarity metrics (Word-Sense and SPoSE), with a causal interpretability approach (Distributed Alignment Search) across four instruction-tuned LMs. The findings show that taxonomy and similarity jointly influence property projection: similarity signals are encoded in causal subspaces that are entangled with taxonomic information, and SPoSE similarity aligns more closely with LM judgments than Word-Sense similarity does. These results challenge purely taxonomic views of LM reasoning, highlight human-like content effects in conceptual representations, and suggest new directions for psycholinguistic experiments and interpretability methods that probe inductive generalization in neural networks.
Abstract
Property inheritance -- a phenomenon where novel properties are projected from higher-level categories (e.g., birds) to lower-level ones (e.g., sparrows) -- provides a unique window into how humans organize and deploy conceptual knowledge. It is debated whether this ability arises from explicitly stored taxonomic knowledge or from simple computations of similarity between mental representations. How are these mechanistic hypotheses manifested in contemporary language models? In this work, we investigate how LMs perform property inheritance using behavioral experiments and causal representational analyses. We find that taxonomy and categorical similarity are not mutually exclusive in LMs' property inheritance behavior. That is, LMs are more likely to project novel properties from one category to another when the two are both taxonomically related and highly similar. Our findings provide insight into the conceptual structure of language models and may suggest new psycholinguistic experiments for human subjects.
