Linear Correlation in LM's Compositional Generalization and Hallucination
Letian Peng, Chenyang An, Shibo Hao, Chengyu Dong, Jingbo Shang
TL;DR
This work reveals a consistent linear relationship between the next-token prediction logits of related knowledge prompts, such that $LogP_{Country,X} \approx W \cdot LogP_{City,X} + b$ across inputs $X$. It shows that this linear correlation persists through substantial fine-tuning and post-training, enabling knowledge transfer (compositional generalization) when $W$ matches real-world relationships but causing hallucinations when $W$ is imprecise. The authors show that the transformation can be learned with a single feedforward network over pre-trained vocabulary representations, implying that lexical encodings play a key role in LM generalization. The findings offer a new diagnostic lens for LM knowledge composition and highlight a trade-off between reliable generalization and potential hallucination, with practical implications for targeted knowledge editing and multilingual reasoning.
Abstract
The generalization of language models (LMs) is under active debate, with their potential for general intelligence contrasted against their struggles with basic knowledge composition (e.g., reverse/transition curse). This paper uncovers the phenomenon of linear correlations in LMs during knowledge composition. Specifically, between certain pieces of related knowledge there exists a linear transformation that maps the next token prediction logits of one prompt to those of another, e.g., "X lives in the city of" $\rightarrow$ "X lives in the country of" for every given X. This mirrors the linearity in human knowledge composition, such as Paris $\rightarrow$ France. Our findings indicate that the linear transformation is resilient to large-scale fine-tuning: it generalizes updated knowledge when aligned with real-world relationships, but causes hallucinations when it deviates from them. Empirical results suggest that linear correlation can serve as a potential identifier of LM generalization. Finally, we show that such linear correlations can be learned with a single feedforward network and pre-trained vocabulary representations, indicating that LM generalization relies heavily on the latter.
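To make the idea of a linear map between logits concrete, the following is a minimal sketch of how one might fit $W$ and $b$ by least squares and check the fit on held-out inputs. It is not the authors' procedure: the data are synthetic stand-ins for source-prompt and target-prompt logits, and the names `fit_linear_map` and `pearson`, along with all array sizes and the noise level, are assumptions made for illustration.

```python
import numpy as np


def fit_linear_map(src_logits, tgt_logits):
    """Least-squares fit of tgt ≈ src @ W.T + b.

    src_logits: (n, d_src) logits for the source prompt, one row per input X.
    tgt_logits: (n, d_tgt) logits for the target prompt, same row order.
    """
    n = src_logits.shape[0]
    # Append a column of ones so the bias is fitted jointly with W.
    design = np.hstack([src_logits, np.ones((n, 1))])
    solution, *_ = np.linalg.lstsq(design, tgt_logits, rcond=None)
    W = solution[:-1].T  # shape (d_tgt, d_src)
    b = solution[-1]     # shape (d_tgt,)
    return W, b


def pearson(a, b):
    """Pearson correlation between two arrays, flattened."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in data (assumption): real use would collect logits
# from an LM for prompts such as "X lives in the city of" and
# "X lives in the country of" over many inputs X.
rng = np.random.default_rng(0)
d_src, d_tgt, n_train, n_test = 8, 5, 200, 50
true_W = rng.normal(size=(d_tgt, d_src))
true_b = rng.normal(size=d_tgt)
src = rng.normal(size=(n_train + n_test, d_src))
noise = 0.1 * rng.normal(size=(n_train + n_test, d_tgt))
tgt = src @ true_W.T + true_b + noise

# Fit on the training inputs, then evaluate on held-out inputs.
W, b = fit_linear_map(src[:n_train], tgt[:n_train])
pred = src[n_train:] @ W.T + b
corr = pearson(pred, tgt[n_train:])
print(f"held-out Pearson correlation: {corr:.3f}")
```

A high held-out correlation in this kind of check indicates that a single $(W, b)$ pair explains the target logits across inputs, which is the sense of "linear correlation" used in the abstract.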
