Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge
Jiahuan Li, Yiqing Cao, Shujian Huang, Jiajun Chen
TL;DR
This work investigates whether large language models exhibit human-like learning preferences when trained on data containing conflicting knowledge. Using a controlled pseudo-data setup with configurable text features (e.g., formality, spelling accuracy), the authors train a LLaMA-7B model in a continual pre-training regime and measure which of the conflicting facts the model absorbs, via perplexity on prefixes that probe the conflicting information. They find that models learn faster from, and side with, formal text and text with fewer spelling errors, i.e., features that signal consistency with the majority of the data. The effect generalizes across models and languages and is stronger in larger models, and such preferences can be instilled or erased by changing how consistent a feature is with the majority of the training data. For data curation and model governance, the implication is that linguistic and stylistic features of training data can steer which knowledge an LLM retains, which offers a practical lever for shaping model behavior through targeted data design.
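As a rough illustration of the perplexity-based comparison mentioned above, the following minimal sketch picks the claim whose continuation the model finds less surprising. It assumes per-token natural-log probabilities have already been obtained from the trained model; the function names, claim labels, and numbers are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability) over a continuation's tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def preferred_claim(logprobs_by_claim):
    """Return the label of the claim whose continuation has the lowest perplexity."""
    return min(logprobs_by_claim, key=lambda c: perplexity(logprobs_by_claim[c]))


# Hypothetical scores for two conflicting answers to the same probing prefix.
# The values are invented for illustration only.
scores = {
    "claim_from_formal_text": [-0.2, -0.4, -0.3],
    "claim_from_informal_text": [-1.1, -0.9, -1.3],
}

winner = preferred_claim(scores)
```

Lower perplexity on one continuation is read here as the model having absorbed that version of the fact; the paper's actual protocol may differ in detail.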
Abstract
Having been trained on massive pretraining data, large language models have shown excellent performance on many knowledge-intensive tasks. However, pretraining data tends to contain misleading and even conflicting information, and it is intriguing to understand how LLMs handle these noisy data during training. In this study, we systematically analyze LLMs' learning preferences for data with conflicting knowledge. We find that pretrained LLMs establish learning preferences similar to humans, i.e., preferences towards formal texts and texts with fewer spelling errors, resulting in faster learning and more favorable treatment of knowledge in data with such features when facing conflicts. This finding is generalizable across models and languages and is more evident in larger models. An in-depth analysis reveals that LLMs tend to trust data with features that signify consistency with the majority of data, and it is possible to instill new preferences and erase old ones by manipulating the degree of consistency with the majority data.
