Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu, Jianxun Lian, Zheyuan Xiao, Seraphina Zhang, Tianfu Wang, Nicholas Jing Yuan, Xing Xie, Hui Xiong
TL;DR
This work proposes a cognitively grounded framework that decomposes LLM learning into three dimensions: Learning from Instructor, Learning from Concept, and Learning from Experience. Targeted experiments in each dimension (passive versus interactive instruction, conceptual injection, and experiential adaptation) show that interaction enhances learning, that conceptual understanding scales with model capacity, and that many-shot generalization remains challenging because of long-context limits. The authors introduce LearnArena, a unified benchmark that assesses general learning ability across these dimensions; its results indicate that architectural and training improvements, alongside scale, drive advances in adaptive, human-like learning. The findings have practical implications for designing adaptive AI agents and point toward integrated, cognitively inspired evaluation of LLM learning behavior.
Abstract
Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored. In this work, we address this gap by introducing a framework inspired by cognitive psychology and education. Specifically, we decompose general learning ability into three distinct, complementary dimensions: Learning from Instructor (acquiring knowledge via explicit guidance), Learning from Concept (internalizing abstract structures and generalizing to new contexts), and Learning from Experience (adapting through accumulated exploration and feedback). We conduct a comprehensive empirical study across the three learning dimensions and report several findings: (i) interaction improves learning; (ii) conceptual understanding is scale-emergent and benefits larger models; and (iii) LLMs are effective few-shot learners but not many-shot learners. Based on our framework and empirical findings, we introduce a benchmark that provides a unified and realistic evaluation of LLMs' general learning abilities across the three learning dimensions. The benchmark yields diagnostic insights and supports the evaluation and development of more adaptive and human-like models.
