Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study

Zhengyu Hu, Jianxun Lian, Zheyuan Xiao, Seraphina Zhang, Tianfu Wang, Nicholas Jing Yuan, Xing Xie, Hui Xiong

TL;DR

This work proposes a cognitively grounded framework that decomposes LLM learning into Learning from Instructor, Learning from Concept, and Learning from Experience. Through targeted experiments across modular settings (passive vs. interactive instruction, conceptual injections, and experiential adaptation), it shows that interaction enhances learning, that conceptual understanding scales with model capacity, and that many-shot generalization remains challenging due to long-context limits. The authors introduce LearnArena, a unified benchmark that assesses general learning ability across these cognitive dimensions and reveals that architectural and training improvements, alongside scale, drive advances in adaptive, human-like learning. The results carry practical implications for designing adaptive AI agents and set a path for future research into integrated, cognitively inspired evaluation of LLM learning behaviors.

Abstract

Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored. In this work, we address this gap by introducing a framework inspired by cognitive psychology and education. Specifically, we decompose general learning ability into three distinct, complementary dimensions: Learning from Instructor (acquiring knowledge via explicit guidance), Learning from Concept (internalizing abstract structures and generalizing to new contexts), and Learning from Experience (adapting through accumulated exploration and feedback). We conduct a comprehensive empirical study across the three learning dimensions and identify several findings, such as (i) interaction improves learning; (ii) conceptual understanding is scale-emergent and benefits larger models; and (iii) LLMs are effective few-shot learners but not many-shot learners. Based on our framework and empirical findings, we introduce a benchmark that provides a unified and realistic evaluation of LLMs' general learning abilities across the three cognitive learning dimensions. It enables diagnostic insights and supports the evaluation and development of more adaptive and human-like models.

Paper Structure

This paper contains 94 sections, 1 equation, 11 figures, 21 tables.

Figures (11)

  • Figure 1: Overview of our proposed cognitive framework for evaluating general learning abilities in LLMs. We decompose learning into three core types: (a) Learning from Instructor; (b) Learning from Concept; and (c) Learning from Experience.
  • Figure 2: Comparison of learner performance under Passive Consumption and Interactive Clarification paradigms across eight mathematical benchmarks.
  • Figure 3: Scaling learner models (Qwen2.5 variants) with a fixed Qwen2.5-72B instructor. Larger learners learn more effectively, and interactive clarification further improves outcomes.
  • Figure 4: Scaling instructor models (Qwen2.5 variants) with a fixed Qwen2.5-7B learner. Weak instructors can degrade learning in the interactive setting; however, interaction becomes beneficial as instructor capability increases.
  • Figure 5: Win rates of Player-1 across six competitive environments from TextArena under the LfC setting.
  • ...and 6 more figures