Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan

TL;DR

The findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF, suggesting that cognitive capacity may limit expressive potential.

Abstract

This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, two models from an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, similar to the neural signal processing in human cognition. Expressive capability is defined as the model's capability to produce word-level output. Our findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the LLMs' architectural design. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, which bridge the gap between cognitive and expressive capabilities. This research reveals the potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of LLM training processes.
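To make the idea of measuring cognitive capability "through linear representations" concrete, the following is a minimal sketch of a linear probe: a linear classifier fitted on hidden-state vectors and scored by its accuracy. It is not the authors' implementation. The perceptron update rule, the synthetic data generator, and all function names (`train_linear_probe`, `probe_accuracy`, `make_synthetic_hidden_states`) are assumptions introduced for illustration; real hidden states would be extracted from a model checkpoint.

```python
import random


def train_linear_probe(features, labels, epochs=50, lr=0.1):
    """Fit a perceptron-style linear probe. Labels must be 0 or 1."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1 if score > 0 else 0
            err = y - pred  # 0 when correct, +1 or -1 when wrong
            if err != 0:
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples the linear probe classifies correctly."""
    correct = 0
    for x, y in zip(features, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((1 if score > 0 else 0) == y)
    return correct / len(labels)


def make_synthetic_hidden_states(n, dim, seed=0):
    """Hypothetical stand-in for hidden states (assumption, not real model data).

    The first coordinate separates the classes with a margin; the remaining
    coordinates are Gaussian noise.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        x[0] = (3.0 if y == 1 else -3.0) + rng.uniform(-1.0, 1.0)
        features.append(x)
        labels.append(y)
    return features, labels


features, labels = make_synthetic_hidden_states(n=200, dim=8)
weights, bias = train_linear_probe(features, labels)
train_accuracy = probe_accuracy(weights, bias, features, labels)
```

Under this reading, a high probe accuracy on hidden states indicates that the information needed to answer is linearly recoverable inside the network, whether or not the model's word-level output expresses it.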

Paper Structure

This paper contains 32 sections, 4 theorems, 14 equations, 15 figures, 8 tables, 1 algorithm.

Key Result

Theorem 4.1

The gap between cognitive and expressive capabilities stems from the superior mapping efficiency of the function $f(\cdot)$ compared to $g(\cdot)$, along with the greater linear separability afforded by the hidden space $\mathcal{R}^m$ over the token-level space $\mathcal{T}$.
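The abstract names repeated sampling as one optimization-independent way to narrow this gap. A rough intuition, sketched below under strong assumptions, is that drawing several responses raises the chance that at least one is correct. The sketch assumes samples are independent, that each is correct with the same probability `p`, and that a correct sample can be identified once drawn. None of these assumptions, nor the function name, come from the paper.

```python
def prob_at_least_one_correct(p, k):
    """Probability that at least one of k independent samples is correct,
    given a per-sample success probability p (illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


# Success probability for a few sample counts at a fixed per-sample rate.
curve = [prob_at_least_one_correct(0.5, k) for k in (1, 2, 3)]
```

The curve rises with `k` but cannot exceed 1, and it stays at 0 when `p` is 0, which is consistent with the view that sampling can surface what the model already represents but cannot add capability that is absent.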

Figures (15)

  • Figure 1: Schematic representation of the asynchronous capabilities development process in LLMs. Initially, the model lacks the ability to comprehend questions or generate relevant responses. Through the Pretraining phase, the LLM primarily acquires cognitive capabilities, though its ability to articulate responses remains underdeveloped. Subsequent SFT and RLHF enhance the model's expressive capability, aligning it closely with the cognitive skills.
  • Figure 2: Progression of cognitive capability during the Pretraining stage in Baichuan-33B, as quantified by linear representations. The graph illustrates a stabilization in cognitive performance when the volume of training data reaches approximately 2.4T.
  • Figure 3: Illustration of the diminishing gap between expressive and cognitive capabilities in SFT and RLHF. Each SFT epoch processes 1 million tokens. The dotted line signifies the cognitive capability, established during the Pretraining phase and acting as the upper boundary for expressive capability. The solid line represents the expressive capability. The diagram highlights the gradual reduction of the disparity between these capabilities as the model undergoes further refinement through SFT and RLHF.
  • Figure 4: Convergence of cognitive capabilities by assessing consistency across consecutive checkpoints. The y-axis quantifies the discrepancy in judgments between each model and its predecessor. The red solid line is the average result.
  • Figure 5: UMAP visualization of the HalluQA classifier in Baichuan-33B. The red line represents the boundary drawn by an SVM on the neuron output, while the green line represents the boundary on the token-level output. The blue and orange dots represent positive and negative samples in the dataset, respectively. SFT and RLHF demonstrate the potential to align the expressive capability with the cognitive capability in the fine-tuning stages.
  • ...and 10 more figures
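The consistency check described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, assuming the "discrepancy in judgments" is the fraction of evaluation items on which a checkpoint's judgment differs from that of the previous checkpoint. The caption does not state the exact metric, so that definition and the function names are assumptions.

```python
def disagreement_rate(previous, current):
    """Fraction of items on which two checkpoints give different judgments."""
    if len(previous) != len(current):
        raise ValueError("both checkpoints must judge the same items")
    if not previous:
        raise ValueError("at least one judgment is required")
    differing = sum(1 for a, b in zip(previous, current) if a != b)
    return differing / len(previous)


def consecutive_disagreements(checkpoint_judgments):
    """Disagreement of each checkpoint with its immediate predecessor."""
    return [
        disagreement_rate(prev, curr)
        for prev, curr in zip(checkpoint_judgments, checkpoint_judgments[1:])
    ]


# Hypothetical judgments (1 = judged true, 0 = judged false) from three
# consecutive checkpoints on the same four items.
example_judgments = [
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
example_rates = consecutive_disagreements(example_judgments)
```

A sequence of rates that falls toward zero would correspond to the convergence the caption describes: later checkpoints stop changing their judgments.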

Theorems & Definitions (8)

  • Remark 3.1
  • Theorem 4.1
  • Remark 4.2
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Claim 3.4
  • Claim 3.5