Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan

TL;DR

The findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF, suggesting that cognitive capacity may limit expressive potential.

Abstract

This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), focusing on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, analogous to neural signal processing in human cognition. Expressive capability is defined as the model's capability to produce word-level output. Our findings reveal a sequential development pattern: cognitive abilities are largely established during Pretraining, whereas expressive abilities advance predominantly during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the LLMs' architectural design. Moreover, we evaluate several optimization-independent strategies, such as few-shot learning and repeated sampling, that help bridge the gap between cognitive and expressive capabilities. This research reveals a potential connection between the hidden space and the output space, contributing insights into the interpretability and controllability of LLM training processes.
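The abstract measures cognitive capability through linear representations of neuron outputs and expressive capability through word-level output. A minimal sketch of how the two scores and their gap could be computed on a binary-labelled task is given here. It assumes the hidden vectors have already been extracted from some layer and uses a logistic-regression probe as the linear classifier; the probe type, layer choice and answer-parsing rule are assumptions (Figure 5 uses an SVM), and all function names and data are hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(xs, ys, epochs=200, lr=0.1):
    """Fit a logistic-regression probe (w, b) by SGD; xs are hidden vectors, ys are 0/1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = _sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def cognitive_capability(w, b, xs, ys):
    """Probe accuracy on held-out hidden vectors (hypothetical stand-in for the paper's metric)."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

def expressive_capability(answers, ys):
    """Accuracy of labels parsed from the model's generated text."""
    return sum(a == y for a, y in zip(answers, ys)) / len(ys)

# Toy data; real hidden vectors would come from a chosen layer of the LLM.
train_x = [[1.0, 0.5], [2.0, -1.0], [1.5, 0.0], [-1.0, 0.3], [-2.0, 1.0], [-1.5, -0.5]]
train_y = [1, 1, 1, 0, 0, 0]
test_x = [[1.2, 0.1], [1.8, -0.1], [-1.2, 0.1], [-1.8, -0.1]]
test_y = [1, 1, 0, 0]
parsed_answers = [1, 0, 0, 0]  # hypothetical answers extracted from generated text

w, b = train_linear_probe(train_x, train_y)
cognitive = cognitive_capability(w, b, test_x, test_y)
expressive = expressive_capability(parsed_answers, test_y)
print(cognitive, expressive, cognitive - expressive)  # 1.0 0.75 0.25
```

In this toy run the probe classifies every held-out vector correctly while the parsed answers are right three times out of four, so the gap is 0.25. Scoring the probe on a held-out split, rather than on the vectors it was trained on, keeps the cognitive score a measure of what the representation encodes rather than what the probe memorized.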

Paper Structure (32 sections, 4 theorems, 14 equations, 15 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Main Results
Definitions and Quantification of Cognitive and Expressive Capabilities
Datasets and Experimental Setup
Pretraining: Building Cognitive Capability
SFT and RLHF: Aligning Expressive and Cognitive Capabilities
Statistical Correlation between Cognitive Capability and Expressive Capability
Assessment of Cognitive Convergence across Training Phases
Theoretical Analysis
Explanation of the Capability Gap
Establishment of Cognitive Capability
Establishment of Expressive Capability
Methods for Bridging the Gap
Few-shot Learning
...and 17 more sections

Key Result

Theorem 4.1

The gap between cognitive and expressive capabilities stems from the superior mapping efficiency of the function $f(\cdot)$ compared to $g(\cdot)$, along with the greater linear separability afforded by the hidden space $\mathcal{R}^m$ over the token-level space $\mathcal{T}$.
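Theorem 4.1 attributes part of the gap to the hidden space $\mathcal{R}^m$ being more linearly separable than the token-level space $\mathcal{T}$, and Figure 5 visualizes the same contrast with SVM boundaries. The toy sketch here trains a linear SVM once on 2-D "hidden" vectors and once on a hypothetical 1-D "token-level" score for the same labels. The solver (a Pegasos-style stochastic sub-gradient method), the feature values and the 1-D score are all assumptions chosen for illustration, not Baichuan activations; the sketch only shows what a separability gap looks like and does not reproduce the paper's $f(\cdot)$, $g(\cdot)$ or its proof.

```python
import random

def train_linear_svm(xs, ys, lam=0.01, steps=2000, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent.

    Minimises lam/2 * ||w||^2 + mean hinge loss; ys must be +1/-1.
    No bias term: the decision boundary passes through the origin.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    w = [0.0] * len(xs[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrinks w
        if margin < 1.0:  # hinge loss active: step towards y_i * x_i
            w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w

def svm_accuracy(w, xs, ys):
    """Share of samples on the correct side of the boundary (score 0 counts as -1)."""
    preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

labels = [1, 1, 1, -1, -1, -1]
# Toy "hidden" vectors: the two classes lie on opposite sides of x0 = 0.
hidden = [[1.0, 0.2], [2.0, -0.3], [1.5, 0.4], [-1.0, 0.1], [-2.0, -0.2], [-1.5, 0.3]]
# Toy 1-D "token-level" scores for the same samples: the classes interleave.
token = [[0.3], [-0.2], [0.5], [-0.4], [0.1], [0.2]]

hidden_acc = svm_accuracy(train_linear_svm(hidden, labels), hidden, labels)
token_acc = svm_accuracy(train_linear_svm(token, labels), token, labels)
print(hidden_acc, token_acc)  # 1.0 0.5
```

Because the toy token-level scores interleave (no single threshold splits the classes), no linear boundary can exceed chance-like accuracy on them, whereas the same labels are perfectly separable in the 2-D hidden features.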

Figures (15)

Figure 1: Schematic representation of the asynchronous development of capabilities in LLMs. Initially, the model can neither comprehend questions nor generate relevant responses. During Pretraining, the LLM primarily acquires cognitive capability, while its ability to articulate responses remains underdeveloped. Subsequent SFT and RLHF enhance the model's expressive capability, bringing it closer to its cognitive capability.
Figure 2: Progression of cognitive capability during the Pretraining stage of Baichuan-33B, as quantified by linear representations. Cognitive capability stabilizes once the volume of training data reaches approximately 2.4T.
Figure 3: Illustration of the diminishing gap between expressive and cognitive capabilities during SFT and RLHF. Each SFT epoch processes 1 million tokens. The dotted line denotes the cognitive capability, which is established during Pretraining and acts as an upper bound on expressive capability. The solid line represents the expressive capability. The gap between the two narrows gradually as the model is further refined through SFT and RLHF.
Figure 4: Convergence of cognitive capability, assessed by the consistency of judgments across consecutive checkpoints. The y-axis quantifies the discrepancy in judgments between each model and its predecessor. The red solid line shows the average result.
Figure 5: UMAP visualization of the HalluQA classifier in Baichuan-33B. The red line represents the boundary drawn by an SVM on neuron outputs, while the green line represents the boundary drawn on token-level outputs. Blue and orange dots represent positive and negative samples in the dataset, respectively. SFT and RLHF show the potential to align expressive capability with cognitive capability during the fine-tuning stages.
...and 10 more figures
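Figure 4 tracks convergence by comparing each checkpoint's judgments with those of its predecessor. A minimal sketch of that discrepancy measure follows, assuming a "judgment" is the probe's binary prediction on each evaluation sample and that the averaged curve is a point-wise mean over datasets; both readings are assumptions, and the function names and data are hypothetical.

```python
def judgment_discrepancy(prev_preds, curr_preds):
    """Fraction of samples whose probe judgment changed between two checkpoints."""
    if len(prev_preds) != len(curr_preds):
        raise ValueError("both checkpoints must be judged on the same samples")
    changed = sum(p != c for p, c in zip(prev_preds, curr_preds))
    return changed / len(curr_preds)

def discrepancy_curve(preds_by_checkpoint):
    """Discrepancy of each checkpoint with respect to its immediate predecessor."""
    return [
        judgment_discrepancy(prev, curr)
        for prev, curr in zip(preds_by_checkpoint, preds_by_checkpoint[1:])
    ]

def average_curve(curves):
    """Point-wise mean over several equally long curves (e.g. one per dataset)."""
    return [sum(values) / len(values) for values in zip(*curves)]

# Toy predictions for four consecutive checkpoints on two hypothetical datasets.
curve_a = discrepancy_curve([[0, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1]])
curve_b = discrepancy_curve([[0, 0], [1, 0], [1, 0], [1, 0]])
print(curve_a)                           # [0.25, 0.25, 0.0]
print(average_curve([curve_a, curve_b]))  # [0.375, 0.125, 0.0]
```

A curve that decays toward zero means later checkpoints stop changing their judgments, which is the sense in which cognitive capability is said to converge.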

Theorems & Definitions (8)

Remark 3.1
Theorem 4.1
Remark 4.2
Lemma 3.1
Lemma 3.2
Lemma 3.3
Claim 3.4
Claim 3.5
