A Survey on Post-training of Large Language Models

Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao

TL;DR

This survey addresses the challenge of converting general pre-trained language models into task- and domain-aware systems through post-training, producing post-training language models (PoLMs). It synthesizes five core paradigms (Fine-Tuning, Alignment, Reasoning, Efficiency, and Integration and Adaptation) and traces their evolution from early RLHF-aligned systems to modern Large Reasoning Models (LRMs). Key contributions include a structured taxonomy of techniques, a historical synthesis from ChatGPT-era alignment to DeepSeek-R1-era reasoning, and a roadmap for LRMs that balance reasoning proficiency with ethical robustness and domain flexibility. The work also catalogs datasets and applications, and identifies open problems (scalability, bias, and multimodal coherence) as critical avenues for future research. Collectively, the survey establishes an intellectual framework to guide the development of more precise, trustworthy, and versatile post-trained language systems across scientific and societal contexts.

Abstract

The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. Addressing these shortcomings requires advanced post-training language models (PoLMs), such as OpenAI-o1/o3 and DeepSeek-R1 (collectively known as Large Reasoning Models, or LRMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Efficiency, which optimizes resource utilization amidst increasing complexity; and Integration and Adaptation, which extend capabilities across diverse modalities while addressing coherence issues. Charting progress from ChatGPT's alignment strategies to DeepSeek-R1's innovative reasoning advancements, we illustrate how PoLMs leverage datasets to mitigate biases, deepen reasoning capabilities, and enhance domain adaptability. Our contributions include a pioneering synthesis of PoLM evolution, a structured taxonomy categorizing techniques and datasets, and a strategic agenda emphasizing the role of LRMs in improving reasoning proficiency and domain flexibility. As the first survey of its scope, this work consolidates recent PoLM advancements and establishes a rigorous intellectual framework for future research, fostering the development of LLMs that excel in precision, ethical robustness, and versatility across scientific and societal applications.

Paper Structure

This paper contains 70 sections, 38 equations, 21 figures, 9 tables.

Figures (21)

  • Figure : The evolution of post-training techniques for Large Language Models, delineating the progression from initial methodologies to advanced approaches, with emphasis on DeepSeek model contributions (highlighted in blue).
  • Figure : Structural overview of post-training techniques surveyed in this study, illustrating the organization of methodologies, datasets, and applications.
  • Figure : Timeline of post-training technique development for Large Language Models (2018–2025), delineating key milestones in their historical progression.
  • Figure : Process of Supervised Fine-Tuning.
  • Figure : Workflow of Instruction Fine-tuning, illustrating the general pipeline for Instruction Dataset Construction and Instruction Tuning in Large Language Models.
  • ...and 16 more figures