Baichuan Alignment Technical Report

Mingan Lin, Fan Yang, Yanjun Shen, Haoze Sun, Tianpeng Li, Tao Zhang, Chenzheng Zhu, Tao Zhang, Miao Zheng, Xu Li, Yijie Zhou, Mingyang Chen, Yanzhao Qin, Youquan Li, Hao Liang, Fei Li, Yadong Li, Mang Wang, Guosheng Dong, Kun Fang, Jianhua Xu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen

TL;DR

Baichuan Alignment presents a comprehensive, publicly accessible account of the alignment techniques applied to the Baichuan model series, covering three stages: the Prompt Augmentation System (PAS), Supervised Fine-Tuning (SFT), and Preference Alignment. It combines data-centric pipelines, optimization strategies, and evaluation frameworks, and reports improvements on both internal and open-source benchmarks. Key contributions include data construction via a prompt system, prompt quality evaluation, and a suite of training-efficiency techniques (packing, gradient checkpointing, sequence parallelism), plus model merging and PAS-driven augmentation. The reported gains in user experience, instruction following, math and reasoning, and open-source benchmark standings indicate substantial practical impact and progress toward scalable alignment. Overall, the report aims to inform and accelerate community progress by sharing the challenges, solutions, and empirical lessons learned during alignment.

Abstract

We introduce Baichuan Alignment, a detailed analysis of the alignment techniques employed in the Baichuan series of models. This represents the industry's first comprehensive account of alignment methodologies, offering valuable insights for advancing AI research. We investigate the critical components that enhance model performance during the alignment process, including optimization methods, data strategies, capability enhancements, and evaluation processes. The process spans three key stages: Prompt Augmentation System (PAS), Supervised Fine-Tuning (SFT), and Preference Alignment. The problems encountered, the solutions applied, and the improvements made are thoroughly recorded. Through comparisons across well-established benchmarks, we highlight the technological advancements enabled by Baichuan Alignment. Baichuan-Instruct is an internal model, while Qwen2-Nova-72B and Llama3-PBM-Nova-70B are instruct versions of the Qwen2-72B and Llama-3-70B base models, optimized through Baichuan Alignment. Baichuan-Instruct demonstrates significant improvements in core capabilities, with user experience gains ranging from 17% to 28%, and performs exceptionally well on specialized benchmarks. In open-source benchmark evaluations, both Qwen2-Nova-72B and Llama3-PBM-Nova-70B consistently outperform their respective official instruct versions across nearly all datasets. This report aims to clarify the key technologies behind the alignment process, fostering a deeper understanding within the community. The Llama3-PBM-Nova-70B model is available at https://huggingface.co/PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B.

Paper Structure

This paper contains 69 sections, 3 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Performance Comparison of Qwen2-Nova-72B and Llama3-PBM-Nova-70B with Others
  • Figure 2: Difference between packing on sample and packing on batch.
  • Figure 3: We present Prompt Augmentation System (PAS). (a) It takes user prompts, enhances them, and inputs the augmented prompts into LLMs. (b) PAS significantly improves responses across all categories in human evaluation.
  • Figure 4: The Pipeline of Alignment Data Processes, including: Prompt System and Classification, Prompt Selection, and Construction of Response and Preference Data
  • Figure 5: Overview of Instruction-Following Optimization, including: System Message, Constraint Expansion, Response Reversal, and Textbook Techniques.
  • ...and 3 more figures