Aquila2 Technical Report

Bo-Wen Zhang; Liangdong Wang; Jijie Li; Shuhao Gu; Xinya Wu; Zhengduo Zhang; Boyan Gao; Yulong Ao; Guang Liu

TL;DR

Aquila2 tackles data-centric training for large bilingual language models by introducing the HeuriMentor framework, which couples an Adaptive Training Engine, a Training State Monitor, and a Data Management Unit to dynamically adjust data mixtures during training. The approach yields strong bilingual performance in English and Chinese across model sizes of 7B, 34B, and 70B parameters, shows only a slight performance drop for Aquila2-34B under Int4 quantization, and comes with open-source training code and model weights. Empirical results across 21 diverse benchmarks, along with extensive ablations in the appendix, show faster convergence, improved downstream performance, and competitive chat and multimodal capabilities. The work highlights the importance of data composition, real-time training feedback, and data management for efficiently scaling high-quality bilingual LLMs, and outlines future directions such as Mixture-of-Experts and further improvements in data quality.
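The feedback loop described above (a monitor reports the training state, and a data manager adjusts the data mixture in response) can be illustrated with a minimal Python sketch. The function `update_mixture`, the domain names, the loss values, and the rule "upweight domains whose validation loss is above the mean" are all hypothetical assumptions chosen for illustration; they are not the Data Management Unit's actual policy.

```python
import math


def update_mixture(proportions: dict[str, float],
                   domain_losses: dict[str, float],
                   step_size: float = 0.5) -> dict[str, float]:
    """Return new sampling proportions that upweight domains with above-average loss.

    Hypothetical heuristic for illustration only; not the DMU's actual rule.
    """
    mean_loss = sum(domain_losses.values()) / len(domain_losses)
    # Multiplicative-weights step on each domain's loss gap relative to the mean.
    raw = {
        domain: p * math.exp(step_size * (domain_losses[domain] - mean_loss))
        for domain, p in proportions.items()
    }
    total = sum(raw.values())
    # Renormalize so the proportions still form a probability distribution.
    return {domain: w / total for domain, w in raw.items()}


# Hypothetical current mixture and per-domain validation losses that a
# monitoring component might report at the end of a data stage.
mix = {"en_web": 0.5, "zh_web": 0.3, "code": 0.2}
losses = {"en_web": 2.1, "zh_web": 2.6, "code": 1.4}

mix = update_mixture(mix, losses)
print({domain: round(p, 3) for domain, p in mix.items()})
```

A multiplicative update keeps every proportion positive and exposes the step size as an explicit knob; a real system would likely also need safeguards such as per-domain minimum proportions, which this sketch omits.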

Abstract

This paper introduces the Aquila2 series, a family of bilingual models with 7, 34, and 70 billion parameters. These models are trained with an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and improves both the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), the Training State Monitor (TSM), and the Data Management Unit (DMU), allows precise monitoring of the model's training progress and enables efficient optimization of the data distribution, thereby improving training effectiveness. Extensive evaluations show that the Aquila2 model series performs comparably well on both English and Chinese benchmarks. Notably, Aquila2-34B shows only a slight decrease in performance when quantized to Int4. Furthermore, we have made our training code (https://github.com/FlagOpen/FlagScale) and model weights (https://github.com/FlagAI-Open/Aquila2) publicly available to support ongoing research and application development.
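To make "quantized to Int4" concrete, here is a minimal pure-Python sketch of symmetric per-group 4-bit weight quantization. The group size, the rounding scheme, and the names `quantize_int4` and `dequantize_int4` are assumptions for illustration; this summary does not state which quantization method was applied to Aquila2-34B.

```python
def quantize_int4(weights: list[float], group_size: int = 4) -> tuple[list[int], list[float]]:
    """Symmetric per-group quantization to signed 4-bit codes in [-8, 7]."""
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, the largest positive Int4 value.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the signed 4-bit range.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes: list[int], scales: list[float], group_size: int = 4) -> list[float]:
    """Recover approximate weights by rescaling each code with its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.12, -0.56, 0.33, 0.07, 1.40, -0.02, -0.91, 0.48]
codes, scales = quantize_int4(weights)
restored = dequantize_int4(codes, scales)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(max_err, 4))
```

Because each group has its own scale, every weight's rounding error is bounded by half of its group's scale, so a large-magnitude weight in one group does not coarsen the precision of the others.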

Paper Structure

This paper contains 25 sections, 1 equation, 21 figures, and 14 tables.

Introduction
Aquila2 series
HeuriMentor Framework
Adaptive Training Engine
Training State Monitor (TSM)
Training loss
Downstream performance
Weight Trajectory
Data Management Unit (DMU)
Model evaluation
Overall results
Conclusion and Future Work
Limitation
Appendix
Alignment Evaluation
...and 10 more sections

Figures (21)

Figure 1: The HeuriMentor Framework structure.
Figure 2: Training loss for the Aquila2-34B and Aquila2-70B-expr models.
Figure 3: Performance of Aquila2-34B (a) and Aquila2-70B-expr (b) on downstream tasks during training. Colors distinguish the data stages of training. HELM and LM evaluation scores serve as the main metrics. Details of the evaluation are covered in the evaluation metrics section.
Figure 4: Evidence of convergence from the perspective of the model weights. Differently colored rectangles represent data stages: K6, K61&K62, K63, and K64 for Aquila2-34B, and K6, K61, K63, and K65 for Aquila2-70B-expr.
Figure 5: Proportions of different domains in data stages K6 to K65.
...and 16 more figures
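Figure 4 presents convergence evidence from the model weights. One simple proxy, assumed here and not necessarily what the report computes, is the relative change of the flattened weight vector between consecutive checkpoints. The minimal pure-Python sketch below uses synthetic checkpoints; the function names, window, and tolerance are hypothetical.

```python
import math


def relative_update_norms(checkpoints: list[list[float]]) -> list[float]:
    """Return ||w_{t+1} - w_t|| / ||w_t|| for each pair of consecutive checkpoints."""
    norms = []
    for prev, curr in zip(checkpoints, checkpoints[1:]):
        step = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr)))
        base = math.sqrt(sum(p * p for p in prev))
        norms.append(step / base if base > 0 else math.inf)
    return norms


def looks_converged(norms: list[float], window: int = 3, tol: float = 1e-3) -> bool:
    """Hypothetical rule: the last `window` relative updates all fall below `tol`."""
    recent = norms[-window:]
    return len(recent) == window and all(n < tol for n in recent)


# Synthetic flattened "checkpoints" that approach a fixed weight vector geometrically.
target = [1.0, -2.0, 0.5]
checkpoints = [[x + 0.3 * 0.5 ** t for x in target] for t in range(12)]

norms = relative_update_norms(checkpoints)
print(looks_converged(norms))
```

Normalizing by the current weight norm makes the signal comparable across layers or models of different scale; in practice one would compute it per layer on real checkpoints rather than on a toy three-element vector.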
