Aquila2 Technical Report
Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu
TL;DR
Aquila2 tackles data-centric training for large bilingual language models by introducing the HeuriMentor framework, which couples an Adaptive Training Engine, a Training State Monitor, and a Data Management Unit to adjust data mixtures dynamically during training. The approach yields strong performance in both English and Chinese at three model sizes (7B, 34B, 70B) and remains robust under quantization; the training code and weights are open source. Empirical results across 21 diverse benchmarks, along with extensive ablations in the appendix, show faster convergence, improved downstream performance, and competitive chat and multimodal capabilities. The work highlights the importance of data composition, real-time training feedback, and data management in scaling efficient, high-quality bilingual LLMs, and outlines future directions such as Mixture-of-Experts and further data-quality enhancements.
Abstract
This paper introduces the Aquila2 series, a family of bilingual models with 7, 34, and 70 billion parameters. These models are trained with an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and enhances the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), Training State Monitor (TSM), and Data Management Unit (DMU), allows for precise monitoring of the model's training progress and enables efficient optimization of the data distribution, thereby enhancing training effectiveness. Extensive evaluations show that the Aquila2 model series performs comparably well on both English and Chinese benchmarks. Moreover, Aquila2-34B shows only a slight decrease in performance when quantized to Int4. We have made our training code (https://github.com/FlagOpen/FlagScale) and model weights (https://github.com/FlagAI-Open/Aquila2) publicly available to support ongoing research and the development of applications.
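To make the feedback loop described in the abstract concrete, the following is a minimal sketch of how monitored per-source validation losses could drive an update to the data mixture. It is an illustration under stated assumptions, not the HM System's actual algorithm: the function name `update_mixture`, the source names, the loss values, and the exponentiated-gradient style update rule are all hypothetical.

```python
import math


def update_mixture(weights, val_losses, step_size=0.5):
    """Return new sampling weights that favor sources with above-average loss.

    Hypothetical rule (not taken from the paper): scale each weight by
    exp(step_size * (loss - weighted mean loss)), then renormalize.
    """
    mean_loss = sum(weights[name] * val_losses[name] for name in weights)
    scaled = {
        name: weights[name] * math.exp(step_size * (val_losses[name] - mean_loss))
        for name in weights
    }
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Illustrative values only: current sampling weights per data source and the
# validation losses a training monitor might report for each source.
mixture = {"english_web": 0.5, "chinese_web": 0.3, "code": 0.2}
losses = {"english_web": 2.1, "chinese_web": 2.6, "code": 1.8}

new_mixture = update_mixture(mixture, losses)
for name, weight in new_mixture.items():
    print(f"{name}: {weight:.3f}")
```

In this sketch the source with the highest validation loss gains sampling weight and the others lose some, while the weights still sum to one. A real system would likely smooth the losses over several evaluations and bound how far any weight can move in one update.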
