360Zhinao Technical Report
360Zhinao Team
TL;DR
The paper presents 360Zhinao-7B, a family of data-centric large language models with context lengths of 4K, 32K, and 360K tokens. The models are pretrained on 3.4T tokens and publicly released. The work emphasizes a rigorous data pipeline (data preparation, cleaning, deduplication, and mixing) that aims to maximize information density while preserving diversity. This pipeline is supported by a stable ablation environment and a custom benchmark suite (360Eval). Alignment combines improved SFT data with long-context finetuning, reward-model training, and RLHF. The result is task-specific gains and robust long-context behavior, including top performance on several benchmarks and near-perfect results on some long-document evaluations. Together, these contributions illustrate a scalable, transparent approach to data-centric pretraining and long-context alignment, with practical implications for deploying long-context LLMs in real-world systems and open-source ecosystems.
Abstract
We present 360Zhinao models with 7B parameters and context lengths spanning 4K, 32K, and 360K, all available at https://github.com/Qihoo360/360zhinao. To speed up development in pretraining, we establish a stable and sensitive ablation environment that evaluates and compares experimental runs using models of minimal size. Under its guidance, we refine our data cleaning and composition strategies to pretrain $\texttt{360Zhinao-7B-Base}$ on 3.4T tokens. Data is also our main emphasis during alignment, where we strive to balance quantity and quality through filtering and reformatting. With tailored data, 360Zhinao-7B's context window is easily extended to 32K and 360K. Following SFT, we train reward models (RMs) and apply RLHF, with credible results on specific tasks. Altogether, these contributions give 360Zhinao-7B competitive performance among models of similar size.
