360Zhinao Technical Report

360Zhinao Team

TL;DR

The paper presents 360Zhinao-7B, a data-centric large language model with context lengths $L\in\{4\mathrm{K},32\mathrm{K},360\mathrm{K}\}$, trained on $3.4\mathrm{T}$ tokens and released openly. It emphasizes a rigorous data pipeline (data preparation, cleaning, deduplication, and mixing) intended to maximize information density while maintaining diversity, supported by a stable ablation environment and custom benchmarks (360Eval). The alignment process combines enhanced SFT data with long-context finetuning and RLHF using reward-model training, achieving task-specific gains and robust long-context behavior, including top performance on several benchmarks and near-perfect results on some long-document evaluations. Together, these contributions demonstrate a scalable, transparent approach to data-centric pretraining and long-context alignment, with practical implications for deploying long-context LLMs in real-world systems and open-source ecosystems.

Abstract

We present 360Zhinao models with 7B parameter size and context lengths spanning 4K, 32K and 360K, all available at https://github.com/Qihoo360/360zhinao. For rapid development in pretraining, we establish a stable and sensitive ablation environment to evaluate and compare experiment runs with minimal model size. Under such guidance, we refine our data cleaning and composition strategies to pretrain $\texttt{360Zhinao-7B-Base}$ on 3.4T tokens. We also emphasize data during alignment, where we strive to balance quantity and quality with filtering and reformatting. With tailored data, 360Zhinao-7B's context window is easily extended to 32K and 360K. RMs and RLHF are trained following SFT and credibly applied to specific tasks. Together, these contributions lead to 360Zhinao-7B's competitive performance among models of similar size.

Paper Structure

This paper contains 36 sections, 7 figures, and 10 tables.

Figures (7)

  • Figure 1: The cascaded retention changes in the web page pipeline.
  • Figure 2: The comparison of compression rates among tokenizers.
  • Figure 3: Results on the internal evaluation set of 360Zhinao-7B-Chat and Qwen-7B-Chat. 360Zhinao-7B-Chat outperforms Qwen-7B-Chat on most prompt categories.
  • Figure 4: From left to right: results on random city-number value retrieval [niah2023v1], the original NIAH [niah2023v0], and a Chinese NIAH that we constructed. Value retrieval is relatively easy.
  • Figure 5: Loss curves of document deduplication.
  • ...and 2 more figures