Nyonic Technical Report

Junfeng Tian, Rui Wang, Cong Li, Yudong Zhou, Jun Liu, Jun Wang

TL;DR

This work presents Wonton 7B, a pre-trained base model and a chat-tuned variant, built on a PyTorch framework with RoPE, QK-LayerNorm, and a multilingual tokenizer. It introduces an Online Data Scheduler enabling online data streaming, curriculum-driven data mixing, real-time feedback, and efficient resumption, paired with a robust data engineering pipeline and a high-capacity training infrastructure. The model achieves competitive results on standard and multilingual benchmarks, supported by extensive inference and deployment workflows, including Hugging Face integration, TensorRT optimization, and Aliyun EAS hosting. The authors outline ongoing plans to release additional checkpoints and fine-tuned variants, aiming to close gaps with larger models and broaden real-world applicability through open collaboration and deployment tooling.

Abstract

This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability.

GitHub: https://github.com/nyonicai/nyonic-public

Paper Structure

This paper contains 24 sections, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Language and data proportions in the training set.
  • Figure 2: The data flow architecture of the Online Data Scheduler.
  • Figure 3: Comparative analysis of multilingual tokenization metrics across different languages.
  • Figure 4: Performance comparison of Wonton 7B with other open-source models on the Belebele benchmarks.