
Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

Zhuosheng Zhang, Hanqing Zhang, Keming Chen, Yuhang Guo, Jingyun Hua, Yulong Wang, Ming Zhou

TL;DR

The paper addresses the high resource cost of Chinese pre-trained language models by introducing Mengzi, a lightweight model family anchored by Mengzi-BERT-base, a 103M-parameter encoder that keeps a RoBERTa-compatible architecture. It shows that careful design of pre-training objectives (including part-of-speech and named-entity (POS/NE) prediction, sentence order prediction (SOP), and dynamic gradient correction) and robust fine-tuning strategies (knowledge distillation (KD), transfer learning, choice smoothing, adversarial training, and data augmentation) can substantially improve performance without increasing model size. Mengzi achieves strong results on the CLUE benchmark and is competitive with much larger models, and the family also includes domain-specific (financial) and multimodal (vision-language) variants. The authors publicly release Mengzi-BERT-base, Mengzi-T5-base, Mengzi-BERT-base-fin, and Mengzi-Oscar-base together with practical usage guidance, enabling rapid deployment in industry and academia.
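
SOP is one of the pre-training objectives listed above. The summary does not spell out how Mengzi builds SOP examples, so the following is a minimal sketch assuming the ALBERT-style formulation: two consecutive segments from the same document form a positive pair in their original order and a negative pair when swapped. The function name `make_sop_examples` is hypothetical and is not taken from the Mengzi code release.

```python
import random
from typing import List, Tuple


def make_sop_examples(
    segments: List[str], rng: random.Random
) -> List[Tuple[str, str, int]]:
    """Build sentence-order-prediction pairs from consecutive segments.

    Assumption: ALBERT-style SOP, not necessarily Mengzi's exact recipe.
    Label 1 means the pair is in its original order; label 0 means the
    same two segments were swapped.
    """
    examples = []
    # Pair every segment with the one that immediately follows it.
    for first, second in zip(segments, segments[1:]):
        if rng.random() < 0.5:
            examples.append((first, second, 1))  # keep original order
        else:
            examples.append((second, first, 0))  # swap to make a negative
    return examples


if __name__ == "__main__":
    document = ["今天天气很好。", "我们去公园散步。", "公园里人很多。"]
    for a, b, label in make_sop_examples(document, random.Random(42)):
        print(label, a, b)
```

Because both segments in a negative pair come from the same document, the model cannot solve the task by topic cues alone and has to learn discourse order, which is the usual motivation for SOP over next-sentence prediction.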

Abstract

Although pre-trained language models (PLMs) have achieved remarkable improvements in a wide range of NLP tasks, they are expensive in terms of time and resources. This calls for research on training more efficient models that require less computation yet still deliver impressive performance. Instead of pursuing a larger scale, we are committed to developing lightweight yet more powerful models that are trained with equal or less computation and are friendly to rapid deployment. This technical report releases our pre-trained model called Mengzi, a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants capable of a wide range of language and vision tasks. Compared with publicly available Chinese PLMs, Mengzi is simple but more powerful. Our lightweight model has achieved new state-of-the-art results on the widely used CLUE benchmark with our optimized pre-training and fine-tuning techniques. Because the model architecture is left unchanged, our model can easily be used as a drop-in alternative to existing PLMs. Our source code is available at https://github.com/Langboat/Mengzi.
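
The abstract credits part of the CLUE gains to optimized fine-tuning techniques, and the TL;DR names knowledge distillation (KD) among them. The report summary does not give Mengzi's exact KD loss, so the following is a minimal sketch assuming the standard soft-target formulation of Hinton et al.: a temperature-scaled KL divergence between teacher and student distributions, mixed with ordinary cross-entropy on the gold label. The function names and the `alpha` and `temperature` defaults are illustrative assumptions, not values from the paper.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # The max term contributes exp(0) = 1, so the log argument is >= 1.
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Soft-target loss: T^2 * KL(teacher_T || student_T).

    Assumption: Hinton-style KD; Mengzi's actual variant may differ.
    """
    lp_teacher = log_softmax(teacher_logits, temperature)
    lp_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(lp_teacher, lp_student))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def combined_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    gold_label: int,
    alpha: float = 0.5,
    temperature: float = 2.0,
) -> float:
    """Mix hard-label cross-entropy with the soft distillation term."""
    hard = -log_softmax(student_logits)[gold_label]
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]  # e.g. logits from a larger fine-tuned teacher
    student = [2.5, 1.2, 0.1]  # logits from the lightweight student
    print(combined_loss(student, teacher, gold_label=0))
```

Setting `alpha=1.0` recovers plain fine-tuning on gold labels, while lower values let the student also learn the teacher's relative confidence across the wrong classes.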

Paper Structure

This paper contains 33 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: The family of Mengzi models. Mengzi-BERT-base-fin, Mengzi-T5-base, and Mengzi-Oscar-base are derivatives of Mengzi-BERT-base.
  • Figure 2: Generated marketing copywriting examples from Mengzi-T5-base and GPT.
  • Figure 3: Generated caption examples from Mengzi-Oscar-base and PowerPoint (randomly selected from the AIC-ICC validation set).