Alignment at Pre-training! Towards Native Alignment for Arabic LLMs

Juhao Liang; Zhenyang Cai; Jianqing Zhu; Huang Huang; Kewei Zong; Bang An; Mosen Alharthi; Juncai He; Lian Zhang; Haizhou Li; Benyou Wang; Jinchao Xu

TL;DR

This work introduces native alignment, a data-centric approach that performs alignment during the pre-training phase to reduce unaligned content from the start. Focusing on Arabic LLMs, it details a four-step data processing workflow (deduplication, annotation, training alignment workers, rewriting) guided by a polishing code of conduct, and demonstrates toxicity and fluency improvements in preliminary analyses. Empirical results show state-of-the-art performance for open-source Arabic models (LLaMA3-Tamed-70B and 8B) on ArabicMMLU, EXAMS, ACVA, and AraTrust benchmarks, with enhanced harmlessness and helpfulness on the BeaverTails evaluation. The authors also compare native alignment to conventional pre-training and data cleaning, revealing mutual benefits and favorable scaling behavior as alignment data increases. Overall, native alignment offers a scalable, cost-effective path to safer, more culturally aligned Arabic LLMs and is accompanied by open-source models to benefit the community.
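The four-step workflow named above (deduplication, annotation, training alignment workers, rewriting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the exact-match deduplication, the `toy_expert` rule standing in for the polishing code of conduct, and the memorizing "worker" are all hypothetical placeholders. In particular, a real alignment worker would presumably be a trained model that generalizes to unseen text, which this stand-in does not.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Iterable

# A rewriter maps a raw document to a polished one (hypothetical type alias).
Rewriter = Callable[[str], str]


def deduplicate(docs: Iterable[str]) -> list[str]:
    """Step 1: drop exact duplicates after whitespace normalization."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        normalized = " ".join(doc.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(normalized)
    return unique


def annotate(seed: list[str], expert: Rewriter) -> list[tuple[str, str]]:
    """Step 2: collect (original, polished) pairs from an expert rewriter."""
    return [(doc, expert(doc)) for doc in seed]


def train_alignment_worker(pairs: list[tuple[str, str]]) -> Rewriter:
    """Step 3 (placeholder): memorize the annotated pairs.

    Assumption: stands in for fine-tuning a model on the pairs.
    Unseen documents are passed through unchanged.
    """
    memory = dict(pairs)
    return lambda doc: memory.get(doc, doc)


def rewrite(docs: list[str], worker: Rewriter) -> list[str]:
    """Step 4: apply the worker to the whole deduplicated corpus."""
    return [worker(doc) for doc in docs]


def toy_expert(doc: str) -> str:
    # Hypothetical stand-in for the polishing code of conduct.
    return doc.replace("BAD", "[removed]")


corpus = ["hello  world", "hello world", "some BAD text", "another BAD line"]
unique = deduplicate(corpus)
pairs = annotate(unique[:2], toy_expert)   # annotate only a small seed set
worker = train_alignment_worker(pairs)
aligned = rewrite(unique, worker)
```

In this toy run the third document is left untouched because the memorizing worker never saw it during annotation; closing that gap is the reason for training a worker that generalizes rather than annotating the entire corpus directly.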

Abstract

The alignment of large language models (LLMs) is critical for developing effective and safe language models. Traditional approaches focus on aligning models during the instruction tuning or reinforcement learning stages, referred to in this paper as "post alignment". We argue that alignment during the pre-training phase, which we term "native alignment", warrants investigation. Native alignment aims to prevent unaligned content from the beginning, rather than relying on post-hoc processing. This approach leverages extensively aligned pre-training data to enhance the effectiveness and usability of pre-trained models. Our study specifically explores the application of native alignment in the context of Arabic LLMs. We conduct comprehensive experiments and ablation studies to evaluate the impact of native alignment on model performance and alignment stability. Additionally, we release open-source Arabic LLMs that demonstrate state-of-the-art performance on various benchmarks, providing significant benefits to the Arabic LLM community.
