Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie Embedding

Lung-Chuan Chen, Zong-Ru Li

TL;DR

Bailong targets cross-lingual transfer to Traditional Chinese by combining vocabulary expansion, zip-tie embedding initialization, and parameter-efficient tuning via QLoRA. The approach enables efficient continual pre-training on a Traditional Chinese–heavy corpus with reduced trainable parameters (~4.6% of total) and improved encoding through a 27k-token TC vocabulary extension. It further enhances alignment through supervised fine-tuning on a diverse instruction dataset, producing Bailong-instruct 7B and a dedicated Bailong-bench for evaluating instruction-following and multi-turn dialogue. Results show competitive Traditional Chinese performance against larger or fully fine-tuned models, with Bailong-bench providing a more robust open-ended evaluation framework. The work demonstrates a scalable pathway to democratize TC LLM development and can generalize to other low-resource languages.

Abstract

Large language models (LLMs) have demonstrated exceptional performance in various NLP applications. However, the majority of existing open-source LLMs are pre-trained primarily on English data, with only a small share of data in other languages. This deficiency in multilingual training data results in suboptimal performance when the models are applied to languages with fewer available resources. Furthermore, enhancing the performance of LLMs on low-resource languages by full-parameter fine-tuning with additional data requires substantial computational resources, creating barriers for research organizations and individual researchers. Consequently, several techniques such as parameter-efficient tuning and advanced embedding initialization have been proposed to address these challenges. In this work, we combine them to facilitate cross-lingual transfer on an English-dominated open-source LLM. To effectively enhance the model's proficiency in Traditional Chinese, we conduct secondary pre-training on Llama 2 7B with Traditional Chinese data by leveraging QLoRA and our proposed zip-tie embedding initialization. The resulting model is called Bailong, which stands for Bilingual trAnsfer learnIng based on qLOra and zip-tie embeddiNG. We present Bailong-instruct 7B, a fine-tuned version of Bailong 7B optimized for multi-turn dialogue scenarios. Recognizing the inadequacy of benchmark datasets in Traditional Chinese, we further introduce Bailong-bench to assess the alignment of models with human preferences and their capability to follow instructions in both Traditional Chinese and English tasks. In our evaluation, Bailong-instruct 7B exhibits competitive performance on Bailong-bench and other benchmark datasets when compared to other open-source models of similar or even larger parameter sizes. Bailong-instruct 7B and Bailong-bench are publicly available with the aim of empowering the community to build upon our efforts.
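
To make the parameter-efficiency argument concrete, the following is a minimal sketch of how the trainable share of a LoRA-style adapter can be counted. It is not the authors' configuration: the function name `lora_trainable_fraction`, the layer shapes, and the rank are all hypothetical values chosen for illustration, and the sketch does not attempt to reproduce the ~4.6% figure quoted in the TL;DR, which depends on the rank, the set of adapted layers, and the extended embeddings. QLoRA additionally keeps the frozen base weights in 4-bit precision, which this count does not model.

```python
def lora_trainable_fraction(layer_shapes, rank):
    """Count trainable vs. frozen parameters for LoRA adapters of a given rank.

    Each frozen weight of shape (d_out, d_in) gains two trainable low-rank
    factors: B with shape (d_out, rank) and A with shape (rank, d_in).
    Returns (trainable, frozen, trainable / (frozen + trainable)).
    """
    frozen = sum(d_out * d_in for d_out, d_in in layer_shapes)
    trainable = sum(rank * (d_out + d_in) for d_out, d_in in layer_shapes)
    return trainable, frozen, trainable / (frozen + trainable)


# Hypothetical toy setup (an assumption, not taken from the paper):
# four 4096 x 4096 projection matrices adapted with rank 64.
shapes = [(4096, 4096)] * 4
trainable, frozen, fraction = lora_trainable_fraction(shapes, rank=64)
print(f"trainable={trainable:,} frozen={frozen:,} fraction={fraction:.2%}")
```

Because the adapter size grows with `rank * (d_out + d_in)` rather than `d_out * d_in`, the trainable share stays small even when many layers are adapted, which is the property that makes continual pre-training feasible on limited hardware.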

Paper Structure (40 sections, 9 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Losses over training tokens from training Llama 2 7B and Llama 2 13B on the Traditional Chinese subset of Wikipedia. The evaluation is performed every 1.3 million trained tokens.
  • Figure 2: The prompt template for single-turn answer grading.
  • Figure 3: The prompt template for single-turn answer grading provided with reference answer.
  • Figure 4: The prompt template for multi-turn answer grading.
  • Figure 5: The prompt template for MT-bench.