PHOENIX: Open-Source Language Adaption for Direct Preference Optimization

Matthias Uhlig, Sigurd Schacht, Sudarshan Kamath Barkur

TL;DR

This work addresses the underrepresentation of non-English languages in large language models by adapting a Mistral-based model to German, using a pipeline that combines open-source translation with supervised fine-tuning and Direct Preference Optimization (DPO). The authors translate large-scale instruction data with ALMA, served via vLLM, to obtain low-cost, license-compliant German data, then fine-tune the model on that data and apply DPO to align its responses with human preferences. Evaluation on the German MT-Bench and cross-language benchmarks shows Phoenix achieving competitive results relative to larger models, outperforming some English-centric baselines on specific tasks and showing that open-source translation can make language adaptation efficient. The work highlights a practical, cost-effective pathway for multilingual LLM development and underscores the need for richer multilingual evaluation benchmarks and data pipelines.

Abstract

Large language models have gained immense importance in recent years and have demonstrated outstanding results in solving various tasks. However, despite these achievements, many questions remain unanswered in the context of large language models. Besides the optimal use of the models for inference and the alignment of the results to the desired specifications, the transfer of models to other languages is still an underdeveloped area of research. The recent publication of models such as Llama-2 and Zephyr has provided new insights into architectural improvements and the use of human feedback. However, insights into adapting these techniques to other languages remain scarce. In this paper, we build on the latest improvements and apply the Direct Preference Optimization (DPO) approach to the German language. The model is available at https://huggingface.co/DRXD1000/Phoenix.
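
For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO objective, not the authors' training code. It assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses are already available from both the trained policy and a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions and are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta scales how strongly the policy is pushed away from the
    reference model; 0.1 is an illustrative default, not a value
    reported in the paper.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favours the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written as softplus(-margin) and split by
    # sign so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has shifted probability mass toward the chosen response.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

When the policy equals the reference, the margin is zero and the loss is log 2. The loss falls as the policy assigns relatively more probability to the chosen response than the reference does, which is the behaviour that preference alignment aims for.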

Paper Structure (9 sections, 1 figure, 5 tables)

This paper contains 9 sections, 1 figure, 5 tables.

Figures (1)

  • Figure 1: The COAI Playground with the outputs for the two models side by side. The Mixtral MoE model is on the left and the Phoenix model is on the right.