Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

John Dang, Shivalika Singh, Daniel D'souza, Arash Ahmadian, Alejandro Salamanca, Madeline Smith, Aidan Peppin, Sungjin Hong, Manoj Govindassamy, Terrence Zhao, Sandra Kublik, Meor Amer, Viraat Aryabumi, Jon Ander Campos, Yi-Chern Tan, Tom Kocmi, Florian Strub, Nathan Grinsztajn, Yannis Flet-Berliac, Acyr Locatelli, Hangyu Lin, Dwarak Talupuru, Bharat Venkitesh, David Cairuz, Bowen Yang, Tim Chung, Wei-Yin Ko, Sylvie Shang Shi, Amir Shukayev, Sammie Bae, Aleksandra Piktus, Roman Castagné, Felipe Cruz-Salinas, Eddie Kim, Lucas Crawhall-Stein, Adrien Morisot, Sudip Roy, Phil Blunsom, Ivan Zhang, Aidan Gomez, Nick Frosst, Marzieh Fadaee, Beyza Ermis, Ahmet Üstün, Sara Hooker

TL;DR

The paper presents Aya Expanse, a family of open-weight multilingual language models at 8B and 32B parameters, designed to close the performance gap between multilingual and monolingual models. Its post-training recipe combines data arbitrage, multilingual preference training, and model merging to reach state-of-the-art multilingual performance across 23 languages. Evaluations on m-ArenaHard, the Dolly evaluation set, and related benchmarks show Aya Expanse matching or surpassing several larger open-weight models, with notable gains in open-ended generation, language understanding, mathematical reasoning, and translation. By releasing the model weights and the m-ArenaHard dataset, the work aims to accelerate progress toward more inclusive and capable multilingual AI systems.

Abstract

We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models, aiming to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual preference training, and model merging, Aya Expanse sets a new state of the art in multilingual performance. Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models in their respective parameter classes, including Gemma 2, Qwen 2.5, and Llama 3.1, achieving up to a 76.6% win-rate. Notably, Aya Expanse 32B outperforms Llama 3.1 70B, a model with more than twice as many parameters, achieving a 54.0% win-rate. In this short technical report, we present extended evaluation results for the Aya Expanse model family and release their open weights, together with a new multilingual evaluation dataset, m-ArenaHard.

Paper Structure

This paper contains 15 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Pairwise win-rates on m-ArenaHard averaged across 23 languages: We compare Aya Expanse 8B (left) with Gemma 2 9B, Llama 3.1 8B, Ministral 8B and Qwen 2.5 7B. Aya Expanse 32B (right) is compared with Gemma 2 27B, Qwen 2.5 32B, Mixtral 8x22B, and Llama 3.1 70B. We used the instruction-tuned version of all models.
  • Figure 2: Pairwise win-rates on the Dolly evaluation set [ayadata2024] averaged across 23 languages: We compare Aya Expanse 8B (left) with Gemma 2 9B, Llama 3.1 8B, Ministral 8B and Qwen 2.5 7B. Aya Expanse 32B (right) is compared with Gemma 2 27B, Qwen 2.5 32B, Mixtral 8x22B and Llama 3.1 70B. We used the instruct fine-tuned (via SFT and RLHF) version of all models.
  • Figure 3: Improvement of Aya Expanse models, obtained through the Aya post-training recipe, over their predecessor Aya 23 models: On the left, we show win-rates against Gemma 2 9B on m-ArenaHard for each post-training step of Aya Expanse 8B, compared with Aya 23 8B [aryabumi2024aya]. On the right, we compare Aya Expanse models with the previous Aya 23 models on academic benchmarks, showing significant improvement, especially for MGSM, Global-MMLU, and INCLUDE.
  • Figure 4: Language-specific win-rates on m-ArenaHard: Aya Expanse 8B performance against Gemma 2 9B for 8 diverse languages.
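
The captions above report pairwise win-rates "averaged across 23 languages". The Python sketch below illustrates one plausible way to compute such a number, as an unweighted mean of per-language win-rates. It is an illustration under stated assumptions, not the authors' evaluation code: the outcome labels ("win", "loss", "tie"), the tie_weight parameter, the function names, and the toy judgments are all hypothetical and do not come from the paper.

```python
from collections import defaultdict

VALID_OUTCOMES = {"win", "loss", "tie"}  # hypothetical label set, not from the paper


def win_rate(outcomes, tie_weight=0.5):
    """Share of comparisons won; each tie counts as `tie_weight` of a win.

    Counting a tie as half a win is an assumption made for this sketch.
    """
    if not outcomes:
        raise ValueError("no outcomes to score")
    score = 0.0
    for outcome in outcomes:
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        if outcome == "win":
            score += 1.0
        elif outcome == "tie":
            score += tie_weight
    return score / len(outcomes)


def macro_average_win_rate(judgments, tie_weight=0.5):
    """Average per-language win-rates so that every language weighs the same.

    `judgments` is an iterable of (language, outcome) pairs.
    Returns (overall, per_language).
    """
    by_language = defaultdict(list)
    for language, outcome in judgments:
        by_language[language].append(outcome)
    if not by_language:
        raise ValueError("no judgments to score")
    per_language = {
        language: win_rate(outcomes, tie_weight)
        for language, outcomes in by_language.items()
    }
    overall = sum(per_language.values()) / len(per_language)
    return overall, per_language


# Toy judgments, invented for illustration only (not results from the paper).
toy_judgments = [
    ("fr", "win"), ("fr", "win"), ("fr", "loss"), ("fr", "tie"),
    ("tr", "win"), ("tr", "loss"),
]
overall, per_language = macro_average_win_rate(toy_judgments)
print(per_language, overall)
```

Averaging per language first, rather than pooling all prompts, keeps a language with more evaluated prompts from dominating the headline number. Whether the paper pools or macro-averages, and how it treats ties, is not stated in this summary.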