Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks
Fakhraddin Alwajih, El Moatez Billah Nagoudi, Gagan Bhatia, Abdelrahman Mohamed, Muhammad Abdul-Mageed
TL;DR
Peacock tackles the scarcity of Arabic multimodal resources by introducing two architecture variants (InstructBLIP-based and LLaVA-based) that fuse vision encoders with Arabic LLMs, backed by a two-stage training pipeline that translates and filters English image-text data for Arabic use. It also provides AraLLaMA, a high-quality Arabic-adapted LLaMA2-7B backbone, and introduces Henna, a culturally focused benchmark, plus an Egyptian-dialect case study to probe dialectal capabilities. In evaluations on VQA, LLaVA-Bench, SEED-Bench (Arabic), Henna, and dialect tasks, Peacock consistently outperforms multilingual baselines such as mBLIP, highlighting the impact of data quality, architecture choices, and Arabic-specific adaptation. The work establishes strong baselines and resources for Arabic vision-language modeling, enabling future research and culturally aware applications in the Arabic-speaking world.
Abstract
Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains largely limited to English-based settings. This poses significant challenges in developing comparable models for other languages, including even those with large speaker populations such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed Peacock, with strong vision and language capabilities. Through comprehensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce Henna, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, laying the first stone for culturally aware Arabic MLLMs. The GitHub repository for the Peacock project is available at https://github.com/UBC-NLP/peacock.
