Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu, Ana-Cristina Rogoz, Mihai Sorin Stupariu, Radu Tudor Ionescu
TL;DR
This survey analyzes how large multimodal models are being adapted for low-resource (LR) languages, synthesizing 117 studies across 96 LR languages and proposing a six-category taxonomy (data creation, synthetic data, fusion, visual enhancement, cross-modal transfer, and architectural innovations). It finds that text-image pairs dominate the field, that language coverage is uneven, and that attention to data resources, fusion strategies, and efficient architectures is increasing, while evaluation and governance challenges persist. Visual information frequently benefits LR multimodal tasks, yet issues such as hallucination and computational constraints limit broader adoption. The work offers practical guidelines, an open-source repository, and future directions to promote equitable, community-centered progress in LR multimodal NLP.
Abstract
In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) for low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 117 studies across 96 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We categorize works into resource-oriented and method-oriented contributions, further dividing contributions into relevant sub-categories. We compare method-oriented contributions in terms of performance and efficiency, discussing benefits and limitations of representative studies. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination mitigation and computational efficiency. In summary, we provide researchers with a clear understanding of current approaches and remaining challenges in making LMMs more accessible to speakers of LR (understudied) languages. We complement our survey with an open-source repository available at: https://github.com/marianlupascu/LMM4LRL-Survey.
