Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions

Hadi Amini, Md Jueal Mia, Yasaman Saadati, Ahmed Imteaj, Seyedsina Nabavirazavi, Urmish Thakker, Md Zarif Hossain, Awal Ahmed Fime, S. S. Iyengar

TL;DR

The survey addresses how distributed and decentralized approaches can scale LLMs and multimodal LLMs (MLLMs) while preserving privacy and enabling edge deployment. It synthesizes advances across data collection, training, fine-tuning, reinforcement learning from human feedback (RLHF), optimization, and deployment within federated learning (FL) and related distributed frameworks, covering vision language models (VLMs), MLLMs, and small language models (SLMs). It categorizes the literature into six focus areas (distributed training, inference/optimization, infrastructures, FL/fine-tuning, edge/mobile, and communication efficiency) and discusses gaps, challenges (privacy, heterogeneity, latency, and hallucination), and future research directions. The work also inventories foundational models and edge-ready SLMs, and points to practical resources (e.g., GitHub lists) for tracking evolving distributed MLLMs. Overall, it underscores the need for novel decentralization strategies to improve robustness, scalability, and privacy in distributed (M)LLMs for real-world applications.

Abstract

Language models (LMs) are machine learning models designed to predict linguistic patterns by estimating the probability of word sequences based on large-scale datasets, such as text. LMs have a wide range of applications in natural language processing (NLP) tasks, including autocomplete and machine translation. Although larger datasets typically enhance LM performance, scalability remains a challenge due to constraints in computational power and resources. Distributed computing strategies offer essential solutions for improving scalability and managing the growing computational demand. Further, the use of sensitive datasets in training and deployment raises significant privacy concerns. Recent research has focused on developing decentralized techniques to enable distributed training and inference while utilizing diverse computational resources and enabling edge AI. This paper presents a survey on distributed solutions for various LMs, including large language models (LLMs), vision language models (VLMs), multimodal LLMs (MLLMs), and small language models (SLMs). While LLMs focus on processing and generating text, MLLMs are designed to handle multiple modalities of data (e.g., text, images, and audio) and to integrate them for broader applications. To this end, this paper reviews key advancements across the MLLM pipeline, including distributed training, inference, fine-tuning, and deployment, while also identifying the contributions, limitations, and future areas of improvement. Further, it categorizes the literature based on six primary focus areas of decentralization. Our analysis describes gaps in current methodologies for enabling distributed solutions for LMs and outlines future research directions, emphasizing the need for novel solutions to enhance the robustness and applicability of distributed LMs.

Paper Structure

This paper contains 54 sections, 1 equation, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Overview of selected existing survey papers in six categories and their overlapping areas.
  • Figure 2: Distribution of the primary and secondary focus of papers [1]-[100] across the following categories: (1) Distributed Training; (2) Distributed Inference and Optimization; (3) Distributed Computing Infrastructures; (4) FL and Fine-tuning; (5) Edge Computing and Mobile Intelligence; (6) Communication Efficiency in Distributed Systems.
  • Figure 3: Overview of selected related works in six categories and their overlapping areas.
  • Figure 4: Overview of the LLM Pipeline.
  • Figure 5: A Framework for Federated MLLMs (partly inspired by lyu2023macaw).
  • ...and 6 more figures