Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Minghao Shao; Abdul Basit; Ramesh Karri; Muhammad Shafique

TL;DR

This survey traces the rapid evolution of transformer-based large language models (LLMs) and the emerging field of multimodal LLMs (MLLMs). It groups LLMs into encoder-only, decoder-only, and encoder-decoder families, and reviews pre-training and fine-tuning techniques, including parameter-efficient methods and mixture-of-experts approaches. The paper surveys benchmarks across language and multimodal tasks (e.g., MMLU, SuperGLUE, NLVR2, VQA); discusses data quality and bias, model compression, and distributed computation; and reviews leading models (e.g., GPT, PaLM, LLaMA, Gopher, PaLM-E, KOSMOS-1) and their multimodal extensions. Overall, it highlights trends toward efficiency, scalability, and safety, and outlines practical directions for building robust, scalable, and trustworthy LLMs across diverse domains.

Abstract

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural networks, often encompassing dozens of neural network layers and containing billions to trillions of parameters. They are typically trained on vast datasets, utilizing architectures based on transformer blocks. Present-day LLMs are multi-functional, capable of performing a range of tasks from text generation and language translation to question answering, as well as code generation and analysis. An advanced subset of these models, known as Multimodal Large Language Models (MLLMs), extends LLM capabilities to process and interpret multiple data modalities, including images, audio, and video. This enhancement empowers MLLMs with capabilities like video editing, image comprehension, and captioning for visual content. This survey provides a comprehensive overview of the recent advancements in LLMs. We begin by tracing the evolution of LLMs and subsequently delve into the advent and nuances of MLLMs. We analyze emerging state-of-the-art MLLMs, exploring their technical features, strengths, and limitations. Additionally, we present a comparative analysis of these models and discuss their challenges, potential limitations, and prospects for future development.
