Vision-Language Models for Edge Networks: A Comprehensive Survey

Ahmed Sharshar; Latif U. Khan; Waseem Ullah; Mohsen Guizani

TL;DR

This survey addresses the challenge of deploying Vision-Language Models on resource-constrained edge devices by reviewing lightweight architectures, compression techniques (pruning, quantization, knowledge distillation), and efficient fine-tuning methods (prompts, adapters). It covers edge deployment pipelines, data handling, model partitioning between edge and cloud, and privacy and security considerations, with examples spanning healthcare, environmental monitoring, autonomous systems, and surveillance. Key contributions include a taxonomy of edge-focused VLM design choices, a review of deployment strategies, and a discussion of open challenges (security, privacy, cross-modality learning, and communication). The work highlights practical implications for real-time, on-device multimodal processing and outlines future directions, including federated and context-aware learning, hardware-aware architectures, and robust edge ecosystems.

Abstract

Vision Large Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis. While VLMs show impressive capabilities across domains such as autonomous vehicles, smart surveillance, and healthcare, their deployment on resource-constrained edge devices remains challenging due to processing power, memory, and energy limitations. This survey explores recent advancements in optimizing VLMs for edge environments, focusing on model compression techniques (pruning, quantization, and knowledge distillation) and on specialized hardware solutions that enhance efficiency. We provide a detailed discussion of efficient training and fine-tuning methods, edge deployment challenges, and privacy considerations. Additionally, we discuss the diverse applications of lightweight VLMs across healthcare, environmental monitoring, and autonomous systems, illustrating their growing impact. By highlighting key design strategies and current challenges, and by offering recommendations for future directions, this survey aims to inspire further research into the practical deployment of VLMs, ultimately making advanced AI accessible in resource-limited settings.
