Embodied Foundation Models at the Edge: A Survey of Deployment Constraints and Mitigation Strategies

Utkarsh Grover; Ravi Ranjan; Mingyang Mao; Trung Tien Dong; Satvik Praveen; Zhenqi Wu; J. Morris Chang; Tinoosh Mohsenin; Yi Sheng; Agoritsa Polyzou; Eiman Kanjo; Xiaomin Lin

Abstract

Deploying foundation models in embodied edge systems is fundamentally a systems problem, not just a problem of model compression. Real-time control must operate within strict size, weight, and power constraints, where memory traffic, compute latency, timing variability, and safety margins interact directly. The Deployment Gauntlet organizes these constraints into eight coupled barriers that determine whether embodied foundation models can run reliably in practice. Across representative edge workloads, autoregressive Vision-Language-Action policies are constrained primarily by memory bandwidth, whereas diffusion-based controllers are limited more by compute latency and sustained execution cost. Reliable deployment therefore depends on system-level co-design across memory, scheduling, communication, and model architecture, including decompositions that separate fast control from slower semantic reasoning.
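The contrast between memory-bound autoregressive policies and compute-bound diffusion controllers can be illustrated with a simple roofline-style estimate. The sketch below is only a back-of-the-envelope model, not the survey's methodology: the device figures, model sizes, token count, and step count are all hypothetical, and it counts weight traffic only, ignoring activations and KV-cache reads.

```python
from dataclasses import dataclass


@dataclass
class Device:
    peak_flops: float     # peak compute throughput, FLOP/s (hypothetical)
    mem_bandwidth: float  # sustained DRAM bandwidth, bytes/s (hypothetical)


def roofline(flops: float, bytes_moved: float, dev: Device):
    """Return (lower-bound time in seconds, limiting resource) for one pass."""
    t_compute = flops / dev.peak_flops
    t_memory = bytes_moved / dev.mem_bandwidth
    limiter = "memory" if t_memory > t_compute else "compute"
    return max(t_compute, t_memory), limiter


# Hypothetical edge accelerator: 20 TFLOP/s peak, 100 GB/s memory bandwidth.
dev = Device(peak_flops=20e12, mem_bandwidth=100e9)
BYTES_PER_PARAM = 2  # assuming 16-bit weights

# Autoregressive decode at batch size 1: every generated token streams all
# weights from memory once and performs about 2 FLOPs per parameter.
ar_params = 7e9  # hypothetical 7B-parameter VLA backbone
ar_time, ar_bound = roofline(
    flops=2 * ar_params,
    bytes_moved=ar_params * BYTES_PER_PARAM,
    dev=dev,
)

# Diffusion policy: a smaller denoiser processes many tokens per pass, so the
# weights are reused across tokens, and the pass repeats for every step.
diff_params = 100e6  # hypothetical 100M-parameter denoiser
tokens = 512         # hypothetical tokens processed per denoising pass
steps = 50           # hypothetical number of denoising steps
step_time, diff_bound = roofline(
    flops=2 * diff_params * tokens,
    bytes_moved=diff_params * BYTES_PER_PARAM,
    dev=dev,
)
diff_time = steps * step_time

print(f"autoregressive: {ar_time * 1e3:.1f} ms per token, {ar_bound}-bound")
print(f"diffusion: {diff_time * 1e3:.1f} ms per action chunk, {diff_bound}-bound")
```

Under these assumed numbers, the autoregressive pass performs about one FLOP per byte of weights read, far below the device's balance point of 200 FLOP/byte, so memory bandwidth sets its latency. The diffusion pass reuses each weight across all tokens, which pushes it past the balance point, and its cost then scales with the number of denoising steps.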

Paper Structure (67 sections, 4 figures, 5 tables)

Introduction
The Multimodal Execution Imperative
The Deployment Gauntlet
Landscape of Foundation Models on the Edge
Taxonomy of Edge-Relevant Workloads
Vision-Language-Action (VLA) Policies
Diffusion-Based Policies
Vision Encoders and Multimodal LMMs
3D & LiDAR Encoders
Multimodal Fusion Stacks
Section summary.
The Deployment Gauntlet
The Sensor Fusion Tax
Temporal and Spatial Misalignment
Middleware and Pipeline Overheads
...and 52 more sections

Figures (4)

Figure 1: The Deployment Gauntlet. A unified view of the eight major system barriers that limit the deployment of foundation models from the cloud to edge embodied AI platforms.
Figure 2: Taxonomy of the Deployment Gauntlet barriers and their solutions for foundation models under embodied constraints.
Figure 3: Taxonomy of mitigation strategies for the Deployment Gauntlet at the edge.
Figure 4: Architecture trends for future embodied foundation models. Data volume is minimized via neuromorphic and event-based sensing (A) and near-sensor compute (B). A central predictive world model stabilizes perception (C), while cognition is separated into fast reflexes and slow reasoning (D). A rigorous safety policy (E) enforces constraints before actuation, and fleet learning (F) extends the feasible complexity.
