Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI

Aladin Djuhera; Fernando Koch; Alecio Binotto

TL;DR

This paper tackles real-time edge inference of large foundation models (LFMs) through joint runtime partitioning and placement across heterogeneous edge resources. It introduces an adaptive orchestration framework that profiles capacity, re-partitions the LFM graph at runtime, and enforces privacy constraints through selective local execution. The approach is formalized as a constrained optimization over partitions and placements, implemented in a modular architecture (monitoring, decision-making, graph re-splitting, and reconfiguration broadcast), and demonstrated in a 6G/MEC scenario, where it shows substantial latency and utilization improvements with modest overhead. The framework is designed to integrate with existing orchestration stacks and to extend to future AI-native scheduling goals.

Abstract

Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy gradients. By integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation, the framework implements reactive inference composition that responds to infrastructural fluctuations. We present the architectural and algorithmic components, along with a representative use case in 6G multi-access edge computing.
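
To make the idea of a constrained optimization over layer-wise assignments concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a linear chain of layers, a latency-only objective (utilization is ignored), and a privacy rule under which flagged layers must run on the local node. The names Layer, plan, node_speed, and bandwidth are hypothetical and chosen for illustration only. Partition boundaries emerge wherever consecutive layers land on different nodes, so partitioning and placement are decided jointly.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Layer:
    flops: float           # compute cost of the layer (assumed known from profiling)
    out_bytes: float       # size of the activation the layer emits
    private: bool = False  # assumption: private layers must stay on the local node


def plan(layers, node_speed, bandwidth, local=0, input_bytes=0.0):
    """Dynamic program over (layer, node) pairs.

    node_speed[n] is in FLOP/s; bandwidth[a][b] is in bytes/s.
    Returns (latency_seconds, assignment), where assignment[i] is the node of layer i.
    """
    n_nodes = len(node_speed)

    def xfer(nbytes, a, b):
        # Moving data costs nothing if it stays on the same node.
        return 0.0 if a == b else nbytes / bandwidth[a][b]

    def allowed(layer, n):
        return n == local or not layer.private

    # best[n]: minimum latency so far with the most recent layer on node n.
    best = [inf] * n_nodes
    for n in range(n_nodes):
        if allowed(layers[0], n):
            best[n] = xfer(input_bytes, local, n) + layers[0].flops / node_speed[n]

    back = []  # back[i - 1][n]: predecessor node chosen for layer i on node n
    for i in range(1, len(layers)):
        new = [inf] * n_nodes
        choice = [None] * n_nodes
        for n in range(n_nodes):
            if not allowed(layers[i], n):
                continue
            for p in range(n_nodes):
                cand = best[p] + xfer(layers[i - 1].out_bytes, p, n)
                if cand < new[n]:
                    new[n], choice[n] = cand, p
            new[n] += layers[i].flops / node_speed[n]
        back.append(choice)
        best = new

    # Recover the assignment by walking the predecessor table backwards.
    end = min(range(n_nodes), key=best.__getitem__)
    assignment = [end]
    for choice in reversed(back):
        assignment.append(choice[assignment[-1]])
    assignment.reverse()
    return best[end], assignment


# Illustrative numbers only: node 0 is a slow local device, node 1 a fast edge server.
layers = [
    Layer(flops=10, out_bytes=100, private=True),
    Layer(flops=100, out_bytes=100),
    Layer(flops=100, out_bytes=10),
    Layer(flops=10, out_bytes=1, private=True),
]
bandwidth = [[inf, 100.0], [100.0, inf]]
latency, assignment = plan(layers, node_speed=[1.0, 10.0], bandwidth=bandwidth)
print(latency, assignment)
```

With these illustrative numbers the two middle layers are offloaded to the faster node while the private first and last layers stay local. Runtime re-partitioning can be approximated in this sketch by calling plan again whenever profiled values of node_speed or bandwidth change; if the edge server slows down enough, the same call keeps every layer on the local node.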
