IntentContinuum: Using LLMs to Support Intent-Based Computing Across the Compute Continuum
Negin Akbari, John Grundy, Aamir Cheema, Adel N. Toosi
TL;DR
This work tackles the challenge of real-time resource management across the edge-cloud compute continuum to satisfy user-defined intents and SLOs in IoT/AI workloads. It introduces IntentContinuum, a framework that centers a GPT-4o-driven Decision-Maker to monitor, diagnose, and automatically reconfigure compute and networking resources, enabling robust intent satisfaction through actions like service placement, horizontal/vertical scaling, and flow scheduling. The system ingests structured inputs (cluster, network, monitoring) and uses templates and few-shot prompts to perform root-cause analysis and propose corrective actions, with violations tracked via the exponential moving average of response time ($EMA_{RT}$). Evaluations on an iContinuum-based testbed show IntentContinuum can outperform Kubernetes autoscalers in maintaining intents, handle network-induced violations, and scale across multiple worker nodes, while highlighting tradeoffs in model dependence, transparency, and token costs. The work contributes an open-source prototype, demonstrates end-to-end intent-driven management across compute and network resources, and outlines future work to enhance transparency, efficiency, and scalability for broader adoption.
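To make the $EMA_{RT}$ violation check concrete, the following is a minimal sketch that assumes the standard exponential moving average, $EMA_t = \alpha \cdot RT_t + (1 - \alpha) \cdot EMA_{t-1}$, and flags a violation when the smoothed value exceeds the intent's response-time target. The class name `EmaResponseTimeMonitor`, the smoothing factor `alpha`, the target `target_rt_ms`, and the sample values are all hypothetical and chosen for illustration; the smoothing factor and violation rule actually used by IntentContinuum may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmaResponseTimeMonitor:
    """Tracks an EMA of response time and compares it to an intent target.

    All parameter names and defaults are illustrative assumptions,
    not values taken from the paper.
    """
    target_rt_ms: float                 # hypothetical intent threshold in ms
    alpha: float = 0.3                  # hypothetical smoothing factor in (0, 1]
    ema_rt_ms: Optional[float] = None   # current EMA; None until first sample

    def update(self, response_time_ms: float) -> bool:
        """Fold one sample into the EMA; return True if the target is exceeded."""
        if self.ema_rt_ms is None:
            # Seed the average with the first observation.
            self.ema_rt_ms = response_time_ms
        else:
            self.ema_rt_ms = (
                self.alpha * response_time_ms
                + (1.0 - self.alpha) * self.ema_rt_ms
            )
        return self.ema_rt_ms > self.target_rt_ms


# Usage with made-up samples: a latency spike pushes the EMA past the target.
monitor = EmaResponseTimeMonitor(target_rt_ms=200.0, alpha=0.5)
violations = [monitor.update(rt) for rt in [100.0, 120.0, 400.0, 500.0]]
print(violations)  # [False, False, True, True]
```

Smoothing with an EMA rather than reacting to raw samples means a single slow request does not trigger reconfiguration, whereas a sustained rise in response time does. In this sketch, a `True` result marks the point at which a decision-making component would be asked to diagnose the cause.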
Abstract
The proliferation of IoT devices and AI applications has created a demand for scalable and efficient computing solutions, particularly for applications requiring real-time processing. The compute continuum integrates edge and cloud resources to meet this need, balancing the low latency of the edge with the high computational power of the cloud. However, managing resources in such a distributed environment is challenging because of the diversity and complexity of these systems. Traditional resource management methods, which often rely on heuristic algorithms, struggle to cope with the growing complexity and scale of these systems and to adapt to dynamic workloads and changing network conditions. Moreover, designing such approaches is often time-intensive, highly tailored to specific applications, and dependent on deep expertise. In this paper, we introduce a novel framework for intent-driven resource management in the compute continuum that uses large language models (LLMs) to help automate decision-making. Our framework ensures that user-defined intents -- such as achieving the required response times for time-critical applications -- are consistently fulfilled. When an intent is violated, our system performs root cause analysis on system data to identify and address the underlying issue. This approach reduces the need for human intervention and enhances system reliability, offering a more dynamic and efficient solution for resource management in distributed environments.
