Towards Resource-Efficient Compound AI Systems

Gohar Irfan Chaudhry, Esha Choukse, Íñigo Goiri, Rodrigo Fonseca, Adam Belay, Ricardo Bianchini

TL;DR

This paper addresses resource inefficiency in Compound AI Systems caused by tight coupling across system layers. It proposes Murakkab, a system that combines a declarative, fungible workflow language with an adaptive runtime that unifies workflow orchestration with cluster management. The approach enables dynamic model, tool, and hardware selection and resource-aware scheduling, so that resources can be multiplexed and waste reduced. A preliminary prototype evaluation shows up to $\sim 3.4\times$ faster workflow completion and $\sim 4.5\times$ higher energy efficiency. Key contributions include the declarative workflow model, the adaptive runtime, and the integration of orchestration with cluster management, plus a discussion of overheads and practical considerations like AIWaaS and multi-cloud deployment. If the approach generalizes, it could lower operational costs and improve sustainability for complex AI pipelines while preserving result quality.

Abstract

Compound AI Systems, which integrate multiple interacting components such as models, retrievers, and external tools, have emerged as essential for addressing complex AI tasks. However, current implementations suffer from inefficient resource utilization due to tight coupling between application logic and execution details, a disconnect between the orchestration and resource management layers, and the perceived mutual exclusivity of efficiency and quality. We propose a vision for resource-efficient Compound AI Systems through a declarative workflow programming model and an adaptive runtime system for dynamic scheduling and resource-aware decision-making. Decoupling application logic from low-level details exposes levers that let the runtime flexibly configure the execution environment and resources without compromising on quality. Collaboration between the workflow orchestrator and the cluster manager yields higher efficiency through better scheduling and resource management. We are building a prototype system, called Murakkab, to realize this vision. Our preliminary evaluation demonstrates speedups of up to $\sim 3.4\times$ in workflow completion times while delivering $\sim 4.5\times$ higher energy efficiency, showing promise in optimizing resources and advancing AI system design.
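To make the decoupling idea concrete, the following is a minimal illustrative sketch, not Murakkab's actual language or API. It assumes a hypothetical `Task` that states only what is needed (a description and a quality floor) and a hypothetical `select_config` function standing in for the runtime, which picks the lowest-energy candidate that still meets that floor. All names and profile numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical way to execute a task (model + hardware)."""
    name: str
    latency_s: float  # assumed profile value, not a measurement
    energy_j: float   # assumed profile value, not a measurement
    quality: float    # assumed quality score in [0, 1]


@dataclass(frozen=True)
class Task:
    """Declarative task: states what is needed, not how to run it."""
    description: str
    min_quality: float


def select_config(task: Task, candidates: list[Config]) -> Config:
    """Pick the lowest-energy candidate that meets the quality floor.

    Ties on energy are broken by lower latency.
    """
    feasible = [c for c in candidates if c.quality >= task.min_quality]
    if not feasible:
        raise ValueError(f"no configuration meets quality {task.min_quality}")
    return min(feasible, key=lambda c: (c.energy_j, c.latency_s))


# Hypothetical candidate configurations with made-up profiles.
CANDIDATES = [
    Config("large-model-gpu", latency_s=2.0, energy_j=900.0, quality=0.95),
    Config("small-model-gpu", latency_s=0.8, energy_j=300.0, quality=0.90),
    Config("small-model-cpu", latency_s=5.0, energy_j=150.0, quality=0.90),
    Config("tiny-model-cpu", latency_s=1.0, energy_j=40.0, quality=0.70),
]

task = Task("summarize each scene of a video", min_quality=0.85)
chosen = select_config(task, CANDIDATES)
print(chosen.name)
```

Because the task never names a model or a device, the selection policy can change (for example, to minimize latency instead of energy) without touching the application logic, which is the kind of lever the abstract describes.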

Paper Structure

This paper contains 9 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Today, programmers use frameworks to call agents from different providers hosted on multiple cloud platforms. The rigid coupling between all layers of the system results in inefficiencies.
  • Figure 2: We envision fungible workflows with high-level descriptions, managed jointly by the Workflow Orchestrator and Cluster Manager. This allows higher resource multiplexing between independent workflows to improve efficiency.
  • Figure 3: Execution traces of the Video Understanding workflow. Murakkab can adjust between multiple configurations and deliver a $\sim 3.4\times$ speedup with higher resource efficiency.