A Blueprint Architecture of Compound AI Systems for Enterprise

Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng, Hannah Kim, Chen Shen, Jin Wang, Estevam Hruschka

TL;DR

The paper tackles operationalizing LLMs in enterprise settings by shifting from monolithic models to compound AI systems that combine models, data, and tools. It proposes a blueprint architecture featuring agent and data registries, stream-based orchestration, and task/data planners to manage latency, accuracy, and cost within production constraints. Key contributions include explicit touchpoints for integration, an event-driven streams layer, and planning components for task decomposition and data retrieval. This approach aims to enable scalable, controllable, and cost-effective AI workflows integrated with existing enterprise infrastructure.

Abstract

Large Language Models (LLMs) have shown remarkable capabilities that extend beyond conventional NLP challenges, creating opportunities for use in production. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases, and tools. In this paper, we introduce a blueprint architecture for compound AI systems to operate in enterprise settings cost-effectively and feasibly. Our proposed architecture aims for seamless integration with existing compute and data infrastructure, with "stream" serving as the key orchestration concept to coordinate data and instructions among agents and other components. Task and data planners break down, map, and optimize tasks and data, respectively, to available agents and data sources defined in the respective registries, given production constraints such as accuracy and latency.
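
As an illustration of how a task planner might map sub-tasks to agents listed in a registry under accuracy and latency constraints, here is a minimal sketch. It is not the paper's implementation: the names `AgentSpec`, `AgentRegistry`, and `plan`, the per-agent accuracy/latency/cost fields, and the "cheapest agent that satisfies the constraints" rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical registry entry describing one agent."""
    name: str
    capability: str      # the kind of sub-task the agent can handle
    accuracy: float      # assumed to be estimated offline, in [0, 1]
    latency_ms: float    # assumed typical latency per call
    cost: float          # assumed relative cost per call


class AgentRegistry:
    """Hypothetical in-memory registry of available agents."""

    def __init__(self):
        self._agents = []

    def register(self, spec):
        self._agents.append(spec)

    def find(self, capability):
        return [a for a in self._agents if a.capability == capability]


def plan(subtasks, registry, min_accuracy, max_latency_ms):
    """Map each sub-task to the cheapest agent that meets both constraints.

    `subtasks` is a list of capability names, i.e. the task is assumed to be
    already decomposed. Raises ValueError if no agent qualifies for a step.
    """
    assignment = {}
    for capability in subtasks:
        candidates = [
            a for a in registry.find(capability)
            if a.accuracy >= min_accuracy and a.latency_ms <= max_latency_ms
        ]
        if not candidates:
            raise ValueError(f"no agent satisfies constraints for {capability!r}")
        assignment[capability] = min(candidates, key=lambda a: a.cost).name
    return assignment
```

With a registry holding a fast but less accurate summarizer and a slower, more accurate one, raising `min_accuracy` makes the planner switch to the more expensive agent, while tightening `max_latency_ms` can leave no feasible plan. A production planner would likely optimize over the whole plan rather than step by step.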

Paper Structure (6 sections, 3 figures)

This paper contains 6 sections, 3 figures.

Figures (3)

  • Figure 1: Blueprint Architecture: Data and Agent Registries are touch points that define existing data, models, APIs, and services in the enterprise for utilization by agents.
  • Figure 2: Agents: Triggered by data/instruction messages from multiple incoming streams, agents process them and produce output data and instructions to multiple output streams.
  • Figure 3: Orchestration of Agents: As agents join a session and generate output streams, these events are broadcast in session streams. Other agents may choose to respond to a stream and initiate a worker to process the data in that stream, interacting with external services and databases. Computation occurs in various layers for optimal utilization.
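
The orchestration described in the Figure 2 and Figure 3 captions can be sketched as a small in-memory model. This is only a sketch under stated assumptions, not the paper's system: `Session`, `Agent`, the `"session"` stream name, the `"<agent>:out"` naming of output streams, and the synchronous dispatch are all hypothetical choices made for this example.

```python
from collections import defaultdict


class Session:
    """Hypothetical session: named streams plus a 'session' stream of events."""

    def __init__(self):
        self.streams = defaultdict(list)  # stream name -> list of messages
        self.agents = []

    def join(self, agent):
        # Joining is announced on the session stream.
        self.agents.append(agent)
        self.streams["session"].append(("agent_joined", agent.name))

    def publish(self, stream, message):
        is_new = stream not in self.streams
        self.streams[stream].append(message)
        if is_new:
            # The creation of a new stream is announced on the session stream.
            self.streams["session"].append(("stream_created", stream))
        # Synchronous dispatch for simplicity; a real system would be event-driven.
        for agent in list(self.agents):
            agent.on_message(self, stream, message)


class Agent:
    """Hypothetical agent that responds to one input stream."""

    def __init__(self, name, listens_to, worker):
        self.name = name
        self.listens_to = listens_to
        self.worker = worker  # callable that processes a single message

    def on_message(self, session, stream, message):
        if stream != self.listens_to:
            return  # the agent chooses not to respond to this stream
        # The worker's result goes to this agent's own output stream.
        session.publish(f"{self.name}:out", self.worker(message))


session = Session()
session.join(Agent("upper", "user", lambda text: text.upper()))
session.join(Agent("exclaim", "upper:out", lambda text: text + "!"))
session.publish("user", "hello")
print(session.streams["exclaim:out"])  # ['HELLO!']
```

Here the second agent never sees the user's message directly; it reacts only to the first agent's output stream, which is how chains of agents can form without a central controller wiring them together.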