A Blueprint Architecture of Compound AI Systems for Enterprise
Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng, Hannah Kim, Chen Shen, Jin Wang, Estevam Hruschka
TL;DR
The paper tackles operationalizing LLMs in enterprise settings by shifting from monolithic models to compound AI systems that combine models, data, and tools. It proposes a blueprint architecture featuring agent and data registries, stream-based orchestration, and task/data planners to manage latency, accuracy, and cost within production constraints. Key contributions include explicit touchpoints for integration, an event-driven streams layer, and planning components for task decomposition and data retrieval. This approach aims to enable scalable, controllable, and cost-effective AI workflows integrated with existing enterprise infrastructure.
Abstract
Large Language Models (LLMs) have shown remarkable capabilities that go beyond conventional NLP tasks, creating opportunities for production use. Toward this goal, there is a notable shift to building compound AI systems, in which LLMs are integrated into a broader software infrastructure with many components such as models, retrievers, databases, and tools. In this paper, we introduce a blueprint architecture for compound AI systems that can operate cost-effectively and feasibly in enterprise settings. The proposed architecture aims for seamless integration with existing compute and data infrastructure, with the "stream" serving as the key orchestration concept for coordinating data and instructions among agents and other components. Task planners and data planners break down, map, and optimize tasks and data, respectively, onto the available agents and data sources defined in the corresponding registries, subject to production constraints such as accuracy and latency.
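To make the registry and planner idea concrete, the following is a minimal sketch of how a task planner might map subtasks to registered agents under accuracy and latency constraints. It is not the paper's implementation: the names `AgentSpec`, `AgentRegistry`, and `plan`, the registry fields, the example agents and their numbers, and the "cheapest feasible agent" selection rule are all assumptions made for illustration. Streams and the data planner are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical registry entry; the field names are illustrative."""
    name: str
    capability: str    # kind of subtask the agent can handle
    accuracy: float    # expected accuracy in [0, 1]
    latency_ms: float  # expected latency per call
    cost: float        # expected cost per call, arbitrary units


class AgentRegistry:
    """Toy in-memory registry that a planner can search by capability."""

    def __init__(self) -> None:
        self._agents: list[AgentSpec] = []

    def register(self, spec: AgentSpec) -> None:
        self._agents.append(spec)

    def find(self, capability: str) -> list[AgentSpec]:
        return [a for a in self._agents if a.capability == capability]


def plan(subtasks, registry, min_accuracy, max_latency_ms):
    """Map each subtask to the cheapest agent that meets the constraints.

    The selection rule is an assumption; a real planner could optimize
    a different objective or trade the constraints off jointly.
    """
    assignment = {}
    for subtask in subtasks:
        feasible = [
            a for a in registry.find(subtask)
            if a.accuracy >= min_accuracy and a.latency_ms <= max_latency_ms
        ]
        if not feasible:
            raise LookupError(f"no agent satisfies constraints for {subtask!r}")
        assignment[subtask] = min(feasible, key=lambda a: a.cost).name
    return assignment


# Made-up agents and numbers, purely for illustration.
registry = AgentRegistry()
registry.register(AgentSpec("large-llm-extractor", "extract", 0.95, 900.0, 5.0))
registry.register(AgentSpec("small-llm-extractor", "extract", 0.85, 200.0, 1.0))
registry.register(AgentSpec("sql-retriever", "retrieve", 0.99, 50.0, 0.1))

result = plan(["retrieve", "extract"], registry,
              min_accuracy=0.9, max_latency_ms=1000.0)
print(result)
```

In this sketch, relaxing `min_accuracy` to 0.8 lets the planner pick the cheaper `small-llm-extractor` for the `extract` subtask, which illustrates how production constraints could steer the mapping from tasks to agents.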
