AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds

Yinfang Chen; Manish Shetty; Gagan Somashekar; Minghua Ma; Yogesh Simmhan; Jonathan Mace; Chetan Bansal; Rujia Wang; Saravan Rajmohan

TL;DR

AIOpsLab addresses the challenge of evaluating autonomous AI agents across the full incident lifecycle in cloud environments. It provides a holistic benchmark framework that integrates cloud deployment, fault injection, workload generation, telemetry, and an Agent-Cloud Interface (ACI) for interactive evaluation. It formalizes each problem as $P = \langle T, C, S \rangle$ (task, context, expected solution) across four task levels (detection, localization, root cause analysis (RCA), and mitigation) and instantiates 48 problems on DeathStarBench-based services, using ChaosMesh for fault injection and Prometheus/Jaeger for observability. The framework contributes an orchestrated evaluation pipeline, problem initializers, and a rich observability stack. It benchmarks four LLM-based agents and three baselines, revealing their capabilities, limitations, and cost trade-offs, including difficulties with step limits and API usage. The results show both the promise of AgentOps for end-to-end automation and the practical gaps in current agents, and they offer concrete guidance on task decomposition, data filtering, and robust interaction with cloud APIs. Overall, AIOpsLab is a publicly available, extensible platform for developing and rigorously evaluating autonomous AIOps agents in realistic cloud environments, a step toward resilient, self-healing clouds.
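As a minimal sketch, the problem formalization $P = \langle T, C, S \rangle$ can be encoded as a Python dataclass. The class and field names, the context keys, and the yes/no framing of detection are illustrative assumptions, not AIOpsLab's implementation. The localization example reuses the Figure 4 scenario, in which the fault is injected at Mongodb-geo.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class TaskLevel(Enum):
    """The four task levels, in the order listed in the TL;DR."""
    DETECTION = 1
    LOCALIZATION = 2
    RCA = 3
    MITIGATION = 4


@dataclass
class Problem:
    """Hypothetical encoding of a problem P = <T, C, S>."""
    task: TaskLevel   # T: what the agent is asked to do
    context: dict     # C: environment and fault details (keys are illustrative)
    solution: Any     # S: expected outcome used for grading

    def is_solved(self, answer: Any,
                  check: Optional[Callable[[Any, Any], bool]] = None) -> bool:
        # Exact match by default; callers can pass a task-specific comparison.
        # Grading a mitigation task would require inspecting the live service
        # state instead, which this sketch does not model.
        if check is None:
            return answer == self.solution
        return check(answer, self.solution)


detect = Problem(TaskLevel.DETECTION,
                 {"app": "hotel-reservation", "fault": "revoke-auth"},
                 solution="Yes")
localize = Problem(TaskLevel.LOCALIZATION,
                   {"app": "hotel-reservation", "fault": "revoke-auth"},
                   solution={"mongodb-geo"})

print(detect.is_solved("Yes"))  # True
print(localize.is_solved(["mongodb-geo"],
                         check=lambda a, s: set(a) == s))  # True
```

Keeping the grading rule pluggable reflects that the four task levels need different checks: a yes/no answer, a set of faulty services, or a repaired system state.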

Abstract

AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis, to reduce human workload and minimize customer impact. While traditional DevOps tools and AIOps algorithms often focus on addressing isolated operational tasks, recent advances in Large Language Models (LLMs) and AI agents are revolutionizing AIOps by enabling end-to-end and multitask automation. This paper envisions a future where AI agents autonomously manage operational tasks throughout the entire incident lifecycle, leading to self-healing cloud systems, a paradigm we term AgentOps. Realizing this vision requires a comprehensive framework to guide the design, development, and evaluation of these agents. To this end, we present AIOpsLab, a framework that not only deploys microservice cloud environments, injects faults, generates workloads, and exports telemetry data, but also orchestrates these components and provides interfaces for interacting with and evaluating agents. We discuss the key requirements for such a holistic framework and demonstrate how AIOpsLab can facilitate the evaluation of next-generation AIOps agents. Through evaluations of state-of-the-art LLM agents within the benchmark created by AIOpsLab, we provide insights into their capabilities and limitations in handling complex operational tasks in cloud environments.

Paper Structure

This paper contains 27 sections, 7 figures, and 5 tables.

Introduction
AIOpsLab
Problem Definition
Orchestrator
Agent Cloud Interface
Session Interface
Other Interfaces
Cloud Services
Task-oriented Fault Library
Task Taxonomy
Symptomatic Faults
Functional Faults
Observability
Evaluation
Evaluation Setup
...and 12 more sections

Figures (7)

Figure 1: Microservice incident and its management lifecycle.
Figure 2: Overview of AIOpsLab. The Orchestrator coordinates interactions between the system components and serves as the Agent-Cloud Interface (ACI). Agents engage with the Orchestrator to solve tasks, receiving a problem description, instructions, and relevant APIs. The Orchestrator generates diverse problems using the Workload and Fault Generators and injects them into applications it can deploy. The deployed service is instrumented for observability, providing telemetry such as metrics, traces, and logs. Agents act via the Orchestrator, which executes their actions and updates the service's state. The Orchestrator evaluates the final solution using predefined metrics for the task.
Figure 3: Fault categories to instantiate problems in AIOpsLab.
Figure 4: Revoke authentication fault example. The fault is injected into the Mongodb-geo service, while the Geo service becomes abnormal and generates error logs.
Figure 5: Agent performance vs. number of steps taken.
...and 2 more figures
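The interaction loop in the Figure 2 caption (problem description and APIs in, agent actions executed by the Orchestrator, final solution graded) can be sketched as a toy Python model. This is not AIOpsLab's actual API: `ToyService`, `Orchestrator`, `scripted_agent`, the `get_logs()`/`submit(...)` action strings, the log text, and the `max_steps` value are all hypothetical. The scenario loosely follows Figure 4: the fault targets `mongodb-geo`, but the error appears in logs attributed to `geo`, so the agent must trace the symptom back to its source.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyService:
    """Hypothetical stand-in for a deployed microservice app with telemetry."""
    logs: list = field(default_factory=list)

    def inject_fault(self, target: str) -> None:
        # Plays the Fault Generator's role. The symptom is logged by a
        # different service (geo) than the one the fault targets.
        self.logs.append(f"ERROR geo: authentication failed on {target}")


@dataclass
class Orchestrator:
    service: ToyService
    expected: str        # oracle solution S for this hypothetical problem
    max_steps: int = 10  # step limit; running out counts as a failure

    def run(self, agent: Callable[[str], str]) -> dict:
        # 1. Initialize the problem, then describe the task and available APIs.
        self.service.inject_fault("mongodb-geo")
        observation = ("Task: localize the faulty service. "
                       "APIs: get_logs(), submit(<service>)")
        for step in range(1, self.max_steps + 1):
            action = agent(observation)  # 2. the agent chooses an action
            if action == "get_logs()":
                # 3. execute the action and return telemetry as the observation
                observation = "\n".join(self.service.logs)
            elif action.startswith("submit(") and action.endswith(")"):
                # 4. grade the submitted answer against the oracle
                answer = action[len("submit("):-1]
                return {"success": answer == self.expected, "steps": step}
            else:
                observation = f"Invalid action: {action}"
        return {"success": False, "steps": self.max_steps}


def scripted_agent(observation: str) -> str:
    # Toy policy: read logs first, then blame the service named in the error.
    if " failed on " in observation:
        return "submit(" + observation.rsplit(" failed on ", 1)[1].strip() + ")"
    return "get_logs()"


result = Orchestrator(ToyService(), expected="mongodb-geo").run(scripted_agent)
print(result)  # {'success': True, 'steps': 2}
```

Returning a failure once `max_steps` is exhausted is one simple way to model the step limits mentioned in the TL;DR: an agent that keeps querying telemetry without ever submitting receives no credit, and malformed calls only produce an error observation.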

Theorems & Definitions (4)

Example 2.1
Example 2.2
Example 2.3
Example 2.4
