AgentCgroup: Understanding and Controlling OS Resources of AI Agents

Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, Andi Quinn

TL;DR

The paper addresses OS-level resource management for AI coding agents running in sandboxed, multi-tenant cloud environments. It characterizes resource dynamics across 144 SWE-rebench tasks using two LLM backends and finds that OS-level execution accounts for 56-74% of end-to-end task latency and that memory, not CPU, is the bottleneck under concurrency, with bursty, tool-call-driven memory usage. Based on these findings, it proposes AgentCgroup, an eBPF-based controller that enforces fine-grained, tool-call-aligned resource domains through in-kernel scheduling and memory throttling, coupled with runtime-adaptive policies. Preliminary evaluation shows improved multi-tenant isolation and reduced resource waste, suggesting that kernel-level controls can close the granularity, responsiveness, and adaptability gaps in existing resource management for AI agents.
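
Burstiness of this kind is often summarized as a peak-to-average ratio over a sampled memory trace. The following is a minimal sketch of that calculation, not the paper's measurement code: the function name `peak_to_average` and the sample values in `trace_mb` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def peak_to_average(samples_mb):
    """Return max(samples) / mean(samples) for a sampled memory trace."""
    if not samples_mb:
        raise ValueError("need at least one sample")
    avg = mean(samples_mb)
    if avg <= 0:
        raise ValueError("average memory usage must be positive")
    return max(samples_mb) / avg


# Hypothetical trace in MB (not data from the paper): a mostly idle agent
# with one short spike while a tool call runs.
trace_mb = [100, 100, 100, 100, 100, 100, 100, 100, 100, 1100]

ratio = peak_to_average(trace_mb)
print(f"peak-to-average ratio: {ratio:.1f}x")
```

A limit sized for the average would be exceeded by the spike, while a limit sized for the peak sits mostly unused; the ratio expresses how large that gap is.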

Abstract

AI agents are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls within sandboxed containers, each call with distinct resource demands and rapid fluctuations. We present a systematic characterization of OS-level resource dynamics in sandboxed AI coding agents, analyzing 144 software engineering tasks from the SWE-rebench benchmark across two LLMs. Our measurements reveal that (1) OS-level execution (tool calls, container and agent initialization) accounts for 56-74% of end-to-end task latency; (2) memory, not CPU, is the concurrency bottleneck; (3) memory spikes are tool-call-driven, with a peak-to-average ratio of up to 15.4x; and (4) resource demands are highly unpredictable across tasks, runs, and models. Comparing these characteristics against serverless, microservice, and batch workloads, we identify three mismatches in existing resource controls: a granularity mismatch (container-level policies vs. tool-call-level dynamics), a responsiveness mismatch (user-space reaction vs. sub-second unpredictable bursts), and an adaptability mismatch (history-based prediction vs. non-deterministic stateful execution). We propose AgentCgroup, an eBPF-based resource controller that addresses these mismatches through hierarchical cgroup structures aligned with tool-call boundaries, in-kernel enforcement via sched_ext and memcg_bpf_ops, and runtime-adaptive policies driven by in-kernel monitoring. Preliminary evaluation demonstrates improved multi-tenant isolation and reduced resource waste.
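
To make the idea of a cgroup hierarchy aligned with tool-call boundaries concrete, here is a rough user-space sketch of a per-agent parent cgroup with one child per tool call, using the standard cgroup v2 files `cgroup.subtree_control` and `memory.high`. It is an illustration under stated assumptions, not AgentCgroup's implementation, which the abstract describes as in-kernel enforcement via eBPF. The function `create_tool_call_cgroup`, the names `agent-0` and `tool-call-1`, and the 512 MiB limit are all hypothetical. The demo writes into a temporary directory rather than a real cgroup2 mount, which would require privileges and has extra constraints (for example, the parent must itself have the memory controller enabled).

```python
import tempfile
from pathlib import Path


def create_tool_call_cgroup(root, agent_id, call_id, memory_high_bytes):
    """Create <root>/<agent_id>/<call_id> and set a memory throttle on it."""
    agent_dir = Path(root) / agent_id
    call_dir = agent_dir / call_id
    call_dir.mkdir(parents=True, exist_ok=True)
    # In cgroup v2, a parent enables a controller for its children by
    # writing "+<controller>" to cgroup.subtree_control.
    (agent_dir / "cgroup.subtree_control").write_text("+memory\n")
    # memory.high is the cgroup v2 throttling threshold: above it the
    # kernel slows the group down and reclaims memory instead of killing it.
    (call_dir / "memory.high").write_text(f"{memory_high_bytes}\n")
    return call_dir


# Demo against a temporary directory standing in for /sys/fs/cgroup
# (assumption: on a real system the root would be the cgroup2 mount point).
with tempfile.TemporaryDirectory() as fake_root:
    call_dir = create_tool_call_cgroup(
        fake_root, "agent-0", "tool-call-1", 512 * 1024 * 1024
    )
    rel = call_dir.relative_to(fake_root).as_posix()
    limit = (call_dir / "memory.high").read_text().strip()

print(rel, limit)
```

Giving each tool call its own child group means a limit can be set and torn down per call, rather than once for the whole container.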

Paper Structure (14 sections, 8 figures, 2 tables)

This paper contains 14 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Task execution time distribution (a) and execution phase division (b).
  • Figure 2: Tool execution time distribution (a) and Bash command semantic category proportion (b), GLM agent.
  • Figure 3: Tool time proportion distribution (a) and tool call distribution over execution progress (b), all 144 tasks.
  • Figure 4: Docker image size distribution (a) and aggregated memory trajectory (b), all 144 tasks.
  • Figure 5: Resource usage time series: Haiku agent executing pre-commit/pre-commit#2524.
  • ...and 3 more figures