Improving Coherence and Persistence in Agentic AI for System Optimization

Pantea Karimi; Kimia Noorbakhsh; Mohammad Alizadeh; Hari Balakrishnan

Abstract

Designing high-performance system heuristics is a creative, iterative process requiring experts to form hypotheses and execute multi-step conceptual shifts. While Large Language Models (LLMs) show promise in automating this loop, they struggle with complex system problems due to two critical failure modes: evolutionary neighborhood bias and the coherence ceiling. Evolutionary methods often remain trapped in local optima by relying on scalar benchmark scores, failing when coordinated multi-step changes are required. Conversely, existing agentic frameworks suffer from context degradation over long horizons or fail to accumulate knowledge across independent runs. We present Engram, an agentic researcher architecture that addresses these limitations by decoupling long-horizon exploration from the constraints of a single context window. Engram organizes exploration into a sequence of agents that iteratively design, test, and analyze mechanisms. At the conclusion of each run, an agent stores code snapshots, logs, and results in a persistent Archive and distills high-level modeling insights into a compact, persistent Research Digest. Subsequent agents then begin with a fresh context window, reading the Research Digest to build on prior discoveries. We find that Engram exhibits superior performance across diverse domains including multi-cloud multicast, LLM inference request routing, and optimizing KV cache reuse in databases with natural language queries.
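
The handoff loop described in the abstract can be illustrated with a minimal sketch. This is not Engram's implementation: every name here (`RunResult`, `Store`, `run_sequence`, `toy_agent`) is hypothetical, the agent is a stand-in function rather than an LLM, and the digest is modeled as a simple list of one-line insights. The sketch only shows the control flow: each agent starts from a fresh context that contains the problem and the digest, and its full results go to the archive while a short insight goes to the digest.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunResult:
    code: str      # snapshot of the heuristic the agent produced
    log: str       # raw experiment log
    score: float   # benchmark result (lower is better in this sketch)
    insight: str   # short high-level takeaway distilled by the agent


@dataclass
class Store:
    archive: List[RunResult] = field(default_factory=list)  # full details of every run
    digest: List[str] = field(default_factory=list)         # compact insights only


# Hypothetical agent signature: (problem, digest text) -> RunResult.
Agent = Callable[[str, str], RunResult]


def run_sequence(problem: str, agent: Agent, budget: int) -> Store:
    """Run `budget` agents in sequence, handing off through the digest."""
    store = Store()
    for _ in range(budget):
        # Fresh context: the agent sees only the problem and the digest,
        # never the previous agent's full conversation or logs.
        digest_text = "\n".join(store.digest)
        result = agent(problem, digest_text)
        store.archive.append(result)         # persist code, logs, results
        store.digest.append(result.insight)  # persist the distilled insight
    return store


def toy_agent(problem: str, digest_text: str) -> RunResult:
    """Stand-in for an LLM agent: improves as the digest grows."""
    seen = len(digest_text.splitlines())  # number of prior insights
    score = 10.0 / (seen + 1)
    return RunResult(
        code=f"# heuristic v{seen}",
        log=f"run {seen} on {problem}",
        score=score,
        insight=f"run {seen}: score {score:.2f}",
    )


store = run_sequence("multi-cloud multicast", toy_agent, budget=3)
best = min(store.archive, key=lambda r: r.score)
```

Keeping the archive and the digest separate is the point of the sketch: the archive can grow without bound, while only the compact digest is placed in the next agent's context.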

Paper Structure (29 sections, 10 equations, 25 figures, 4 tables)

Introduction
Why LLMs Struggle on System Optimization Problems
Evolution via code mutation.
Iterative design with reasoning and flexible tools.
Engram Design
Single Agent Exploration
The Agent Handoff
Case Studies
Case Study: Multi-Cloud Multicast
Case Study: LLM Request Routing
Case Study: Optimizing KV Cache Reuse in Databases with Natural Language Queries
Additional Evaluation
Other Related Work
LLMs for systems research.
Conclusion
...and 14 more sections

Figures (25)

Figure 1: The three paradigms for LLM-based heuristic design. Evolutionary approaches with code mutation invoke an LLM with a predefined context format, mutating and selecting candidates based on scalar scores. Iterative design with flexible tool access (e.g., Glia) performs coherent experiment-guided exploration, but each exploration is restricted to a bounded LLM context window. Engram combines agent explorations with a shared research digest that persists insights across explorations (see the Engram Design section), improving persistence while preserving long-horizon coherence and flexibility.
Figure 2: Engram's design is based on a sequence of reasoning-based agent explorations that produce and evaluate ideas based on hypotheses derived from experimental data analysis. Each agent begins by analyzing the problem and reviewing the research digest, which summarizes the findings of the previous agents, and uses that information to formulate its own exploration and experimentation plan. The agent executes this plan through design, experimentation, and analysis. Upon completion, it writes a summary of its findings to the research digest, stores all the details in the Archive, and hands off the research process to the next agent. The process typically terminates when the research budget is exhausted.
Figure 3: Workspace of a single agent in Engram.
Figure 4: Multi-cloud data replication from a source (purple) to destinations (blue) across the world, either directly or via a waypoint (yellow) to avoid expensive or slow links.
Figure 5: Comparison for multi-cloud multicast with the "Direction" prompt and o3 (lower is better). Engram achieves the strongest average best cost, outperforming both evolutionary approaches (EoH, FunSearch, OpenEvolve) and Glia. The whiskers show 90% confidence intervals.
...and 20 more figures
