Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

Shirley Wu; Parth Sarthi; Shiyu Zhao; Aaron Lee; Herumb Shandilya; Adrian Mladenic Grobelnik; Nurendra Choudhary; Eddie Huang; Karthik Subbian; Linjun Zhang; Diyi Yang; James Zou; Jure Leskovec

TL;DR

Optimas optimizes compound AI systems by maintaining globally aligned Local Reward Functions (LRFs), one per component, that supervise per-component updates. The LRFs share an LLM backbone with component-specific heads and are adapted online to stay aligned with the global objective, which allows heterogeneous configurations to be optimized independently. The framework comes with theoretical convergence guarantees and shows robust, data-efficient gains across five real-world tasks, outperforming baselines by 11.92% on average. The approach is presented as a scalable, interpretable, and resource-efficient way to optimize complex AI pipelines.

Abstract

Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.
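
The local-global alignment property can be made concrete with a small check. The sketch below is an illustration only, not the authors' implementation: it assumes alignment is measured as pairwise ranking agreement between a component's local rewards and the global rewards of the same runs. The function name `pairwise_agreement` and the sample numbers are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(local_rewards, global_rewards):
    """Fraction of sample pairs that the local reward ranks in the same
    order as the global reward (an assumed proxy for alignment)."""
    if len(local_rewards) != len(global_rewards):
        raise ValueError("need one local and one global reward per sample")
    agree, total = 0, 0
    for i, j in combinations(range(len(local_rewards)), 2):
        global_gap = global_rewards[i] - global_rewards[j]
        if global_gap == 0:
            continue  # a tie in the global reward carries no ranking signal
        local_gap = local_rewards[i] - local_rewards[j]
        total += 1
        if local_gap * global_gap > 0:
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical data: four runs of one component with their global scores.
global_rewards = [0.2, 0.5, 0.9, 0.4]
aligned_local = [0.1, 0.6, 0.8, 0.3]     # same ordering as the global scores
misaligned_local = [0.9, 0.3, 0.1, 0.5]  # reversed ordering

print(pairwise_agreement(aligned_local, global_rewards))
print(pairwise_agreement(misaligned_local, global_rewards))
```

Under this assumed measure, a local reward with high agreement is a safe optimization target for its component, because raising it tends to raise the global score as well. A low value would signal that the local reward needs to be re-adapted before it is used for further updates.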
