Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

Shirley Wu, Parth Sarthi, Shiyu Zhao, Aaron Lee, Herumb Shandilya, Adrian Mladenic Grobelnik, Nurendra Choudhary, Eddie Huang, Karthik Subbian, Linjun Zhang, Diyi Yang, James Zou, Jure Leskovec

TL;DR

Optimas tackles the optimization of compound AI systems by introducing globally aligned Local Reward Functions (LRFs) that supervise per-component updates. Built on a shared LLM backbone with component-specific heads, LRFs are adapted online to preserve alignment with the global objective, enabling independent optimization across heterogeneous configurations. The framework provides theoretical convergence guarantees and demonstrates robust, data-efficient gains across five real-world tasks, outperforming baselines by 11.92% on average. This approach offers scalable, interpretable, and resource-efficient optimization for complex AI pipelines.

Abstract

Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.
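The local-global alignment property can be made concrete with a small check. The sketch below is illustrative only and is not taken from the Optimas codebase: the function name `pairwise_alignment` and both reward callables are hypothetical, and "alignment" is measured here as the fraction of output pairs that the local and global rewards rank in the same order.

```python
from itertools import combinations
from typing import Callable, Sequence


def pairwise_alignment(
    outputs: Sequence[str],
    local_reward: Callable[[str], float],
    global_reward: Callable[[str], float],
) -> float:
    """Fraction of output pairs ranked the same way by both rewards.

    Hypothetical helper: a value near 1.0 means that preferring the
    output with the higher local reward also prefers the output with
    the higher global reward.
    """
    agree, total = 0, 0
    for a, b in combinations(outputs, 2):
        global_gap = global_reward(a) - global_reward(b)
        if global_gap == 0:
            continue  # ties in global reward carry no preference
        local_gap = local_reward(a) - local_reward(b)
        total += 1
        if global_gap * local_gap > 0:
            agree += 1
    # With no comparable pairs, treat the rewards as vacuously aligned.
    return agree / total if total else 1.0
```

A local reward that scores well on this check is one for which a per-component improvement is likely to carry over to the system-level metric, which is the behaviour the abstract attributes to LRFs.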

Paper Structure

This paper contains 38 sections, 5 theorems, 15 equations, 7 figures, 11 tables, 1 algorithm.

Key Result

Theorem 4.1

Under regularity conditions, the maximizer of equation eq:reward_model satisfies the local-global alignment property (equation eq:alignment). In addition, maximizing $r_k\bigl(x_k,\,C_k(x_k;\mathbf{v}_k)\bigr)$ over $\mathbf{v}_k$ and maximizing $R(x,f(x;\mathbf{v}_{-k})\mid C_k(x_k;\mathbf{v}_k))$

Figures (7)

  • Figure 1: Overview. Given a compound AI system's heterogeneous configurations (e.g., prompts, parameters) across multiple components, Optimas maintains globally aligned Local Reward Functions (LRFs) as the system evolves. Each LRF supervises one component and assigns higher local rewards to outputs that yield higher system performance (a.k.a. global rewards). Optimas iteratively adapts the LRFs and optimizes each component to maximize its local reward for effective system optimization.
  • Figure 2: Five real-world and challenging compound AI systems. The goal is to automatically optimize the configuration across a heterogeneous set of components and parameters, e.g., model parameters, prompts, model selection choice, and hyperparameters. See Appendix \ref{app:system} for details.
  • Figure 3: Optimas optimization iteration. At each iteration, Optimas updates a component $C_k$ by first collecting a mini-batch of preference data and adapting its Local Reward Function $r_k$ to remain aligned with the global task metric. This alignment helps ensure that optimizing the component to maximize its local reward also improves the global reward.
  • Figure 4: Global reward and configuration updates of the three compound AI systems over the optimization iterations. For conciseness, we only show the local optimization steps that lead to an increase in global reward on the validation sets. The annotations show the optimized components.
  • Figure 5: Local reward models with varying alignment quality are used to optimize a selected component in each task, where we observe that higher alignment quality yields higher global rewards.
  • ...and 2 more figures
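The iteration described in the Figure 3 caption (collect preference data, adapt the LRF, then optimize the component against it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the LRF is replaced by a linear scorer over hypothetical output feature vectors, the adaptation step assumes a Bradley-Terry style pairwise logistic loss, and the names `adapt_lrf` and `pick_config` are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def adapt_lrf(
    w: Sequence[float],
    pairs: Sequence[Tuple[Vector, Vector]],
    lr: float = 0.1,
    epochs: int = 50,
) -> Vector:
    """Adapt a linear local reward on (preferred, rejected) feature pairs.

    Assumption: preference pairs are ordered by global reward, and the
    loss is -log(sigmoid(r(preferred) - r(rejected))).
    """
    w = list(w)
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = dot(w, preferred) - dot(w, rejected)
            # Gradient of the loss w.r.t. the margin is -(1 - sigmoid(margin)),
            # so a descent step moves w along (preferred - rejected).
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(w)):
                w[i] += lr * scale * (preferred[i] - rejected[i])
    return w


def pick_config(w: Sequence[float], candidates: Sequence[Vector]) -> int:
    """Index of the candidate configuration with the highest local reward."""
    return max(range(len(candidates)), key=lambda i: dot(w, candidates[i]))
```

In this sketch, one iteration for component $C_k$ would call `adapt_lrf` on a fresh mini-batch of preference pairs and then `pick_config` over candidate configurations. In Optimas the second step is instead carried out by whichever optimizer suits the configuration type (prompts, hyperparameters, or model parameters).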

Theorems & Definitions (8)

  • Theorem 4.1: Informal
  • Theorem 4.2
  • Theorem B.1
  • proof
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof