Abacus: A Cost-Based Optimizer for Semantic Operator Systems

Matthew Russo, Sivaprasad Sudhir, Gerardo Vitagliano, Chunwei Liu, Tim Kraska, Samuel Madden, Michael Cafarella

TL;DR

Abacus introduces a cost-based optimizer for semantic operator systems that enables constrained optimization across quality, cost, and latency. It combines a Pareto-Cascades dynamic programming approach with a multi-armed bandit sampling strategy to efficiently explore thousands of operator implementations and maintain Pareto frontiers for subplans. Empirical results on BioDEX, CUAD, and MMQA show substantial improvements in output quality and dramatic reductions in cost and latency compared with prior work, with priors further accelerating optimization and constraint satisfaction. The framework is designed to be extensible, enabling new operators and rules without changing the host program, and demonstrates practical impact for scalable AI-driven document processing pipelines.

Abstract

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks have enabled developers to build these applications by composing them out of semantic operators: a declarative set of AI-powered data transformations with natural language specifications. These include LLM-powered maps, filters, joins, etc. used for document processing tasks such as information extraction, summarization, and more. While systems of semantic operators have achieved strong performance on benchmarks, they can be difficult to optimize. An optimizer for this setting must determine how to physically implement each semantic operator in a way that optimizes the system globally. Existing optimizers are limited in the number of optimizations they can apply, and most (if not all) cannot optimize system quality, cost, or latency subject to constraint(s) on the other dimensions. In this paper we present Abacus, an extensible, cost-based optimizer which searches for the best implementation of a semantic operator system given a (possibly constrained) optimization objective. Abacus estimates operator performance by leveraging a minimal set of validation examples and, if available, prior beliefs about operator performance. We evaluate Abacus on document processing workloads in the biomedical and legal domains (BioDEX; CUAD) and multi-modal question answering (MMQA). We demonstrate that systems optimized by Abacus achieve 18.7%-39.2% better quality and up to 23.6x lower cost and 4.2x lower latency than the next best system.
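The constrained objective described above (for example, maximize quality subject to a cost bound) can be illustrated with a small sketch. This is not Abacus's implementation: the `PlanEstimate` type, the `best_plan` helper, and the numbers are hypothetical, and the estimates are assumed to be given rather than learned from validation examples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical estimate of one physical plan's performance."""
    name: str
    quality: float  # higher is better
    cost: float     # dollars, lower is better
    latency: float  # seconds, lower is better


def dominates(a: PlanEstimate, b: PlanEstimate) -> bool:
    """True if a is no worse than b on every dimension and better on one."""
    no_worse = (a.quality >= b.quality and a.cost <= b.cost
                and a.latency <= b.latency)
    better = (a.quality > b.quality or a.cost < b.cost
              or a.latency < b.latency)
    return no_worse and better


def pareto_frontier(plans: List[PlanEstimate]) -> List[PlanEstimate]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def best_plan(plans: List[PlanEstimate],
              max_cost: Optional[float] = None,
              max_latency: Optional[float] = None) -> Optional[PlanEstimate]:
    """Highest-quality frontier plan that satisfies the optional constraints."""
    feasible = [
        p for p in pareto_frontier(plans)
        if (max_cost is None or p.cost <= max_cost)
        and (max_latency is None or p.latency <= max_latency)
    ]
    return max(feasible, key=lambda p: p.quality, default=None)


# Illustrative numbers only; they are not results from the paper.
plans = [
    PlanEstimate("large-model", quality=0.90, cost=4.00, latency=120.0),
    PlanEstimate("small-model", quality=0.78, cost=0.60, latency=45.0),
    PlanEstimate("small-model-slow", quality=0.75, cost=0.80, latency=60.0),
]

unconstrained = best_plan(plans)
under_one_dollar = best_plan(plans, max_cost=1.0)
```

Filtering to the frontier before applying constraints loses nothing here: any plan that dominates a feasible plan is itself feasible, since its cost and latency are no higher.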

Paper Structure

This paper contains 18 sections, 1 theorem, 2 equations, 6 figures, 2 tables, 5 algorithms.

Key Result

Theorem 3.1

(Under the operator independence assumptions of our cost model, introduced in the section on optimization challenges) every subplan of a Pareto-optimal physical plan is itself Pareto-optimal.
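A practical consequence of this optimal-substructure property is that a dynamic program only needs to carry forward the Pareto frontier of each subplan. The sketch below checks this on a toy chain of operators. It assumes one particular independence model that the summary does not spell out: quality multiplies across operators while cost and latency add. The function names and numbers are hypothetical.

```python
from itertools import product

# Each operator option is a (quality, cost, latency) tuple.
# ASSUMPTION: quality multiplies along the chain; cost and latency add.


def dominates(a, b):
    """a is at least as good as b everywhere and differs from b."""
    return a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2] and a != b


def frontier(points):
    """Return the non-dominated subset of the given points."""
    pts = set(points)
    return {p for p in pts if not any(dominates(q, p) for q in pts)}


def compose(upstream, op):
    """Combine an upstream subplan estimate with one more operator."""
    return (upstream[0] * op[0], upstream[1] + op[1], upstream[2] + op[2])


def frontier_by_dp(stages):
    """Extend only Pareto-optimal subplans, pruning after every stage."""
    front = {(1.0, 0.0, 0.0)}  # empty plan: perfect quality, zero cost/latency
    for options in stages:
        front = frontier(compose(s, o) for s in front for o in options)
    return front


def frontier_by_brute_force(stages):
    """Enumerate every full plan, then take the frontier once at the end."""
    plans = set()
    for choice in product(*stages):
        total = (1.0, 0.0, 0.0)
        for op in choice:
            total = compose(total, op)
        plans.add(total)
    return frontier(plans)


# Toy search space: three operators, each with a few implementations.
stages = [
    [(0.5, 1.0, 2.0), (0.75, 2.0, 4.0), (0.5, 3.0, 4.0)],
    [(0.5, 0.5, 1.0), (1.0, 2.5, 3.0)],
    [(0.75, 1.0, 1.0), (0.25, 2.0, 2.0)],
]

dp_front = frontier_by_dp(stages)
brute_front = frontier_by_brute_force(stages)
```

Under these assumptions both routines return the same frontier, but the dynamic program never expands a dominated subplan, which is what keeps the search tractable as the number of implementations grows.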

Figures (6)

  • Figure 1: An illustration of Abacus compiling a program for a literature search workload (Top) into two different physical plans for two different optimization objectives. (Left) the user implements the workload in a Palimpzest program and provides the input data they wish to process and (optionally) validation data which Abacus may use to guide its optimization. The program is compiled to a logical plan, which Abacus seeks to implement with an optimal physical plan. (Center) given the unconstrained objective of maximizing quality, Abacus is able to produce a physical plan which achieves high quality for this task. (Right) given the objective of maximizing quality subject to a constraint of $1 in execution cost, Abacus produces a plan which satisfies the constraint and is much cheaper than the unconstrained plan, while trading off only a modest decrease in quality.
  • Figure 2: Overview of Abacus. The developer provides an AI program, optimization objective, input data, and (optionally) validation data. The program is compiled into an initial logical plan. Abacus applies rules to enumerate a search space of physical plans. Abacus iteratively samples physical operators and processes validation inputs with them to build up a cost model of the operator performance. Abacus returns the Pareto-optimal plan based on its estimates and the user objective.
  • Figure 3: A toy example of the Cascades algorithm applied to a simple logical plan. Cascades first constructs an initial group tree with one logical expression per group. The algorithm then applies a task to optimize the final group (SMF), which initiates a dynamic programming routine that searches the space of plans through repeated application of tasks. After all possible tasks have been applied (or a limit on the total number of tasks has been reached), Cascades recursively constructs the optimal physical plan by selecting the optimal physical expression at each group.
  • Figure 4: System output quality as a function of the sample budget when optimizing with (1) no priors, (2) naive priors computed from MMLU-Pro performance, and (3) priors computed with samples from each benchmark's train split. We optimize CUAD and BioDEX with unconstrained and constrained objectives. For constrained optimization, we set the cost constraint to be the 25th percentile of plan costs observed in the unconstrained setting. Overall, Abacus yields better plans in the constrained and unconstrained settings when leveraging prior beliefs on operator performance.
  • Figure 5: The fraction of plans which satisfy the optimization constraint when maximizing quality with an upper bound on cost on BioDEX. We ran Abacus with its Pareto-Cascades algorithm and a Greedy algorithm with three sample budgets on 10 unique slices of the BioDEX dataset per sample budget. Pareto-Cascades identifies more plans which satisfy the constraint than the Greedy baseline for each sample budget and prior beliefs scenario (with one exception where neither algorithm identifies such a plan).
  • ...and 1 more figures
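The sampling loop summarized in the Figure 2 caption (iteratively sample physical operators, run them on validation inputs, and refine the estimates) can be sketched with a standard bandit rule. The sketch below swaps in UCB1 as a stand-in; the summary only states that Abacus uses a multi-armed bandit strategy, so the selection rule, the `ucb_sample` function, and the `evaluate` callback are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple


def ucb_sample(operators: List[str],
               evaluate: Callable[[str], float],
               budget: int) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Spend `budget` evaluations across operators using the UCB1 rule.

    `evaluate(op)` is a hypothetical callback that runs one operator on one
    validation input and returns a quality score in [0, 1].
    """
    pulls = {op: 0 for op in operators}
    totals = {op: 0.0 for op in operators}

    for t in range(1, budget + 1):
        untried = [op for op in operators if pulls[op] == 0]
        if untried:
            # Try every operator once before trusting any estimate.
            choice = untried[0]
        else:
            # Favor a high mean quality plus an exploration bonus that
            # shrinks as an operator accumulates samples.
            choice = max(
                operators,
                key=lambda op: totals[op] / pulls[op]
                + math.sqrt(2.0 * math.log(t) / pulls[op]),
            )
        totals[choice] += evaluate(choice)
        pulls[choice] += 1

    means = {op: totals[op] / pulls[op] for op in operators if pulls[op] > 0}
    return pulls, means


# Deterministic toy scores so the run is reproducible (not paper data).
true_quality = {"op_a": 0.9, "op_b": 0.6, "op_c": 0.3}
pulls, means = ucb_sample(list(true_quality), true_quality.get, budget=60)
```

The design point is that the sample budget concentrates on operators that look promising while weaker ones still receive enough samples to be estimated, rather than evaluating every implementation equally often.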

Theorems & Definitions (1)

  • Theorem 3.1