Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning

Zijian He, Reyna Abhyankar, Vikranth Srivatsa, Yiying Zhang

TL;DR

This work formalizes the problem of autotuning gen-AI workflows, which consist of multiple model/tool/data interactions, and shows that traditional AutoML approaches do not suffice. It introduces AdaSeek, an adaptive hierarchical Bayesian optimization algorithm that allocates search budget across architecture, step, and weight cogs, enabling efficient exploration under limited budgets. Building on AdaSeek, Cognify provides an extensible framework to optimize for generation quality, latency, and cost, using architecture, step, and weight cogs such as task decomposition, model selection, code rewriting, reasoning, and few-shot prompts. Empirically, Cognify delivers substantial improvements across six diverse workflows, surpassing baselines like DSPy and Trace in quality, cost savings, and latency reductions, while remaining robust to budget and data variations and openly available for broader use.

Abstract

Today's gen-AI workflows that involve multiple ML model calls, tool/API calls, data retrieval, or generic code execution are often tuned manually in an ad-hoc way that is both time-consuming and error-prone. In this paper, we propose a systematic approach for automatically tuning gen-AI workflows. Our key insight is that gen-AI workflows can benefit from structure, operator, and prompt changes, but unique properties of gen-AI workflows require new optimization techniques. We propose AdaSeek, an adaptive hierarchical search algorithm for autotuning gen-AI workflows. AdaSeek organizes workflow tuning methods into different layers based on the user-specified total search budget and distributes the budget across different layers based on the complexity of each layer. During its hierarchical search, AdaSeek redistributes the search budget from less useful to more promising tuning configurations based on workflow-level evaluation results. We implement AdaSeek in a workflow autotuning framework called Cognify and evaluate Cognify using six types of workflows such as RAG-based QA and text-to-SQL transformation. Overall, Cognify improves these workflows' generation quality by up to 2.8x, reduces execution monetary cost by up to 10x, and reduces end-to-end latency by 2.7x.
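The budget handling described in the abstract can be illustrated with a minimal sketch. This is not the authors' AdaSeek implementation: the function names `split_budget` and `shift_budget`, the numeric complexity weights, and the simple "move evaluations from the worst to the best configuration" rule are all assumptions made for illustration. The sketch only shows the two ideas the abstract names: distributing a total budget across layers in proportion to their complexity, and redistributing budget toward configurations that score better on workflow-level evaluation.

```python
def split_budget(total_budget, layer_complexity):
    """Split an integer search budget across layers in proportion to
    hypothetical complexity weights (largest-remainder rounding)."""
    total_weight = sum(layer_complexity.values())
    raw = {name: total_budget * w / total_weight
           for name, w in layer_complexity.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    # Hand out evaluations lost to rounding, largest fractional part first.
    leftover = total_budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def shift_budget(remaining, scores, amount):
    """Move up to `amount` evaluations from the lowest-scoring configuration
    to the highest-scoring one (an assumed, simplified reallocation rule)."""
    worst = min(scores, key=scores.get)
    best = max(scores, key=scores.get)
    moved = min(amount, remaining[worst])
    updated = dict(remaining)
    updated[worst] -= moved
    updated[best] += moved
    return updated


if __name__ == "__main__":
    # Layer names follow the TL;DR; the weights are illustrative only.
    print(split_budget(64, {"architecture": 1, "step": 2, "weight": 5}))
    print(shift_budget({"cfg_a": 4, "cfg_b": 4}, {"cfg_a": 0.2, "cfg_b": 0.7}, 3))
```

A real tuner would likely use a more careful rule than a single worst-to-best transfer; the point here is only that budget follows observed workflow-level results rather than being fixed up front.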

Paper Structure

This paper contains 29 sections, 6 equations, 8 figures, 2 tables, 3 algorithms.

Figures (8)

  • Figure 1: Gen-AI Workflow Tuning Methods. SLM and LLM represent different language models (e.g., small and large). Code, Tool, and Data represent code blocks, tool calls, and data retrieval. Dashed curved lines represent loops or control flow changes. p* represents prompt optimizations or additional information added for the downstream step.
  • Figure 2: Generation Quality vs. Cost/Latency. Dashed lines show the Pareto frontier (upper left is better). Cost is shown as the model API dollar cost per 1000 requests. Cognify selects models from GPT-4o-mini and Llama-8B. DSPy and Trace do not support model selection and are given GPT-4o-mini for all steps. Trace results for Text-2-SQL and FinRobot have 0 quality and are not included.
  • Figure 3: Effectiveness of Layering over Budget. The quality (higher is better), cost (lower is better), and latency (lower is better) achieved on Text-to-SQL by Cognify when using different numbers of search layers under different budgets.
  • Figure 4: Grid Search and Cognify's Searched Configurations. Performed on the HotpotQA workload. Cognify's search results are colored by iteration order, from yellow to red (up to 128 iterations). Grid search explores all 4096 configurations.
  • Figure 5: Sensitivity to Training Set Size. Conducted on FinRobot. The X-axis represents the number of training data points; the Y-axis represents quality.
  • ...and 3 more figures
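
The Pareto frontier mentioned in the Figure 2 caption (higher quality and lower cost are both better) can be computed from a set of evaluated configurations as in the sketch below. The function name `pareto_frontier` and the `(cost, quality)` tuple layout are assumptions for illustration, not part of Cognify's interface.

```python
def pareto_frontier(points):
    """Return the non-dominated (cost, quality) points, sorted by cost.

    A point is kept only if no cheaper-or-equal point has quality at
    least as high, matching the "upper left is better" convention.
    """
    frontier = []
    best_quality = float("-inf")
    # Sort by ascending cost; for equal cost, higher quality comes first.
    for cost, quality in sorted(points, key=lambda p: (p[0], -p[1])):
        if quality > best_quality:
            frontier.append((cost, quality))
            best_quality = quality
    return frontier
```

The same function applies to latency by passing `(latency, quality)` pairs instead.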