Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning

Zijian He; Reyna Abhyankar; Vikranth Srivatsa; Yiying Zhang

TL;DR

This work formalizes the problem of autotuning gen-AI workflows, which consist of multiple model, tool, and data interactions, and shows that traditional AutoML approaches do not suffice. It introduces AdaSeek, an adaptive hierarchical Bayesian optimization algorithm that allocates the search budget across three layers of tuning methods (architecture, step, and weight "cogs"), enabling efficient exploration under limited budgets. Building on AdaSeek, Cognify provides an extensible framework that optimizes for generation quality, latency, and cost, using cogs such as task decomposition, model selection, code rewriting, reasoning, and few-shot prompts. Empirically, Cognify delivers substantial improvements across six diverse workflows, outperforming baselines such as DSPy and Trace on generation quality, cost, and latency. It remains robust to variations in budget and data, and it is openly available.

Abstract

Today's gen-AI workflows that involve multiple ML model calls, tool/API calls, data retrieval, or generic code execution are often tuned manually in an ad-hoc way that is both time-consuming and error-prone. In this paper, we propose a systematic approach for automatically tuning gen-AI workflows. Our key insight is that gen-AI workflows can benefit from structure, operator, and prompt changes, but unique properties of gen-AI workflows require new optimization techniques. We propose AdaSeek, an adaptive hierarchical search algorithm for autotuning gen-AI workflows. AdaSeek organizes workflow tuning methods into different layers based on the user-specified total search budget and distributes the budget across different layers based on the complexity of each layer. During its hierarchical search, AdaSeek redistributes the search budget from less useful to more promising tuning configurations based on workflow-level evaluation results. We implement AdaSeek in a workflow autotuning framework called Cognify and evaluate Cognify using six types of workflows such as RAG-based QA and text-to-SQL transformation. Overall, Cognify improves these workflows' generation quality by up to 2.8x, reduces execution monetary cost by up to 10x, and reduces end-to-end latency by 2.7x.
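The two budget mechanisms described in the abstract (splitting a total budget across layers by complexity, then moving budget toward more promising configurations) can be illustrated with a minimal sketch. This is not Cognify's implementation: the function names, the integer complexity scores, the proportional split, and the keep-the-top-fraction rule are all assumptions made for illustration, and the sketch omits the Bayesian optimization that AdaSeek uses to propose configurations.

```python
# Illustrative sketch only; names and rules below are hypothetical, not Cognify's API.

def split_budget(total_budget, layer_complexity):
    """Split an integer search budget across layers in proportion to complexity."""
    total = sum(layer_complexity.values())
    shares = {name: (total_budget * c) // total
              for name, c in layer_complexity.items()}
    # Integer division can leave a few units unassigned; give them to the
    # most complex layers first so the shares always sum to total_budget.
    leftover = total_budget - sum(shares.values())
    by_complexity = sorted(layer_complexity, key=layer_complexity.get, reverse=True)
    for name in by_complexity[:leftover]:
        shares[name] += 1
    return shares


def redistribute(budgets, scores, keep_fraction=0.5):
    """Drop the lowest-scoring configurations and hand their budget to the rest."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept, dropped = ranked[:n_keep], ranked[n_keep:]
    freed = sum(budgets[name] for name in dropped)
    new_budgets = {name: budgets[name] for name in kept}
    # Spread the freed budget evenly; any remainder goes to the best performers.
    bonus, extra = divmod(freed, n_keep)
    for i, name in enumerate(kept):
        new_budgets[name] += bonus + (1 if i < extra else 0)
    return new_budgets


# Assumed complexity scores for the three layers named in the TL;DR.
layers = split_budget(60, {"architecture": 1, "step": 2, "weight": 3})

# Assumed workflow-level evaluation scores for four candidate configurations.
updated = redistribute(
    {"c1": 4, "c2": 4, "c3": 4, "c4": 4},
    {"c1": 0.2, "c2": 0.9, "c3": 0.5, "c4": 0.1},
)
```

Here the 60-unit budget is divided 10/20/30 across the three layers, and the two lowest-scoring configurations give up their 8 units to `c2` and `c3`. The total budget is conserved in both steps, which is the property the abstract's description relies on.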
