Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights
Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou
TL;DR
This paper develops a formal, graph-based framework for designing and analyzing LLM-based algorithms, treating them as computational graphs with LLM and non-LLM nodes and explicit task decomposition. It defines per-node error and cost metrics, accounts for LLM characteristics and inference services, and provides general bounds on error propagation and end-to-end costs, including latency under parallelism. Through synthetic case studies (counting, sorting, retrieval) and broader patterns (DAGs, iterative retrieval, recursive decomposition), the work offers actionable insights on how granularity, prompting strategies, and parallelism impact accuracy and efficiency, with empirical validation on multiple LLMs. The framework aims to guide principled design and hyperparameter tuning for robust LLM-based algorithms, while highlighting limitations and directions for extending to more complex, real-world workflows.
Abstract
This work presents an analytical framework for the design and analysis of LLM-based algorithms, i.e., algorithms that contain one or more calls to large language models (LLMs) as subroutines and critically rely on the capabilities of LLMs. While such algorithms, ranging from basic LLM calls with prompt engineering to complicated LLM-powered agentic workflows and compound AI systems, have achieved remarkable empirical success, their design and optimization often require extensive trial and error and case-by-case analysis. Our proposed framework aims to reduce this burden, offering a formal and systematic approach for analyzing how critical design choices, such as the pattern and granularity of task decomposition or the prompt for each LLM call, affect the accuracy and efficiency of an LLM-based algorithm. Through a wide range of case studies covering diverse algorithm patterns (including parallel, hierarchical, and recursive task decomposition, as well as generic directed acyclic graphs), we demonstrate the proposed framework in action and derive insights that generalize across scenarios, accompanied by systematic empirical validation in synthetic settings.
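To make the notion of parallel task decomposition concrete, the following is a minimal Python sketch of a counting task split into chunks, one sub-task per chunk, followed by aggregation. It is an illustration under stated assumptions, not the paper's implementation: `count_digit_stub` is a hypothetical, error-free stand-in for an LLM call, and `estimated_latency` is a simplified, assumed cost model in which calls run in waves of `parallelism` and every call takes the same time.

```python
import math


def count_digit_stub(text: str, digit: str = "9") -> int:
    """Hypothetical stand-in for one LLM call on a sub-task.

    A real LLM call could return an incorrect count; this stub is exact.
    """
    return text.count(digit)


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Decompose the input into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def parallel_count(text: str, chunk_size: int, digit: str = "9") -> tuple[int, int]:
    """Solve each chunk independently, then aggregate with a non-LLM sum.

    Returns the aggregated count and the number of (stubbed) LLM calls.
    """
    chunks = split_into_chunks(text, chunk_size)
    partial_counts = [count_digit_stub(chunk, digit) for chunk in chunks]
    return sum(partial_counts), len(chunks)


def estimated_latency(num_calls: int, parallelism: int, latency_per_call: float) -> float:
    """Assumed model: calls execute in waves of size `parallelism`,
    and each wave takes `latency_per_call` seconds."""
    return math.ceil(num_calls / parallelism) * latency_per_call


text = "1929394959" * 10  # 100 characters
total, num_calls = parallel_count(text, chunk_size=25)
latency = estimated_latency(num_calls, parallelism=2, latency_per_call=1.5)
print(total, num_calls, latency)
```

Under this assumed model, a smaller `chunk_size` yields more, shorter sub-tasks: the number of calls grows, while the end-to-end latency depends on how many of those calls can run concurrently.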
