Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights

Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou

TL;DR

This paper develops a formal, graph-based framework for designing and analyzing LLM-based algorithms, treating them as computational graphs with LLM and non-LLM nodes and explicit task decomposition. It defines per-node error and cost metrics, accounts for LLM characteristics and inference services, and provides general bounds on error propagation and end-to-end costs, including latency under parallelism. Through synthetic case studies (counting, sorting, retrieval) and broader patterns (DAGs, iterative retrieval, recursive decomposition), the work offers actionable insights on how granularity, prompting strategies, and parallelism impact accuracy and efficiency, with empirical validation on multiple LLMs. The framework aims to guide principled design and hyperparameter tuning for robust LLM-based algorithms, while highlighting limitations and directions for extending to more complex, real-world workflows.
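
To make the distinction between total work and latency under parallelism concrete, here is a minimal sketch, not the paper's own formulation. It assumes a hypothetical computational graph given as a predecessor map, one fixed latency per node (the names `latency`, `preds`, and `graph_latencies` are illustrative), and unlimited parallelism with no rate limits. Under those assumptions, running nodes one at a time costs the sum of node latencies, while the ideal parallel latency is the length of the longest path through the graph.

```python
from graphlib import TopologicalSorter


def graph_latencies(latency, preds):
    """Return (sequential_latency, ideal_parallel_latency) for a DAG.

    latency: dict mapping node -> latency of that node (hypothetical units)
    preds:   dict mapping node -> set of predecessor nodes
    Assumes unlimited parallelism: a node starts as soon as all of its
    predecessors have finished.
    """
    finish = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(node, ())), default=0)
        finish[node] = start + latency[node]
    sequential = sum(latency.values())   # one node at a time
    parallel = max(finish.values())      # longest (critical) path
    return sequential, parallel


# Hypothetical parallel-decomposition graph: divide -> 3 LLM calls -> aggregate.
latency = {"divide": 1, "llm_1": 20, "llm_2": 20, "llm_3": 20, "aggregate": 1}
preds = {
    "llm_1": {"divide"},
    "llm_2": {"divide"},
    "llm_3": {"divide"},
    "aggregate": {"llm_1", "llm_2", "llm_3"},
}
sequential, parallel = graph_latencies(latency, preds)
```

In this toy graph the three LLM nodes overlap fully, so the ideal latency is dominated by a single LLM call plus the divide and aggregate steps. Real inference services impose concurrency limits, which this sketch ignores.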

Abstract

This work presents an analytical framework for the design and analysis of LLM-based algorithms, i.e., algorithms that contain one or more calls to large language models (LLMs) as subroutines and critically rely on the capabilities of LLMs. While such algorithms, ranging from basic LLM calls with prompt engineering to complicated LLM-powered agentic workflows and compound AI systems, have achieved remarkable empirical success, their design and optimization often require extensive trial and error and case-by-case analysis. Our proposed framework attempts to mitigate these difficulties, offering a formal and systematic approach for analyzing how the accuracy and efficiency of an LLM-based algorithm are affected by critical design choices, such as the pattern and granularity of task decomposition, or the prompt for each LLM call. Through a wide range of case studies covering diverse algorithm patterns (including parallel/hierarchical/recursive task decomposition and generic directed acyclic graphs), we demonstrate the proposed framework in action and derive insights that generalize across scenarios, accompanied by systematic empirical validation in synthetic settings.

Paper Structure

This paper contains 89 sections, 2 theorems, 29 equations, 21 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

Assume that for each $i \in [k]$, the solution $\bm{y}_i$ returned by one LLM call for the $i$-th sub-task is monotone, matches the length of the corresponding input $\bm{x}_i$, and has an $\ell_{\infty}$ error $\mathcal{E}_i$. Then the $\ell_{\infty}$ error of the final solution $\bm{y}$ is upper b

Figures (21)

  • Figure 1: An overview of the proposed analytical framework.
  • Figure 2: An LLM node (left) or non-LLM node (right) in the computational graphs of LLM-based algorithms. Each node can have one or multiple inputs/outputs. We use the abbreviation "NL" for "natural language", and "DS" for "data structure".
  • Figure 3: Examples of computational graph representations for LLM-based algorithms. (a) Parallel decomposition. Detailed analysis and concrete examples for this graph pattern are given in the section on parallel decomposition. (b) Book-length summarization, cf. Figure 1 in Chang et al., 2024 (BooookScore). The "dividing" node contains a symbolic program that divides the input text into multiple smaller chunks. (c) The ReAct algorithm (Yao et al., 2023). Each "acting" node represents one API call for a certain tool, and each "aggregation" node aggregates the outputs of its predecessor nodes, e.g., by simple concatenation.
  • Figure 4: Empirical results for concrete examples of parallel decomposition. Each curve represents the mean and standard deviation of 10 independent trials. See the appendix on parallel decomposition for further details and complete results for these case studies.
  • Figure 5: Two algorithms for iterative retrieval and reasoning. The number of iterations is assumed to be 2 here; in reality, this value is determined adaptively by the algorithm itself at runtime. For clarity, some LLM or non-LLM nodes (cf. Figure 2) are merged into one, and an arrow from the "chunking" node to a shaded block means that each "retrieval" node within the block takes the corresponding chunk as input.
  • ...and 16 more figures
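
The parallel decomposition pattern of Figure 3(a) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: `fake_llm_count_digits` is a hypothetical stand-in for one LLM call on a sub-task, the chunk size `m` plays the role of the decomposition granularity, and the digit-counting task is assumed only for illustration. A symbolic "dividing" step splits the input, each chunk is handled by an independent call, and a non-LLM "aggregation" step sums the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm_count_digits(chunk: str) -> int:
    """Hypothetical stand-in for one LLM call: count digits in a chunk.

    A real LLM call would be noisy; this stub is exact so the sketch
    stays deterministic.
    """
    return sum(ch.isdigit() for ch in chunk)


def divide(text: str, m: int) -> list[str]:
    """Non-LLM 'dividing' node: split text into chunks of at most m chars."""
    return [text[i:i + m] for i in range(0, len(text), m)]


def count_parallel(text: str, m: int, llm=fake_llm_count_digits,
                   max_workers: int = 4) -> int:
    """Run one call per chunk in parallel, then aggregate by summation."""
    chunks = divide(text, m)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(pool.map(llm, chunks))
    return sum(partial_counts)  # non-LLM 'aggregation' node


total = count_parallel("a1b22c333", m=4)
```

A smaller `m` means more calls, each on an easier sub-task; a larger `m` means fewer but harder calls. With a real, error-prone LLM in place of the stub, the per-chunk errors would add up in the aggregation step.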

Theorems & Definitions (6)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Remark 3
  • Proposition 2
  • Remark 4: Mitigating hallucination in retrieval