LLM-AutoDiff: Auto-Differentiate Any LLM Workflow
Li Yin, Zhangyang Wang
TL;DR
This paper addresses the bottleneck of manual prompt engineering in complex LLM pipelines by introducing LLM-AutoDiff, a graph-based framework that generalizes textual gradient methods to multi-node, potentially cyclic workflows. It formalizes Automatic LLM Application Optimization (ALAO) and Automatic Prompt Engineering (APE), and implements them in AdalFlow, enabling a frozen backward LLM to produce textual gradients that propagate through both LLM and functional nodes via a dynamic parameter graph $\mathcal{G}_p$. Key innovations include pass-through and time-sequential gradients, peer-aware sub-prompts, selective gradient computation, and a Gradient-Driven Prompt Optimizer (GDPO) that leverages historical feedback for coherent updates. Experiments on single-node and multi-node pipelines (including multi-hop RAG and ReAct-style agents) show improved final accuracy and reduced training cost relative to Text-Grad and DSPy baselines, demonstrating the practical potential of end-to-end automatic optimization of LLM applications. The framework paves the way for scalable, automated refinement of large LLM ecosystems, with future directions toward hyperparameter co-optimization, dynamic graph reconfiguration, and multimodal integration.
Abstract
Large Language Models (LLMs) have reshaped natural language processing, powering applications from multi-hop retrieval and question answering to autonomous agent workflows. Yet, prompt engineering -- the task of crafting textual inputs to effectively direct LLMs -- remains difficult and labor-intensive, particularly for complex pipelines that combine multiple LLM calls with functional operations like retrieval and data formatting. We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE) that extends textual gradient-based methods (such as Text-Grad) to multi-component, potentially cyclic LLM architectures. Implemented within the AdalFlow library, LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine LLM to generate feedback -- akin to textual gradients -- that guides iterative prompt updates. Unlike prior single-node approaches, LLM-AutoDiff inherently accommodates functional nodes, preserves time-sequential behavior in repeated calls (e.g., multi-hop loops), and combats the "lost-in-the-middle" problem by isolating distinct sub-prompts (instructions, formats, or few-shot examples). It further boosts training efficiency by focusing on error-prone samples through selective gradient computation. Across diverse tasks, including single-step classification, multi-hop retrieval-based QA, and agent-driven pipelines, LLM-AutoDiff consistently outperforms existing textual gradient baselines in both accuracy and training cost. By unifying prompt optimization through a graph-centric lens, LLM-AutoDiff offers a powerful new paradigm for scaling and automating LLM workflows -- mirroring the transformative role that automatic differentiation libraries have long played in neural network research.
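To make the idea of textual gradients with selective gradient computation concrete, the following is a minimal sketch in plain Python. It is not the AdalFlow API: `PromptParam`, `toy_backward_engine`, `backward_selective`, and `toy_pipeline` are hypothetical names introduced here for illustration, and both the pipeline and the backward engine are deterministic stand-ins for real LLM calls. The sketch assumes exact-match scoring and attaches feedback only for samples the pipeline answers incorrectly.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PromptParam:
    """A trainable piece of prompt text (hypothetical, not an AdalFlow class)."""
    name: str
    text: str
    gradients: List[str] = field(default_factory=list)


def toy_backward_engine(param: PromptParam, question: str,
                        output: str, expected: str) -> str:
    """Stand-in for the frozen backward LLM: returns feedback as plain text."""
    return (f"For input {question!r} the pipeline produced {output!r} "
            f"but {expected!r} was expected; revise '{param.name}'.")


def backward_selective(params: List[PromptParam],
                       samples: List[Tuple[str, str]],
                       run_pipeline: Callable[[str], str],
                       backward_engine: Callable[..., str]) -> int:
    """Attach textual gradients only for samples the pipeline got wrong.

    Returns the number of samples skipped because they were already correct.
    """
    n_skipped = 0
    for question, expected in samples:
        output = run_pipeline(question)
        if output == expected:
            # Correct sample: no backward call is made for it.
            n_skipped += 1
            continue
        for p in params:
            p.gradients.append(backward_engine(p, question, output, expected))
    return n_skipped


def toy_pipeline(question: str) -> str:
    """Stand-in for an LLM node: says "no" only if the question contains "not"."""
    return "no" if "not" in question else "yes"


instruction = PromptParam("instruction", "Answer with yes or no.")
samples = [
    ("Is water wet?", "yes"),
    ("Is fire not hot?", "no"),
    ("Is ice warm?", "no"),
]
skipped = backward_selective([instruction], samples,
                             toy_pipeline, toy_backward_engine)
```

In this toy run, only the sample the pipeline answers incorrectly contributes feedback to `instruction.gradients`. An optimizer step would then hand the accumulated feedback to another LLM call that proposes a revised `instruction.text`; that step is omitted here.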
