AgentGit: A Version Control Framework for Reliable and Scalable LLM-Powered Multi-Agent Systems

Yang Li, Siqi Ping, Xiyu Chen, Xiaojian Qi, Zigan Wang, Ye Luo, Xiaowei Zhang

TL;DR

AgentGit introduces a Git-like rollback and branching layer on top of LangGraph to address reliability and scalability gaps in LLM-powered multi-agent systems. A complexity analysis shows that rollback removes redundant steps: with $x_i$ options at step $i$ of an $n$-step workflow, the number of outcomes is $L = \prod_{i=1}^{n} x_i$, the standard model needs $S_{std} = n \prod_{i=1}^{n} x_i$ steps, and the rollback-enabled model needs $S_{rollback} = \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$ steps. The resulting efficiency $\eta = S_{std}/S_{rollback}$ grows with task depth; in the constant-branch case $x_i = \alpha$, $\eta = \frac{n \alpha^n}{\sum_{i=1}^{n} \alpha^i}$ and $\lim_{n\to\infty} \eta = \infty$ (for $\alpha > 1$). An empirical A/B evaluation against LangGraph, AutoGen, and Agno on arXiv abstract retrieval demonstrates reduced runtime and token usage and enables parallel exploration across branches, while maintaining output quality as measured by G-Eval. The results showcase AgentGit as a practical means to enable error recovery, safe exploration, iterative debugging, and scalable testing in collaborative AI systems, with open-source resources released for further development.
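The step counts in the complexity analysis can be checked numerically. The following is a minimal sketch that evaluates the formulas stated above for a given list of per-step option counts; the function names are illustrative assumptions and are not part of AgentGit.

```python
from math import prod


def total_outcomes(options):
    """L: number of distinct outcomes, the product of the per-step option counts."""
    return prod(options)


def steps_standard(options):
    """S_std: every outcome re-runs all n steps from the start."""
    return len(options) * prod(options)


def steps_rollback(options):
    """S_rollback: each node of the branching tree is executed once.

    The i-th term is x_1 * x_2 * ... * x_i, the number of nodes at depth i.
    """
    total = 0
    nodes_at_depth = 1
    for x in options:
        nodes_at_depth *= x
        total += nodes_at_depth
    return total


def efficiency(options):
    """eta = S_std / S_rollback."""
    return steps_standard(options) / steps_rollback(options)


if __name__ == "__main__":
    # Constant-branch case x_i = alpha = 2 for increasing depth n.
    for n in (2, 4, 8, 16):
        print(n, round(efficiency([2] * n), 3))
```

For three steps with two options each, this gives $L = 8$, $S_{std} = 24$, and $S_{rollback} = 2 + 4 + 8 = 14$, so $\eta = 24/14 \approx 1.71$.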

Abstract

With the rapid progress of large language models (LLMs), LLM-powered multi-agent systems (MAS) are drawing increasing interest across academia and industry. However, many current MAS frameworks struggle with reliability and scalability, especially on complex tasks. We present AgentGit, a framework that brings Git-like rollback and branching to MAS workflows. Built as an infrastructure layer on top of LangGraph, AgentGit supports state commit, revert, and branching, allowing agents to traverse, compare, and explore multiple trajectories efficiently. To evaluate AgentGit, we designed an experiment that optimizes target agents by selecting better prompts. We ran a multi-step A/B test against three baselines -- LangGraph, AutoGen, and Agno -- on a real-world task: retrieving and analyzing paper abstracts. Results show that AgentGit significantly reduces redundant computation, lowers runtime and token usage, and supports parallel exploration across multiple branches, enhancing both reliability and scalability in MAS development. This work offers a practical path to more robust MAS design and enables error recovery, safe exploration, iterative debugging, and A/B testing in collaborative AI systems.

Paper Structure

This paper contains 16 sections, 5 theorems, 11 equations, 8 figures.

Key Result

Lemma 1

In an MAS, a workflow consists of $n$ steps, where each step allows the selection of different tools or prompt options. Suppose the $i$-th step has $x_i$ available tools or prompt options. Then, the total number of possible outcomes $L$ after executing the workflow can be expressed as $L = \prod_{i=1}^{n} x_i$, where $x_i$ represents the number of tools or prompt options available at the $i$-th step.
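Lemma 1 can be illustrated by enumerating every trajectory of a small workflow and comparing the count with the product of the option counts. This is a sketch only; the option labels below are hypothetical placeholders, not options from the paper's experiment.

```python
from itertools import product
from math import prod


def enumerate_trajectories(options_per_step):
    """Return every way of picking one option at each step, in step order."""
    return list(product(*options_per_step))


# Hypothetical three-step workflow with 2, 3, and 2 options respectively.
workflow = [
    ["prompt_a", "prompt_b"],
    ["tool_x", "tool_y", "tool_z"],
    ["short_summary", "long_summary"],
]

trajectories = enumerate_trajectories(workflow)

# The number of trajectories matches L = x_1 * x_2 * x_3 = 2 * 3 * 2 = 12.
assert len(trajectories) == prod(len(step) for step in workflow)
```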

Figures (8)

  • Figure 1: Comparison of task execution workflows: standard model vs. AgentGit with rollback functionality
  • Figure 2: Tree diagram illustrating the branching structure of the task execution process
  • Figure 3: Visualization of the total steps required and efficiency trends for the standard model and rollback-enabled model under varying $x_i$ and $n$
  • Figure 4: Workflow of the MAS task scenario for retrieving abstracts of papers related to a specific topic
  • Figure 5: Tree structure representing the experimental workflow
  • ...and 3 more figures

Theorems & Definitions (5)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 1
  • Proposition 2