AgentGit: A Version Control Framework for Reliable and Scalable LLM-Powered Multi-Agent Systems

Yang Li; Siqi Ping; Xiyu Chen; Xiaojian Qi; Zigan Wang; Ye Luo; Xiaowei Zhang

TL;DR

AgentGit introduces a Git-like rollback and branching layer on top of LangGraph to address reliability and scalability gaps in LLM-powered multi-agent systems. A complexity analysis shows that rollback reduces redundant steps: with $L = \prod_{i=1}^{n} x_i$, the standard workflow costs $S_{std} = n \prod_{i=1}^{n} x_i$ steps, whereas the rollback workflow costs $S_{rollback} = \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$, yielding an efficiency ratio $\eta = S_{std}/S_{rollback}$ that grows with task depth. In the constant-branch case $x_i=\alpha$, $\eta = n \alpha^n / \sum_{i=1}^{n} \alpha^i$, and $\lim_{n\to\infty} \eta = \infty$ for $\alpha > 1$ (for $\alpha = 1$ the ratio is exactly 1). An empirical A/B evaluation against LangGraph, AutoGen, and Agno on arXiv abstract retrieval shows reduced runtime and token usage, along with support for parallel exploration across branches, while maintaining output quality as measured by G-Eval. The results position AgentGit as a practical means of enabling error recovery, safe exploration, iterative debugging, and scalable testing in collaborative AI systems, with open-source resources released for further development.
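The step counts above can be checked numerically. The following is a minimal sketch, assuming that $x_i$ is the number of alternatives tried at step $i$ of an $n$-step workflow (the summary does not define the symbols explicitly). The function names are illustrative and are not part of AgentGit.

```python
from math import prod
from typing import Sequence


def leaf_count(x: Sequence[int]) -> int:
    """L = prod_i x_i (assumed here to be the number of complete trajectories)."""
    return prod(x)


def steps_standard(x: Sequence[int]) -> int:
    """S_std = n * prod_i x_i: every trajectory re-runs all n steps."""
    return len(x) * prod(x)


def steps_rollback(x: Sequence[int]) -> int:
    """S_rollback = sum_i prod_{j<=i} x_j: shared prefixes are run once."""
    total = 0
    running = 1  # running product x_1 * ... * x_i
    for xi in x:
        running *= xi
        total += running
    return total


def efficiency(x: Sequence[int]) -> float:
    """eta = S_std / S_rollback."""
    return steps_standard(x) / steps_rollback(x)


if __name__ == "__main__":
    # Constant-branch case x_i = alpha = 2 for increasing depth n.
    for n in (1, 3, 10, 20):
        print(n, round(efficiency([2] * n), 3))
```

For example, with `x = [2, 2, 2]` the standard count is 3 · 8 = 24 and the rollback count is 2 + 4 + 8 = 14, so $\eta = 24/14 \approx 1.71$. With $\alpha = 1$ both counts equal $n$ and the ratio stays at 1, which is why the divergence requires $\alpha > 1$.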

Abstract

With the rapid progress of large language models (LLMs), LLM-powered multi-agent systems (MAS) are drawing increasing interest across academia and industry. However, many current MAS frameworks struggle with reliability and scalability, especially on complex tasks. We present AgentGit, a framework that brings Git-like rollback and branching to MAS workflows. Built as an infrastructure layer on top of LangGraph, AgentGit supports state commit, revert, and branching, allowing agents to traverse, compare, and explore multiple trajectories efficiently. To evaluate AgentGit, we designed an experiment that optimizes target agents by selecting better prompts. We ran a multi-step A/B test against three baselines -- LangGraph, AutoGen, and Agno -- on a real-world task: retrieving and analyzing paper abstracts. Results show that AgentGit significantly reduces redundant computation, lowers runtime and token usage, and supports parallel exploration across multiple branches, enhancing both reliability and scalability in MAS development. This work offers a practical path to more robust MAS design and enables error recovery, safe exploration, iterative debugging, and A/B testing in collaborative AI systems.
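To make the commit, revert, and branch operations concrete, here is a toy in-memory sketch. It is an assumption-laden illustration only: `ToyStateStore` and all of its methods are hypothetical names, and the code does not reproduce the AgentGit or LangGraph API.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Commit:
    commit_id: int
    parent: Optional[int]
    state: Dict[str, Any]


class ToyStateStore:
    """Hypothetical store: snapshots of agent state linked by parent pointers."""

    def __init__(self, initial_state: Dict[str, Any]) -> None:
        self._commits: Dict[int, Commit] = {}
        self._next_id = 0
        self.current_branch = "main"
        # Each branch name maps to the id of its head commit.
        self.branches: Dict[str, int] = {"main": self._new_commit(None, initial_state)}

    def _new_commit(self, parent: Optional[int], state: Dict[str, Any]) -> int:
        cid = self._next_id
        self._next_id += 1
        self._commits[cid] = Commit(cid, parent, deepcopy(state))
        return cid

    def commit(self, state: Dict[str, Any]) -> int:
        """Snapshot the state on the current branch and advance its head."""
        head = self.branches[self.current_branch]
        cid = self._new_commit(head, state)
        self.branches[self.current_branch] = cid
        return cid

    def revert(self, commit_id: int) -> Dict[str, Any]:
        """Move the current branch head back to an earlier commit."""
        self.branches[self.current_branch] = commit_id
        return deepcopy(self._commits[commit_id].state)

    def branch(self, name: str, from_commit: int) -> Dict[str, Any]:
        """Start a new branch at an existing commit and switch to it."""
        self.branches[name] = from_commit
        self.current_branch = name
        return deepcopy(self._commits[from_commit].state)

    def head_state(self, branch: Optional[str] = None) -> Dict[str, Any]:
        """Return a copy of the state at the head of a branch."""
        cid = self.branches[branch or self.current_branch]
        return deepcopy(self._commits[cid].state)


if __name__ == "__main__":
    store = ToyStateStore({"step": 0})
    c1 = store.commit({"step": 1, "prompt": "A"})
    store.commit({"step": 2, "prompt": "A"})
    # Explore an alternative prompt from step 1 without redoing earlier work.
    store.branch("alt", c1)
    store.commit({"step": 2, "prompt": "B"})
    print(store.head_state("main"), store.head_state("alt"))
```

Branching from `c1` rather than from the initial state is what avoids recomputing the shared prefix when two prompts are compared in an A/B test.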
