LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning

Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang

TL;DR

This paper tackles tangled commits by proposing ColaUntangle, an LLM-driven collaborative framework that reasons over explicit and implicit dependencies using Explicit and Implicit Contexts derived from a multi-version program dependency graph. It deploys three agents (an explicit-dependency worker, an implicit-dependency worker, and a reviewer) whose iterative consultation yields explainable untangling decisions. Evaluations on 1,612 tangled C# commits and 14k tangled Java commits show improvements in Accuracy$^c$ of 44% on C# and 82% on Java over the best-performing baseline, with collaborative consultation proving critical to performance. The work advances explainable, human-aligned automated commit untangling and provides implementations and data to support reproducibility and further research.

Abstract

Atomic commits, which address a single development concern, are a best practice in software development. In practice, however, developers often produce tangled commits that mix unrelated changes, complicating code review and maintenance. Prior untangling approaches (rule-based, feature-based, or graph-based) have made progress but typically rely on shallow signals and struggle to distinguish explicit dependencies (e.g., control/data flow) from implicit ones (e.g., semantic or conceptual relationships). In this paper, we propose ColaUntangle, a new collaborative consultation framework for commit untangling that models both explicit and implicit dependencies among code changes. ColaUntangle integrates Large Language Model (LLM)-driven agents in a multi-agent architecture: one agent specializes in explicit dependencies, another in implicit ones, and a reviewer agent synthesizes their perspectives through iterative consultation. To capture structural and contextual information, we construct Explicit and Implicit Contexts, enabling agents to reason over code relationships with both symbolic and semantic depth. We evaluate ColaUntangle on two widely used datasets (1,612 C# and 14k Java tangled commits). Experimental results show that ColaUntangle outperforms the best-performing baseline, achieving an improvement of 44% on the C# dataset and 82% on the Java dataset. These findings highlight the potential of LLM-based collaborative frameworks for advancing automated commit untangling tasks.
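The consultation pattern described in the abstract (two specialized workers plus a reviewer, iterating until they converge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `Proposal`, `Review`, `consult`, and `partition`, the `max_rounds` cap, and the agreement check are hypothetical, and the stub agents return canned answers instead of calling an LLM. It is not the authors' implementation, prompts, or stopping rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical representation: changed-statement id -> concern label.
Grouping = Dict[str, int]


@dataclass
class Proposal:
    grouping: Grouping
    rationale: str  # free-text explanation, kept so decisions stay explainable


@dataclass
class Review:
    accepted: bool
    grouping: Grouping
    feedback: str


# A worker sees the changes plus optional reviewer feedback from the last round.
Worker = Callable[[List[str], Optional[str]], Proposal]
Reviewer = Callable[[Proposal, Proposal], Review]


def partition(grouping: Grouping) -> Set[FrozenSet[str]]:
    """Turn a labeling into a set of groups, so label names do not matter."""
    groups: Dict[int, Set[str]] = {}
    for change_id, label in grouping.items():
        groups.setdefault(label, set()).add(change_id)
    return {frozenset(members) for members in groups.values()}


def consult(changes: List[str], explicit_worker: Worker, implicit_worker: Worker,
            reviewer: Reviewer, max_rounds: int = 3) -> Tuple[Review, int]:
    """Run worker/reviewer rounds until the reviewer accepts or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        explicit = explicit_worker(changes, feedback)
        implicit = implicit_worker(changes, feedback)
        review = reviewer(explicit, implicit)
        if review.accepted:
            break
        feedback = review.feedback  # passed back to both workers next round
    return review, round_no


# Stub agents with canned answers (assumption: real agents would query an LLM).
def explicit_stub(changes: List[str], feedback: Optional[str]) -> Proposal:
    if feedback is None:
        return Proposal({"c1": 0, "c2": 1, "c3": 1}, "c2 and c3 share a data flow")
    return Proposal({"c1": 0, "c2": 0, "c3": 1}, "revised after reviewer feedback")


def implicit_stub(changes: List[str], feedback: Optional[str]) -> Proposal:
    return Proposal({"c1": 0, "c2": 0, "c3": 1}, "c1 and c2 serve the same concern")


def reviewer_stub(explicit: Proposal, implicit: Proposal) -> Review:
    if partition(explicit.grouping) == partition(implicit.grouping):
        return Review(True, explicit.grouping, "workers agree")
    return Review(False, explicit.grouping, "workers disagree; please reconsider")


review, rounds = consult(["c1", "c2", "c3"], explicit_stub, implicit_stub, reviewer_stub)
print(rounds, review.accepted, review.grouping)
```

With these stubs the workers disagree in the first round and converge in the second, once the reviewer's feedback reaches the explicit-dependency worker. Comparing partitions instead of raw labels is a design choice of this sketch: two groupings that differ only in how the concerns are numbered are treated as the same decision.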

Paper Structure

This paper contains 27 sections, 9 equations, 9 figures, 5 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Motivating Examples
  • Figure 2: Overall Workflow of ColaUntangle
  • Figure 3: Code Change Example with Explicit and Implicit Contexts
  • Figure 4: Agents in Multi-Agent Collaboration
  • Figure 5: Average Accuracy$^c$ and Number of Changed Statements in Tangled Commits of Each Repository
  • ...and 4 more figures