The Capability of Code Review as a Communication Network

Michael Dorner, Daniel Mendez

TL;DR

This paper formalizes the theory that code review functions as a communication network and tests it with an in-silico diffusion experiment across three open-source and three closed-source code review systems. It models code review as time-varying hypergraphs and computes time-respecting journeys to quantify how widely and how quickly information can diffuse, measuring horizon and two distance notions (topological and temporal). The results show that diffusion can be wide and fast under best-case conditions, with closed-source systems achieving greater reach and open-source systems often enabling faster diffusion within smaller cores; differences reflect underlying organizational structures. The study provides a robust replication-based validation, discusses implications for measuring collaboration and generalizing findings, and explores how AI could reshape the communicative role of code review, while acknowledging threats to validity and the need for careful interpretation across contexts.

Abstract

Background: Code review, a core practice in software engineering, has been widely studied as a collaborative process, with prior work suggesting it functions as a communication network. However, this theory remains untested, limiting its practical and theoretical significance. Objective: This study aims to (1) formalize the theory of code review as a communication network and (2) empirically test its validity by quantifying how widely and how quickly information can spread in code review. Method: We replicate an in-silico experiment simulating information diffusion -- the spread of information among participants -- under best-case conditions across three open-source (Android, Visual Studio Code, React) and three closed-source (Microsoft, Spotify, Trivago) code review systems, each modeled as a communication network. By measuring the number of reachable participants and the minimal topological and temporal distances, we quantify how widely and how quickly information can spread through code review. Results: We demonstrate that code review can enable both wide and fast information diffusion, even at a large scale. However, this capacity varies: open-source code review spreads information faster, while closed-source review reaches more participants. Conclusion: Our findings reinforce and refine the theory, highlighting implications for measuring collaboration, generalizing open-source studies, and the role of AI in shaping future code review.
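To make the measured quantities concrete, the following is a minimal sketch of best-case diffusion over a time-varying hypergraph. It is not the authors' implementation. The names `Review` and `diffuse` are hypothetical, and the availability rule is a simplifying assumption: an informed participant can pass information through a review at any time up to the review's end, and the other participants receive it no earlier than the review's start. The set of returned participants corresponds to the horizon, the hop count to the topological distance, and the elapsed time to the temporal distance.

```python
from math import inf
from typing import NamedTuple


class Review(NamedTuple):
    """One hyperedge: a code review open during [start, end] (assumed model)."""
    start: float
    end: float
    participants: frozenset


def diffuse(reviews, source, t0=0.0):
    """Return {participant: (min_hops, elapsed_time)} for everyone reachable.

    Round k only uses arrival times achievable with at most k-1 reviews, so the
    round in which a participant first becomes reachable is their minimal hop
    count over all time-respecting journeys.
    """
    arrival = {source: t0}  # earliest known arrival time per participant
    hops = {source: 0}      # fewest reviews on any time-respecting journey
    k = 0
    changed = True
    while changed:
        changed = False
        k += 1
        snapshot = dict(arrival)  # arrivals using at most k-1 reviews
        for r in reviews:
            # Informed participants who arrive before this review closes.
            senders = [snapshot[p] for p in r.participants
                       if p in snapshot and snapshot[p] <= r.end]
            if not senders:
                continue
            # Information cannot pass through before the review opens.
            t = max(min(senders), r.start)
            for p in r.participants:
                if t < arrival.get(p, inf):
                    arrival[p] = t
                    changed = True
                    hops.setdefault(p, k)
    return {p: (hops[p], arrival[p] - t0) for p in arrival}


# Illustrative data only; the times are arbitrary units.
reviews = [
    Review(0, 2, frozenset({"alice", "bob"})),
    Review(1, 3, frozenset({"bob", "carol"})),
    Review(5, 6, frozenset({"carol", "dave"})),
    Review(0, 0.5, frozenset({"dave", "erin"})),  # closes before dave is informed
]
result = diffuse(reviews, "alice")
```

In this toy network, information starting at alice reaches bob, carol, and dave, but never erin: the only review erin shares with dave has already closed by the time dave is informed. This is what "time-respecting" rules out, and it is why a static graph of the same reviews would overestimate reach.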

Paper Structure

This paper contains 39 sections, 5 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: The empirical research cycle (in analogy to Mendez2019): While exploratory research is theory-generating using inductive reasoning (starting with observations), confirmatory research is theory-testing using deductive reasoning (starting with a theory). This research is confirmatory.
  • Figure 2: The delineation and interplay with our prior work in this line of research.
  • Figure 3: An example hypergraph (a) and its bipartite-graph equivalent (b).
  • Figure 4: Empirical cumulative distribution of the normalized information diffusion ranges per code review system after four weeks.
  • Figure 5: Empirical cumulative distribution of the topological distances between participants per code review system after four weeks. The topological distance is the minimal number of code reviews (hops) required to spread information from one code review participant to another.
  • ...and 5 more figures