Measuring Information Diffusion in Code Review at Spotify
Michael Dorner, Daniel Mendez, Ehsan Zabardast, Nicole Valdez, Marcin Floryan
TL;DR
This work assesses whether code review acts as a broad information diffusion mechanism by framing it as a communication network and conducting a confirmatory observational study at Spotify. It introduces a measurement model based on a code review graph $G=(C,R)$, augmented by mappings $f_1$, $f_2$, and $f_3$ that capture participants, components, and teams. Diffusion is quantified via Jaccard similarity and graph edit distance, all normalized to $[0,1]$. The study tests three hypotheses $H_1$, $H_2$, and $H_3$ derived from the theory $T$, using a qualitative rejection criterion rather than fixed thresholds, to determine whether information diffusion transcends social, architectural, and organizational boundaries. The contributions lie in a concrete observational framework, a reproducible measurement pipeline using Spotify data (GitHub Enterprise and Backstage), and a rigorous discussion of limitations and of falsification as a path to refining the theory of code review as a communication network.
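To make the similarity measure concrete, the following is a minimal sketch of Jaccard similarity applied to the participant sets of two linked code reviews. The function name, the example participants, and the convention of returning 1.0 for two empty sets are assumptions made for illustration; they are not taken from the study's measurement pipeline.

```python
from typing import AbstractSet


def jaccard_similarity(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Return |a ∩ b| / |a ∪ b|, a value in [0, 1].

    Assumption: two empty sets are treated as identical (similarity 1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical participants of two linked code reviews.
review_a = {"alice", "bob", "carol"}
review_b = {"bob", "carol", "dave"}

# Two shared participants out of four distinct participants overall.
similarity = jaccard_similarity(review_a, review_b)
print(similarity)  # 0.5
```

The same function would apply unchanged to sets of affected components. Comparing teams relies on graph edit distance in the description above, which this sketch does not cover.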
Abstract
Background: As a core practice in software engineering, code review has frequently been the subject of research. Prior exploratory studies found that code review, the discussion around a code change among humans, forms a communication network that enables its participants to exchange and spread information. Although this theory is popular in software engineering, no confirmatory research corroborates it, and the actual extent of information diffusion in code review is not well understood. Objective: In this registered report, we propose an observational study to measure information diffusion in code review to test the theory of code review as a communication network. Method: We approximate the information diffusion in code review through the frequency and the similarity between (1) human participants, (2) affected components, and (3) involved teams of linked code reviews. The measurements approximating the information diffusion in code review serve as a foundation for falsifying the theory of code review as a communication network.
