Measuring Information Diffusion in Code Review at Spotify

Michael Dorner, Daniel Mendez, Ehsan Zabardast, Nicole Valdez, Marcin Floryan

TL;DR

This work assesses whether code review acts as a broad information diffusion mechanism by framing it as a communication network and conducting a confirmatory observational study at Spotify. It introduces a measurement model based on a code review graph $G=(C,R)$ augmented by mappings $f_1$, $f_2$, and $f_3$ to capture participants, components, and teams, with diffusion quantified via Jaccard similarity and graph edit distance, all normalized to $[0,1]$. The study tests three hypotheses $H_1$, $H_2$, and $H_3$ derived from theory $T$, using a qualitative rejection criterion rather than fixed thresholds, to determine whether information diffusion transcends social, architectural, and organizational boundaries. The contributions lie in a concrete observational framework, a reproducible measurement pipeline using Spotify data (GitHub Enterprise and Backstage), and a rigorous discussion of limitations and falsification as a path to refining the theory of code review as a communication network.

Abstract

Background: As a core practice in software engineering, code review has frequently been the subject of research. Prior exploratory studies found that code review, the discussion around a code change among humans, forms a communication network that enables its participants to exchange and spread information. Although this theory is popular in software engineering, no confirmatory research corroborates it, and the actual extent of information diffusion in code review is not well understood. Objective: In this registered report, we propose an observational study to measure information diffusion in code review to test the theory of code review as a communication network. Method: We approximate the information diffusion in code review through the frequency and the similarity between (1) human participants, (2) affected components, and (3) involved teams of linked code reviews. The measurements approximating the information diffusion in code review serve as a foundation for falsifying the theory of code review as a communication network.
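
As a rough illustration of the similarity measurement described in the Method, the following minimal sketch computes the Jaccard similarity between the participant sets of linked code reviews. It is not the authors' pipeline. The review identifiers, the reference edges, the participant names, and the helper names `jaccard` and `link_similarities` are all hypothetical, and treating two empty sets as fully similar is an assumption made here only to avoid division by zero.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, which always lies in [0, 1]."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def link_similarities(
    references: List[Tuple[str, str]],
    mapping: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Similarity of the mapped sets for every pair of linked code reviews."""
    return {
        (source, target): jaccard(mapping[source], mapping[target])
        for source, target in references
    }


# Hypothetical references between code reviews (source references target).
references = [("c1", "c2"), ("c2", "c0"), ("c2", "c3")]

# Hypothetical mapping from each code review to its human participants.
participants = {
    "c0": {"alice", "bob"},
    "c1": {"alice", "carol"},
    "c2": {"alice", "bob"},
    "c3": {"dave"},
    "c4": {"erin"},  # references no other review, so it yields no pair
}

similarities = link_similarities(references, participants)
for pair, value in sorted(similarities.items()):
    print(pair, round(value, 3))
```

The same function could be applied to a mapping from code reviews to affected components or to involved teams, which is why the mapping is passed in as a parameter instead of being fixed to participants.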

Paper Structure (11 sections, 1 equation, 6 figures)

Figures (6)

  • Figure 1: The empirical research cycle (in analogy to Mendez2019): While exploratory research is theory-generating using inductive reasoning (starting with observations), confirmatory research is theory-testing using deductive reasoning (starting with a theory). This research is confirmatory.
  • Figure 2: An exemplary code review network with code reviews as vertices and references between them as edges. Code reviews can reference one (see $c_1$, which references $c_2$), multiple (see $c_2$, which references $c_0$ and $c_3$), or no other code review (see $c_4$).
  • Figure 3: Three examples of cumulative distributions of the information diffusion, measured in the form of similarity of linked code reviews with respect to participants, components, and teams. Depending on the discussions of those results, we may or may not reject our hypotheses and, thus, falsify our theory.
  • Figure 4: A circular layout of the components grouped by the owning teams and linked by code reviews. This visualization may be too cluttered for massive information diffusion.
  • Figure 5: An overview of our measuring instrument. The raw data is extracted from two different sources, Spotify's internal GitHub Enterprise (light blue) and Backstage instance (yellow). The results feed into the measurement model.
  • ...and 1 more figure