
Balanced Knowledge Distribution among Software Development Teams -- Observations from Open-Source and Closed-Source Software Development

Saad Shafiq, Christoph Mayr-Dorn, Atif Mashkoor, Alexander Egyed

TL;DR

This work addresses the risk of knowledge loss from developer turnover by introducing ConceptRealm, a coarse-grained, issue- and comment-centered representation of the domain knowledge distributed among team members. It uses Latent Dirichlet Allocation to extract concepts from over 300k issues and 1.3M comments across 518 OSS projects, and shows that concept keepers exist and that their departure can disrupt key concepts. An industrial evaluation on a Dynatrace project confirms the approach’s practical relevance, showing similar patterns of concept distribution and keeper impact in closed-source settings. The study also outlines implications for assignee recommendation and knowledge-balancing strategies, offering a data-driven basis for managing turnover risk and guiding knowledge distribution in both OSS and industrial environments.
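The TL;DR says concepts are extracted with Latent Dirichlet Allocation. As a rough illustration only, the sketch below implements LDA by collapsed Gibbs sampling in plain Python. The paper's actual toolkit, text preprocessing, hyperparameters (`alpha`, `beta`) and number of concepts are not stated in this summary, so `lda_gibbs`, its parameters, and the toy issue texts are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists, e.g. preprocessed issue or comment text.
    Returns (doc_topic, topic_word): per-document concept distributions
    (lists of floats) and per-concept word distributions (dicts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    # Point estimates of theta (doc-topic) and phi (topic-word).
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_word


# Usage on hypothetical issue titles.
issues = [
    "login fails oauth token expired".split(),
    "oauth token refresh login error".split(),
    "chart render slow dashboard".split(),
    "dashboard chart axis render bug".split(),
]
theta, phi = lda_gibbs(issues, num_topics=2, iterations=100)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Each row of `theta` plays the role of an issue's concept distribution; a production pipeline would more likely use an established LDA library and tune the number of concepts.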

Abstract

In software development teams, developer turnover is among the primary reasons for project failure, as it leaves a large knowledge gap and places strain on newcomers. Unfortunately, no established methods exist to measure how knowledge is distributed among development teams. Knowing how this knowledge evolves and which key developers in a project own it helps managers reduce the risks caused by turnover. To this end, this paper introduces a novel, realistic representation of domain knowledge distribution: the ConceptRealm. To construct the ConceptRealm, we employ a latent Dirichlet allocation model to represent textual features obtained from 300k issues and 1.3M comments from 518 open-source projects. We analyze whether newly emerging issues and developers share similar concepts, and how well developers' concepts align with the team's over time. We also investigate the impact of departing members on the frequency of concepts. Finally, we evaluate the soundness of our approach on closed-source software, which allows us to validate the results from a practical standpoint. We find that the ConceptRealm can represent the high-level domain knowledge within a team and can be used to predict the alignment of developers with issues. We also observe that projects exhibit many keepers regardless of project maturity, and that keepers who leave abruptly harm the team's concept familiarity.
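The abstract mentions aligning developers with issues and with the team. Below is a minimal sketch of one way to do this, assuming (hypothetically) that a developer's concept profile is the involvement-weighted average of the concept distributions of the issues they worked on, and that alignment is measured with cosine similarity. The paper's exact weighting and similarity measure are not given in this summary; `developer_profiles`, `rank_developers`, and the toy data are illustrative names only.

```python
import math


def normalize(vec):
    """Scale a non-negative vector so it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def developer_profiles(issue_concepts, involvement):
    """Aggregate issue-level concept distributions into developer-level ones.

    issue_concepts: {issue_id: [p(concept_0), ..., p(concept_K-1)]}
    involvement:    {developer: {issue_id: weight}}, e.g. comment counts
    """
    k = len(next(iter(issue_concepts.values())))
    profiles = {}
    for dev, issues in involvement.items():
        acc = [0.0] * k
        for issue_id, w in issues.items():
            for c, p in enumerate(issue_concepts[issue_id]):
                acc[c] += w * p
        profiles[dev] = normalize(acc)
    return profiles


def cosine(a, b):
    """Cosine similarity between two concept vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_developers(new_issue, profiles):
    """Order developers by how well their concepts align with a new issue."""
    return sorted(profiles, key=lambda d: cosine(new_issue, profiles[d]), reverse=True)


# Hypothetical example with two concepts.
issue_concepts = {"I1": [0.9, 0.1], "I2": [0.8, 0.2], "I3": [0.1, 0.9]}
involvement = {"alice": {"I1": 3, "I2": 1}, "bob": {"I3": 2}}
profiles = developer_profiles(issue_concepts, involvement)
team = normalize([sum(p[c] for p in profiles.values()) for c in range(2)])

print(rank_developers([0.85, 0.15], profiles))  # ['alice', 'bob']
print(round(cosine(profiles["alice"], team), 3))  # alice's alignment with the team
```

Tracking `cosine(profile, team)` per time window would give one possible view of how a developer's concepts drift relative to the team.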

Paper Structure

This paper contains 40 sections, 4 equations, 17 figures, and 4 tables.

Figures (17)

  • Figure 1: Team and Dev-level concepts association - D (Developers), C (Concepts), I (Issues)
  • Figure 2: Representation of concepts - I (Issue), C (Concepts), D (Developer), and W (Weight)
  • Figure 3: Change in concept frequency
  • Figure 4: Study design overview
  • Figure 5: Example of determining the optimal concept number
  • ...and 12 more figures
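The abstract and Figure 3 concern concept keepers and how concept frequency changes when members leave. This summary does not define a keeper precisely, so the sketch below assumes a hypothetical rule: a developer keeps a concept if they hold at least `share_threshold` of that concept's total weight in a time window. It then estimates the relative drop in each concept's frequency if one developer's contributions disappear. This is a counterfactual stand-in; the paper instead studies actual departures over time. All function names and the data format are illustrative.

```python
from collections import defaultdict


def concept_frequency(contributions):
    """Total concept weight per concept.

    contributions: list of (developer, concept_weights) pairs, one per
    issue or comment in a time window (hypothetical format).
    """
    freq = defaultdict(float)
    for _, weights in contributions:
        for c, w in enumerate(weights):
            freq[c] += w
    return dict(freq)


def find_keepers(contributions, share_threshold=0.5):
    """Map each concept to the developers holding >= share_threshold of its weight."""
    per_concept = defaultdict(lambda: defaultdict(float))
    for dev, weights in contributions:
        for c, w in enumerate(weights):
            per_concept[c][dev] += w
    keepers = {}
    for c, devs in per_concept.items():
        total = sum(devs.values())
        keepers[c] = sorted(d for d, w in devs.items() if total and w / total >= share_threshold)
    return keepers


def frequency_change_without(contributions, leaving_dev):
    """Relative change in each concept's frequency if leaving_dev's work is removed."""
    before = concept_frequency(contributions)
    after = concept_frequency([(d, w) for d, w in contributions if d != leaving_dev])
    return {c: (after.get(c, 0.0) - f) / f if f else 0.0 for c, f in before.items()}


# Hypothetical window with two concepts.
window = [
    ("carol", [0.9, 0.1]),
    ("carol", [0.7, 0.3]),
    ("dave", [0.2, 0.8]),
]
print(find_keepers(window))  # {0: ['carol'], 1: ['dave']}
print(frequency_change_without(window, "carol"))
```

Here removing `carol` cuts concept 0's frequency by roughly 89%, which illustrates why a departing keeper can disrupt the concepts they hold.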