Towards the First Code Contribution: Processes and Information Needs

Christoph Treude, Marco A. Gerosa, Igor Steinmacher

TL;DR

This study addresses the problem of onboarding newcomers to software projects by identifying the barriers created by dispersed and incomplete documentation. Through an empirical approach (a survey of about 100 practitioners, grounded theory analysis, and validation interviews), the authors derive a detailed 16-step model of the newcomer process and a taxonomy of information sources and types, revealing cross-cutting needs for communication and mentoring. Key contributions include the validated 16-step process, the information-source/type taxonomy, and practical implications for tool support, highlighting how project and individual characteristics shape the relevance of information. The findings support the design of a parametrized documentation portal that automatically extracts, summarizes, and personalizes newcomer content, with AI-assisted generation as a future enhancement to adapt to each learner's path and context. Overall, the work provides empirically grounded requirements for automated onboarding tools and a roadmap toward tailored, accessible newcomer documentation in real-world software projects.

Abstract

Newcomers to a software project must overcome many barriers before they can successfully place their first code contribution, and they often struggle to find information that is relevant to them. In this work, we argue that much of the information needed by newcomers already exists, albeit scattered among many different sources, and that many barriers can be addressed by automatically identifying, extracting, generating, summarizing, and presenting documentation that is specifically aimed and customized for newcomers. To gain a detailed understanding of the processes followed by newcomers and their information needs before making their first code contribution, we conducted an empirical study. Based on a survey with about 100 practitioners, grounded theory analysis, and validation interviews, we contribute a 16-step model for the processes followed by newcomers to a software project and we identify relevant information, along with individual and project characteristics that influence the relevancy of information types and sources. Our findings form an essential step towards automated tool support that provides relevant information to project newcomers in each step of their contribution processes.

Paper Structure (18 sections, 1 figure, 5 tables)

Figures (1)

  • Figure 1: Newcomers' processes from after they have decided to contribute until their first code contribution