The First Issue Matters: Linking Task-Level Characteristics to Long-Term Newcomer Retention in OSS

Yichen Hao, Weiwei Xu, Kai Gao, Xiaofang Zhang

Abstract

Sustaining newcomer participation is critical for the long-term health of open-source communities. Although prior research has explored various task recommendation approaches to help newcomers resolve their first-issue, these methods overlook how characteristics of first-issues may influence newcomers' long-term retention, limiting our understanding of whether initial success leads to sustained participation and hindering effective onboarding design. In this paper, we conduct a large-scale empirical study to examine how first-issue characteristics affect newcomer retention. We combine predictive analysis, interpretability techniques, and causal inference to estimate the causal effects of issue characteristics on retention outcomes. The prediction task supports the interpretation and shows that interaction-related characteristics exhibit stronger associations with retention than intrinsic issue attributes. The causal analysis further reveals that issues reported by moderately experienced contributors, accompanied by moderate discussion intensity and participation from project members, and neutral or slightly negative comment sentiment, have higher retention potential. These findings provide actionable insights for OSS maintainers on designing issue management practices that better support long-term newcomer retention.

Paper Structure

This paper contains 55 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: SHAP summary plot. Features are ordered from left to right by decreasing mean absolute SHAP value. Each point represents a sample and color indicates the feature value of this sample.
  • Figure 2: Causal DAG for modeling issue-related factors and newcomer retention. Treatments are instantiated from the graph, and covariates are selected via the backdoor criterion.
  • Figure 3: Estimated causal effects of issue discussion dynamics prior to pull request submission on newcomer retention.
  • Figure 4: Estimated causal effect of early-stage comment sentiment on newcomer retention.
  • Figure 5: Estimated causal effects of issue reporter experience on newcomer retention, using two activity proxies.