
Establishing Traceability Links between Release Notes & Software Artifacts: Practitioners' Perspectives

Sristy Sumana Nath, Banani Roy, Munima Jahan

TL;DR

This paper tackles the challenge of maintaining traceability between release notes and underlying GitHub artifacts (PRs, commits, issues) in open-source projects, where links are frequently missing or broken. It analyzes release notes to extract What, Why, and How information and introduces a 3,500‑item ground‑truth benchmark for evaluating traceability recovery. The authors propose an LLM‑based approach (Meta LLaMA 3.1 and Gemini 1.5 Pro) that fuses textual similarity with time proximity signals to outperform traditional baselines, achieving Precision@1 values up to 0.73 for PRs and 0.70 for issues, with strong MRR. A practitioner survey (n=33) reveals substantial interest in automated solutions despite current inconsistent practices, underscoring the practical relevance and potential impact on maintainability, onboarding, and collaboration in open‑source development.
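
Precision@1 and MRR (mean reciprocal rank) are standard ranking metrics. The minimal Python sketch below shows how they are commonly computed. It assumes that each release-note entry has exactly one correct artifact and a ranked list of candidate IDs. The function names and the example IDs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def precision_at_1(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of queries whose top-ranked candidate is the correct artifact.

    Assumes exactly one gold artifact per query (a simplification).
    """
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if cands and cands[0] == g)
    return hits / len(gold)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the correct artifact; a query adds 0 if it is not ranked."""
    if not gold:
        return 0.0
    total = 0.0
    for cands, g in zip(ranked, gold):
        if g in cands:
            total += 1.0 / (list(cands).index(g) + 1)  # ranks are 1-based
    return total / len(gold)


# Hypothetical example: three release-note entries, each with a ranked candidate list.
ranked = [["PR-12", "PR-7"], ["PR-3", "PR-9", "PR-4"], ["PR-5"]]
gold = ["PR-12", "PR-4", "PR-8"]
print(round(precision_at_1(ranked, gold), 3))        # 0.333
print(round(mean_reciprocal_rank(ranked, gold), 3))  # 0.444
```

In this example only the first entry has the correct artifact at rank 1, so Precision@1 = 1/3. MRR also credits the correct artifact found at rank 3 of the second entry: (1 + 1/3 + 0)/3 ≈ 0.444.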

Abstract

Maintaining traceability links between software release notes and the corresponding development artifacts, e.g., pull requests (PRs), commits, and issues, is essential for managing technical debt and ensuring maintainability. However, in open-source environments where contributors work remotely and asynchronously, establishing and maintaining these links is error-prone, time-consuming, and frequently overlooked. Our empirical study of GitHub repositories revealed that 47% of release artifacts lacked traceability links and 12% contained broken links. To address this gap, we first analyzed release notes to identify their What, Why, and How information and assessed how this information aligns with PRs, commits, and issues. We curated a benchmark dataset of 3,500 filtered and validated traceability link instances. We then implemented LLM-based approaches to automatically establish traceability links for three pairs: release note contents & PRs, release note contents & commits, and release note contents & issues. When combined with the time proximity feature, the LLM-based approach (e.g., Gemini 1.5 Pro) achieved a Precision@1 of 0.73 for PR traceability recovery. To evaluate the usability and adoption potential of this approach, we conducted an online survey of 33 open-source practitioners: 16% of respondents rated it as very important and 68% as somewhat important for traceability maintenance.
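
The abstract does not specify how textual relevance and time proximity are combined. One way to illustrate the idea is a weighted sum of a similarity score and an exponentially decaying time-proximity score, used to rank candidate artifacts for a release-note entry. In this sketch a bag-of-words cosine similarity stands in for the LLM relevance judgment (the paper uses Meta LLaMA 3.1 and Gemini 1.5 Pro). The weight `alpha`, the decay constant `tau_days`, the decay form, and all names are hypothetical assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Artifact:
    ident: str           # hypothetical ID, e.g. "PR-101"
    text: str            # title/description text of the PR, commit, or issue
    timestamp: datetime  # e.g. merge or close time (assumption)


def cosine_bow(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for an LLM relevance score."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def time_proximity(release: datetime, when: datetime, tau_days: float = 30.0) -> float:
    """Assumed form: exponential decay in the absolute gap (in days) to the release.

    A real system might only consider artifacts dated before the release.
    """
    gap_days = abs((release - when).total_seconds()) / 86400.0
    return math.exp(-gap_days / tau_days)


def rank_candidates(note: str, release: datetime, candidates: list[Artifact],
                    alpha: float = 0.7) -> list[tuple[str, float]]:
    """Rank candidates by alpha * text similarity + (1 - alpha) * time proximity."""
    scored = [
        (c.ident, alpha * cosine_bow(note, c.text) + (1 - alpha) * time_proximity(release, c.timestamp))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical release-note entry and candidate PRs.
release = datetime(2024, 5, 1)
candidates = [
    Artifact("PR-101", "Fix crash parsing empty config file", datetime(2024, 4, 28)),
    Artifact("PR-202", "Add dark mode theme", datetime(2024, 4, 30)),
    Artifact("PR-303", "Fix crash parsing empty config", datetime(2023, 11, 1)),
]
ranking = rank_candidates("Fix crash when parsing empty config file", release, candidates)
print([ident for ident, _ in ranking])  # ['PR-101', 'PR-303', 'PR-202']
```

The time term helps separate near-duplicate text matches. PR-303 is almost as similar to the note as PR-101 but was merged months before the release, so it ranks below PR-101. PR-202 is close in time but shares no words with the note, so it ranks last.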

Paper Structure

This paper contains 24 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Example of natural language artifacts: a) Release notes, b) Issues, c) Pull requests, and d) Commits
  • Figure 2: Research overview
  • Figure 3: Dataset Overview
  • Figure 4: Participants' Role in GitHub
  • Figure 5: Percentage of containing What, Why and How information
  • ...and 2 more figures