Progressive Code Integration for Abstractive Bug Report Summarization
Shaira Sadia Karim, Abrar Mahmud Rahim, Lamia Alam, Ishmam Tashdeed, Lutfun Nahar Lota, Md. Abu Raihan M. Kamal, Md. Azam Hossain
TL;DR
Bug reports are unstructured and verbose, which hinders quick comprehension. The authors introduce a progressive code-integration framework that combines the report text with full code snippets through structured prompts and hierarchical code summarization, together with parameter-efficient fine-tuning. Across four datasets and eight LLMs, the approach outperforms extractive baselines and achieves semantic fidelity competitive with state-of-the-art abstractive methods. The work demonstrates the value of jointly leveraging textual and code information for bug comprehension and points to future work on human evaluation, larger multimodal datasets, and metrics.
Abstract
Bug reports are often unstructured and verbose, making it challenging for developers to efficiently comprehend software issues. Existing summarization approaches typically rely on surface-level textual cues, resulting in incomplete or redundant summaries, and they frequently ignore associated code snippets, which are essential for accurate defect diagnosis. To address these limitations, we propose a progressive code-integration framework for LLM-based abstractive bug report summarization. Our approach incrementally incorporates long code snippets alongside textual content, overcoming standard LLM context window constraints and producing semantically rich summaries. Evaluated on four benchmark datasets using eight LLMs, our pipeline outperforms extractive baselines by 7.5%-58.2% and achieves performance comparable to state-of-the-art abstractive methods, highlighting the benefits of jointly leveraging textual and code information for enhanced bug comprehension.
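The idea of incrementally folding a long code snippet into a bounded prompt can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`chunk_lines`, `summarize_code`, `build_prompt`), the character-based budget, the prompt headings, and the toy `first_line_summarizer` (a stand-in for an LLM call) are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a code chunk, returns a shorter text.
Summarizer = Callable[[str], str]


def chunk_lines(code: str, max_chars: int) -> List[str]:
    """Split code into chunks of whole lines, each within max_chars where possible."""
    chunks, current, size = [], [], 0
    for line in code.splitlines():
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_code(code: str, summarize: Summarizer, max_chars: int) -> str:
    """Hierarchically condense code: summarize chunks, merge, repeat until it fits."""
    text = code
    while len(text) > max_chars:
        merged = "\n".join(summarize(c) for c in chunk_lines(text, max_chars))
        if len(merged) >= len(text):
            # The summarizer did not shrink the text; stop rather than loop forever.
            break
        text = merged
    return text


def build_prompt(report_text: str, code_summary: str) -> str:
    """Assemble a structured prompt from the report text and the condensed code."""
    return (
        f"### Bug report\n{report_text.strip()}\n\n"
        f"### Code context (condensed)\n{code_summary.strip()}\n\n"
        "### Task\nWrite a concise abstractive summary of the bug."
    )


def first_line_summarizer(chunk: str) -> str:
    """Toy summarizer (assumption): keep the first non-empty line, truncated."""
    lines = chunk.strip().splitlines()
    return lines[0][:40] if lines else ""
```

In practice the budget would be measured in tokens rather than characters, and `first_line_summarizer` would be replaced by a model call; the sketch only shows how chunk-level summaries can be merged repeatedly so that the final prompt stays within a fixed context size.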
