Past, Present, and Future of Bug Tracking in the Generative AI Era
Utku Boran Torun, Mehmet Taha Demircan, Mahmut Furkan Gön, Eray Tüzün
TL;DR
The paper addresses the inefficiencies of traditional bug tracking by proposing an AI-powered framework that uses multiple specialized LLM agents to automate the end-to-end bug lifecycle, from natural-language bug report creation to deployment. It introduces a central Model Context Protocol (MCP) to coordinate agents and maintain provenance, all within a human-in-the-loop (HIL) governance model to preserve accountability. Key contributions include automated bug reproduction, validity checking, bug-feature traceability, no-code fixes, AI-generated patches with human review, and CI/CD-aligned deployment, aimed at reducing time-to-fix while lowering manual overhead. The work highlights practical implications, architectural considerations, and research directions, acknowledging challenges such as cascading errors, evaluation, privacy, and domain-specific generalization. This framework promises to make bug maintenance more proactive, user-centric, and scalable in the Generative AI era.
Abstract
Traditional bug tracking systems rely heavily on manual reporting, reproduction, triaging, and resolution, each carried out by different stakeholders such as end users, customer support, developers, and testers. This division of responsibilities requires significant coordination and widens the communication gap between non-technical users and technical teams, slowing the process from bug discovery to resolution. Moreover, current systems are highly asynchronous; users often wait hours or days for a first response, delaying fixes and contributing to frustration. This paper examines the evolution of bug tracking, from early paper-based reporting to today's web-based and SaaS platforms. Building on this trajectory, we propose an AI-powered bug tracking framework that augments existing tools with intelligent, large language model (LLM)-driven automation. Our framework addresses two main challenges: reducing time-to-fix and minimizing human overhead. Users report issues in natural language, while AI agents refine reports, attempt reproduction, and request missing details. Reports are then classified: invalid ones are resolved through no-code fixes, while valid ones are localized and assigned to developers. LLMs also generate candidate patches, with human oversight ensuring correctness. By integrating automation into each phase, our framework accelerates response times, improves collaboration, and strengthens software maintenance practices for a more efficient, user-centric future.
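
The workflow outlined in the abstract (refine, reproduce, classify, patch, human review) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`BugReport`, `Status`, `triage`, `review`) is hypothetical, and the three agent arguments are plain callables standing in for LLM-backed agents so that the example runs without any model or network access. The `history` list is only a rough stand-in for the provenance record the framework is said to maintain.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    NEEDS_INFO = "needs_info"            # agent asked the user for missing details
    NO_CODE_FIX = "no_code_fix"          # invalid report, resolved without a code change
    AWAITING_REVIEW = "awaiting_review"  # candidate patch waiting for a human
    READY_TO_DEPLOY = "ready_to_deploy"
    REJECTED = "rejected"


@dataclass
class BugReport:
    text: str                                         # natural-language description
    steps: List[str] = field(default_factory=list)    # reproduction steps, if given
    reproduced: bool = False
    valid: bool = False
    patch: Optional[str] = None
    history: List[str] = field(default_factory=list)  # assumed provenance trail


Agent = Callable[[BugReport], object]


def triage(report: BugReport, reproduce: Agent, is_valid: Agent,
           propose_patch: Agent) -> Status:
    """Run one report through the agent stages, stopping before deployment."""
    if not report.steps:
        # Refinement stage: the report is too thin to act on.
        report.history.append("refine: requested reproduction steps")
        return Status.NEEDS_INFO

    report.reproduced = bool(reproduce(report))
    report.history.append(f"reproduce: {report.reproduced}")

    report.valid = bool(is_valid(report))
    report.history.append(f"classify: valid={report.valid}")
    if not report.valid:
        # For example a configuration or usage issue: answer the user directly.
        return Status.NO_CODE_FIX

    report.patch = str(propose_patch(report))
    report.history.append("patch: candidate generated")
    return Status.AWAITING_REVIEW


def review(report: BugReport, approved: bool) -> Status:
    """Human-in-the-loop gate: a patch advances only on explicit approval."""
    report.history.append(f"review: approved={approved}")
    return Status.READY_TO_DEPLOY if approved else Status.REJECTED
```

In this sketch a caller would pass stub agents to `triage`, for example `triage(report, reproduce=lambda r: True, is_valid=lambda r: r.reproduced, propose_patch=lambda r: "candidate patch")`, and then call `review` only for reports that reach `Status.AWAITING_REVIEW`. Keeping the approval step in a separate function is one way to make the human checkpoint explicit rather than an optional flag inside the automated path.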
