End-to-End Automated Logging via Multi-Agent Framework
Renyi Zhong, Yintong Huo, Wenwei Gu, Yichen Li, Michael R. Lyu
TL;DR
Autologger covers the complete automated logging pipeline. It addresses the often-neglected whether-to-log decision and the composite nature of logging by combining a fine-tuned Judger with a task-decomposing multi-agent system (MAS). The Judger efficiently filters for methods that require new logs. The Locator and Generator, aided by a tool pool that includes similar-case retrieval and program-analysis utilities, ground their reasoning in code facts to produce high-quality log statements. Across three mature projects, Autologger reaches a 96.63% F1-score on the whether-to-log task and improves end-to-end log quality by 16.13% over the strongest baseline, and its gains hold across different backbone LLMs. The framework shows potential for practical observability improvements, and replication artifacts are provided to support adoption and extension.
Abstract
Software logging is critical for system observability, yet developers face a dual crisis of costly overlogging and risky underlogging. Existing automated logging tools often overlook the fundamental whether-to-log decision and struggle with the composite nature of logging. In this paper, we propose Autologger, a novel hybrid framework that addresses the complete end-to-end logging pipeline. Autologger first employs a fine-tuned classifier, the Judger, to accurately determine whether a method requires new logging statements. If logging is needed, a multi-agent system is activated. The system includes specialized agents: a Locator dedicated to determining where to log, and a Generator focused on what to log. These agents work together, using program analysis and retrieval tools that we designed. We evaluate Autologger on a large corpus from three mature open-source projects against state-of-the-art baselines. Our results show that Autologger achieves a 96.63% F1-score on the crucial whether-to-log decision. In an end-to-end setting, Autologger improves the overall quality of generated logging statements by 16.13% over the strongest baseline, as measured by an LLM-as-a-judge score. We also demonstrate that our framework is generalizable, consistently boosting the performance of various backbone LLMs.
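The control flow described above (Judger first, then Locator and Generator only when logging is needed) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `LogStatement`, `run_pipeline`, and the three `toy_*` functions are hypothetical, and the toy stand-ins use a trivial keyword heuristic in place of the fine-tuned classifier and the tool-using LLM agents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LogStatement:
    line: int  # 1-based source line chosen by the Locator (where to log)
    text: str  # statement produced by the Generator (what to log)


def run_pipeline(
    method_source: str,
    judger: Callable[[str], bool],
    locator: Callable[[str], List[int]],
    generator: Callable[[str, int], str],
) -> List[LogStatement]:
    """Gate on the whether-to-log decision, then locate and generate."""
    # Stage 1: the Judger filters out methods that need no new logs.
    if not judger(method_source):
        return []
    # Stage 2: the Locator picks positions; the Generator writes each statement.
    return [
        LogStatement(line=pos, text=generator(method_source, pos))
        for pos in locator(method_source)
    ]


# Hypothetical stand-ins (assumptions for illustration only).
def toy_judger(src: str) -> bool:
    # Placeholder for the fine-tuned classifier: flag methods with a catch block.
    return "catch" in src


def toy_locator(src: str) -> List[int]:
    # Placeholder for the Locator agent: point at every line containing "catch".
    return [i + 1 for i, line in enumerate(src.splitlines()) if "catch" in line]


def toy_generator(src: str, line: int) -> str:
    # Placeholder for the Generator agent: emit a fixed-template statement.
    return f'logger.error("failure near line {line}");'


if __name__ == "__main__":
    risky = "void f() {\n  try { g(); }\n  catch (Exception e) { }\n}"
    simple = "int add(int a, int b) { return a + b; }"
    print(run_pipeline(risky, toy_judger, toy_locator, toy_generator))
    print(run_pipeline(simple, toy_judger, toy_locator, toy_generator))
```

The point of the gate is cost: methods rejected by the Judger never reach the more expensive agent stage, which is how a filter of this kind can limit overlogging.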
