Learning from Mistakes: Understanding Ad-hoc Logs through Analyzing Accidental Commits

Yi-Hung Chou; Yiyang Min; April Yi Wang; James A. Jones

TL;DR

The paper tackles the understudied practice of ad-hoc logging, in which developers insert temporary logs to debug and comprehend runtime behavior. It combines large-scale mining of accidental commits that remove console.log statements, drawn from 364,837 JavaScript-related commits (27 GB of data, 548,880 logs), with qualitative analysis of 36 hours of live-streamed coding to show how and why developers insert, format, and later remove ad-hoc logs. Key findings are that ad-hoc logs cluster in asynchronous and root-level code, that developers deliberately label and format them to trace execution and state, and that removal is often opportunistic and iterative. The dataset and methods are resources for researchers and tool developers aiming to improve runtime understanding and debugging practices across software projects. The work also has practical implications for log-placement guidance, labeling conventions, and tooling that supports comprehension of complex, event-driven code.

Abstract

Developers often insert temporary "print" or "log" instructions into their code to better understand runtime behavior, usually when the code is not behaving as they expect. Although such monitoring instructions, or "ad-hoc logs," are used so commonly, almost no existing literature studies how developers use them. This gap may be largely because these ephemeral logs typically exist only in developers' local environments and are removed before the code is committed to revision control. In this work, we overcome this challenge by observing that developers occasionally forget to remove such instructions before committing and then remove them shortly afterward. We further study developer logging practices by watching and analyzing live-streamed coding videos. Through these empirical approaches, we study where, how, and why developers use ad-hoc logs to better understand their code and its execution. We collect 27 GB of accidental commits that removed 548,880 ad-hoc logs in JavaScript from GitHub Archive repositories, providing the first large-scale dataset and empirical study of ad-hoc logging practices. Our results reveal several illuminating findings, including a particular propensity for developers to use ad-hoc logs in asynchronous and callback functions. Our findings provide both empirical evidence and a valuable dataset for researchers and tool developers seeking to improve ad-hoc logging practices, and may deepen our understanding of how developers come to understand software's runtime behavior.
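To make the idea of mining accidental commits more concrete, the following TypeScript sketch scans the text of a unified diff (for example, the output of `git show` for one commit) and collects removed lines that look like `console.log` calls. This is a simplified heuristic assumed for illustration, not the authors' actual pipeline; the names `findRemovedConsoleLogs` and `RemovedLog` are hypothetical. A realistic miner would likely also need to handle multi-line calls, other `console` methods, and logs that were moved rather than deleted.

```typescript
// Hypothetical, simplified heuristic (NOT the paper's pipeline): collect
// removed lines in a unified diff that look like console.log calls.

/** One removed ad-hoc log found in a diff (hypothetical type). */
interface RemovedLog {
  file: string;      // path taken from the "+++ b/<path>" header
  statement: string; // removed line with the leading "-" stripped, trimmed
}

// Matches "console.log(" when not preceded by an identifier character,
// "." or "$" (so "myconsole.log(" or "x.console.log(" do not match).
const CONSOLE_LOG = /(^|[^\w.$])console\.log\s*\(/;

function findRemovedConsoleLogs(diff: string): RemovedLog[] {
  const removed: RemovedLog[] = [];
  let currentFile = "";
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ ")) {
      // New-file header: remember the path, dropping the usual "b/" prefix.
      const path = line.slice(4).trim();
      currentFile = path.startsWith("b/") ? path.slice(2) : path;
      continue;
    }
    if (line.startsWith("--- ")) continue; // old-file header, not a removal
    if (line.startsWith("-")) {
      const body = line.slice(1).trim();
      // Ignore logs that were already commented out, e.g. "// console.log(x)".
      if (body.startsWith("//")) continue;
      if (CONSOLE_LOG.test(body)) {
        removed.push({ file: currentFile, statement: body });
      }
    }
  }
  return removed;
}

// Usage: a commit that deletes a log left inside a promise callback.
const sampleDiff = [
  "diff --git a/src/app.js b/src/app.js",
  "--- a/src/app.js",
  "+++ b/src/app.js",
  "@@ -1,4 +1,3 @@",
  " fetchData().then((res) => {",
  "-  console.log('res', res);",
  "   render(res);",
  " });",
].join("\n");

console.log(findRemovedConsoleLogs(sampleDiff));
```

Treating only lines prefixed with "-" as removals, and skipping the "---"/"+++" file headers, keeps the heuristic independent of any particular Git library; the trade-off is that it sees each diff line in isolation.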

Paper Structure

This paper contains 15 sections, 1 equation, 11 figures, and 4 tables.

Introduction
Background
Research Questions
Dataset Curation
Data Analysis
Analyzing Ad-hoc Logs through Mistakes
Qualitative Analysis of Live Streaming Data
Results
RQ1: Where do developers put these logs in their code?
RQ2: What do developers put in the log statement?
RQ3: Why do developers use ad-hoc logs?
RQ4: How do developers manipulate (insert, revise, and remove) ad-hoc logs to achieve their goal?
Discussion
Conclusion
Acknowledgment

Figures (11)

Figure 1: Example GitHub commit that removes a previous accidental commit of an ad-hoc log
Figure 2: Two Data Sources (GitHub Archive & Google BigQuery) and Their Distributions
Figure 3: Screenshot of Observe-dev.online as a developer adds a log to their code
Figure 4: Cumulative Proportion of Projects by Months Since Last Updated for All Collected Repositories.
Figure 5: Custom approach to assigning names to anonymous functions. Tokens highlighted in red represent the names assigned to these functions. We later use this method to analyze function names.