Learning from Mistakes: Understanding Ad-hoc Logs through Analyzing Accidental Commits

Yi-Hung Chou, Yiyang Min, April Yi Wang, James A. Jones

TL;DR

The paper tackles the understudied practice of using ad-hoc logs to debug and comprehend runtime behavior. It combines large-scale mining of commits that remove accidentally committed console.log statements, drawn from 364,837 JavaScript-related commits (27 GB of data, 548,880 logs), with qualitative analysis of 36 hours of live-streamed coding to show how and why developers insert, format, and later remove ad-hoc logs. Key findings show that ad-hoc logs cluster in asynchronous and root-level code, that developers deliberately label and format them to trace execution and state, and that removal is often opportunistic and iterative. The dataset and methods give researchers and tool developers resources for improving runtime understanding and debugging practices. The work points to practical implications for log-placement guidance, labeling conventions, and tooling that supports comprehension of complex, event-driven code.

Abstract

Developers often insert temporary "print" or "log" instructions into their code to better understand runtime behavior, usually when the code is not behaving as expected. Although such monitoring instructions, or "ad-hoc logs," are commonly used, almost no existing literature studies how developers use them. This gap may be largely because such ephemeral logs typically exist only in developers' local environments and are removed before the code is committed to a revision control system. In this work, we overcome this challenge by observing that developers occasionally forget to remove such instructions before committing and then remove them shortly afterward. We further study developers' logging practices by watching and analyzing live-streamed coding videos. Through these empirical approaches, we study where, how, and why developers use ad-hoc logs to better understand their code and its execution. We collect 27 GB of accidental commits that removed 548,880 ad-hoc logs in JavaScript from GitHub Archive repositories, providing the first large-scale dataset and empirical study of ad-hoc logging practices. Our results reveal several illuminating findings, including a particular propensity for developers to use ad-hoc logs in asynchronous and callback functions. Our findings provide both empirical evidence and a valuable dataset for researchers and tool developers seeking to enhance ad-hoc logging practices, and may deepen our understanding of how developers come to understand software's runtime behavior.
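
To make the core observation concrete, the following is a minimal sketch of how a commit that only removes ad-hoc logs might be recognized from its unified diff. It is not the authors' pipeline: the function names, the regular expression, and the rule "every changed line is a deleted console.log" are all assumptions chosen for illustration, and the paper's actual detection criteria may differ.

```python
import re

# Assumption: an ad-hoc log is any line that calls console.log(...).
# The paper's real detection rules are not given here and may be stricter.
CONSOLE_LOG = re.compile(r"\bconsole\.log\s*\(")


def _is_deleted(line: str) -> bool:
    # Deleted lines start with "-"; "---" is the old-file header.
    # (A deleted line whose content itself starts with "--" is misread
    # as a header; acceptable for a sketch.)
    return line.startswith("-") and not line.startswith("---")


def _is_added(line: str) -> bool:
    # Added lines start with "+"; "+++" is the new-file header.
    return line.startswith("+") and not line.startswith("+++")


def removed_adhoc_logs(diff_text: str) -> list[str]:
    """Return the console.log lines that a unified diff deletes."""
    return [
        line[1:].strip()
        for line in diff_text.splitlines()
        if _is_deleted(line) and CONSOLE_LOG.search(line)
    ]


def is_log_only_removal(diff_text: str) -> bool:
    """True when every changed line in the diff is a deleted console.log."""
    changed = [
        line for line in diff_text.splitlines()
        if _is_deleted(line) or _is_added(line)
    ]
    return bool(changed) and all(
        _is_deleted(line) and CONSOLE_LOG.search(line) for line in changed
    )


# Hypothetical diff: a log left inside a promise callback is removed later.
EXAMPLE_DIFF = """\
--- a/app.js
+++ b/app.js
@@ -1,5 +1,4 @@
 fetch(url).then((res) => {
-  console.log("got response", res.status);
   render(res);
 });
"""

if __name__ == "__main__":
    print(removed_adhoc_logs(EXAMPLE_DIFF))
    print(is_log_only_removal(EXAMPLE_DIFF))
```

A diff that also adds or changes other lines fails the `is_log_only_removal` check, which is one simple way to separate cleanup-only commits from ordinary edits that happen to touch a log.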

Paper Structure

This paper contains 15 sections, 1 equation, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: Example GitHub commit that removes a previous accidental commit of an ad-hoc log
  • Figure 2: Two Data Sources (GitHub Archive & Google BigQuery) and Their Distributions
  • Figure 3: Screenshot of Observe-dev.online, when a developer is adding a log into their code
  • Figure 4: Cumulative Proportion of Projects by Months Since Last Updated for All Collected Repositories
  • Figure 5: Custom approach to assign names to anonymous functions. Tokens highlighted in red represent the names assigned to these functions. We use this method to analyze function names later.
  • ...and 6 more figures