
An Empirical Study on the Characteristics of Database Access Bugs in Java Applications

Wei Liu, Shouvick Mondal, Tse-Hsun Chen

TL;DR

This work addresses the lack of empirical understanding of database access bugs in database-backed Java applications. By analyzing 423 bugs from seven large open-source projects that use JDBC or Hibernate, the authors derive a taxonomy of five categories and 25 root causes, and show that SQL query, schema, and API issues account for 84.2% of the bugs. SQL query bugs are the most frequent category in JDBC applications (54%), and API bugs are the most frequent in Hibernate applications (38.7%). The study finds that reported database access bugs follow a trend similar to that of non-database access bugs, but the files modified in their bug-fixing commits differ, which points to distinct maintenance challenges. Practical implications include improved SQL query and schema verification, ORM-focused checks, and test-generation strategies to improve the reliability of database-backed applications; a replication package is provided for validation and extension.

Abstract

Database-backed applications rely on database access code to interact with the underlying database management systems (DBMSs). Although many prior studies address database access issues such as SQL anti-patterns or SQL code smells, few examine the database access bugs that arise during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access bugs collected from seven large-scale open-source Java applications that use relational database management systems (e.g., MySQL or PostgreSQL). We study the characteristics of the bugs (e.g., occurrence and root causes) by manually examining the bug reports and commit histories. We find that the numbers of reported database access bugs and non-database access bugs follow a similar trend, but the files modified in their bug-fixing commits differ. Additionally, we generalize the root causes of database access bugs into five main categories (SQL queries, Schema, API, Configuration, SQL query result) comprising 25 unique root causes. We find that bugs pertaining to SQL queries, Schema, and API cover 84.2% of database access bugs across all studied applications. In particular, SQL query bugs (54%) and API bugs (38.7%) are the most frequent issues when using JDBC and Hibernate, respectively. Finally, we discuss the implications of our findings for developers and researchers.

Paper Structure (13 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Examples of database access using JDBC and Hibernate.
  • Figure 2: The trend of reported database access bugs (DBBug) and non-database access bugs (NDBBug) across the studied applications. The reported bugs are aggregated at a fixed time interval according to their reporting time in each application.
  • Figure 3: Distribution of the categories of database access bugs that occur in JDBC and Hibernate database-backed applications.