Hamster: A Large-Scale Study and Characterization of Developer-Written Tests

Rangeet Pan, Tyler Stennett, Raju Pavuluri, Nate Levin, Alessandro Orso, Saurabh Sinha

TL;DR

This paper presents Hamster, a large-scale empirical study of developer-written Java tests, based on 1.7M tests from 1,908 projects. The study aims to understand test qualities beyond coverage and to contrast developer-written tests with tests generated by two automated test generation (ATG) tools, EvoSuite and Aster. It introduces a three-level Hamster model (Project/Class/Method) that captures fixtures, inputs, focal classes and methods, call sequences, and test scope, enabling rich analysis of real-world testing patterns. The study reveals that developer-written tests routinely exercise multiple classes and methods, rely on structured external inputs and sophisticated fixtures with mocks, and employ complex, varied assertion patterns; these characteristics are largely absent from tests produced by current ATG approaches. The authors propose directions such as modular fixtures, semantic input generation, and LLM-assisted test generation to bridge the gap and guide future ATG tool development toward more realistic, maintainable, and evolvable test suites.

Abstract

Automated test generation (ATG), which aims to reduce the cost of manual test suite development, has been investigated for decades and has produced countless techniques based on a variety of approaches: symbolic analysis, search-based, random and adaptive-random, learning-based, and, most recently, large-language-model-based approaches. However, despite this large body of research, there is still a gap in our understanding of the characteristics of developer-written tests and, consequently, in our assessment of how well ATG techniques and tools can generate realistic and representative tests. To bridge this gap, we conducted an extensive empirical study of developer-written tests for Java applications, covering 1.7 million test cases from open-source repositories. Our study is the first of its kind in studying aspects of developer-written tests that are mostly neglected in the existing literature, such as test scope, test fixtures and assertions, types of inputs, and use of mocking. Based on the characterization, we then compare existing tests with those generated by two state-of-the-art ATG tools. Our results highlight that a vast majority of developer-written tests exhibit characteristics that are beyond the capabilities of current ATG tools. Finally, based on the insights gained from the study, we identify promising research directions that can help bridge the gap between current tool capabilities and more effective tool support for developer testing practices. We hope that this work can set the stage for new advances in the field and bring ATG tools closer to generating the types of tests developers write.

Paper Structure

This paper contains 14 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: The Hamster data collection process.
  • Figure 2: The Hamster analysis data model.
  • Figure 3: Usage of different categories of testing frameworks in the dataset.
  • Figure 4: Test-scope analysis for the test methods in the Hamster dataset.
  • Figure 5: Number of focal classes and methods per test.
  • ...and 2 more figures
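
To make the three-level model (Project/Class/Method) named in the TL;DR and in Figure 2 more concrete, the sketch below shows one way such a model might be represented in Java (16 or later, for records). It is only an illustration under stated assumptions: every type name, field name, and the "single-class"/"multi-class" scope labels are hypothetical and are not taken from the paper or its artifact.

```java
import java.util.List;

// Hypothetical sketch of a three-level test model (Project / Class / Method).
// Every type, field, and label below is illustrative, not the paper's schema.

// Method level: what a single test method exercises and checks.
record TestMethodInfo(
        String name,
        List<String> focalClasses,  // application classes under test
        List<String> focalMethods,  // application methods under test
        List<String> callSequence,  // ordered calls made by the test body
        int assertionCount,
        boolean usesMocks) {

    // Crude, assumed scope label based only on the number of focal classes.
    String scope() {
        return focalClasses.size() <= 1 ? "single-class" : "multi-class";
    }
}

// Class level: a test class with its fixture (setup/teardown) methods.
record TestClassInfo(String name, List<String> fixtureMethods, List<TestMethodInfo> methods) {}

// Project level: all test classes of one repository.
record ProjectInfo(String name, List<TestClassInfo> testClasses) {

    // Counts test methods that touch more than one focal class.
    long multiClassTests() {
        return testClasses.stream()
                .flatMap(c -> c.methods().stream())
                .filter(m -> m.scope().equals("multi-class"))
                .count();
    }
}

public class HamsterModelSketch {

    // Builds a tiny made-up project with two test methods.
    static ProjectInfo sample() {
        TestMethodInfo unit = new TestMethodInfo(
                "testParse",
                List.of("Parser"),
                List.of("Parser.parse"),
                List.of("new Parser", "Parser.parse"),
                1,
                false);
        TestMethodInfo integration = new TestMethodInfo(
                "testCheckout",
                List.of("Cart", "PaymentService"),
                List.of("Cart.add", "PaymentService.charge"),
                List.of("new Cart", "Cart.add", "PaymentService.charge"),
                3,
                true);
        TestClassInfo cls = new TestClassInfo(
                "ShopTest", List.of("setUp"), List.of(unit, integration));
        return new ProjectInfo("demo-project", List.of(cls));
    }

    public static void main(String[] args) {
        ProjectInfo project = sample();
        System.out.println(project.name() + ": "
                + project.multiClassTests() + " multi-class test(s)");
    }
}
```

Nesting the records mirrors the Project/Class/Method hierarchy, so aggregate questions (for example, how many tests exercise more than one class) become simple traversals from the project level down to the method level.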