What Do We Mean by 'Pilot Study': Early Findings from a Meta-Review of Pilot Study Reporting at CHI

Belu Ticona, Amna Liaqat, Antonios Anastasopoulos

TL;DR

This study investigates how CHI pilot studies are defined and reported, revealing conceptual vagueness and inconsistent practice in human–computer interaction research. It builds CHIPS, a dataset of 904 CHI papers mentioning pilot-related terms, and applies manual coding plus LLM-based annotation to categorize reporting structures and their influence on main studies. The findings show pilots are common but rarely treated as independent study units, and pilot results are often summarized with limited detail, constraining replicability. The work advocates for community-informed reporting guidelines and outlines next steps to broaden data coverage and refine annotation methods.

Abstract

Pilot studies (PS) are ubiquitous in HCI research. CHI papers routinely reference 'pilot studies', 'pilot tests', or 'preliminary studies' to justify design decisions, verify procedures, or motivate methodological choices. Yet despite their frequency, the role of pilot studies in HCI remains conceptually vague and empirically underexamined. Unlike fields such as medicine, nursing, and education, where pilot and feasibility studies have well-established definitions, guidelines, reporting standards, and even a dedicated research journal, the CHI community lacks a shared understanding of what constitutes a pilot study, why pilots are conducted, and how they should be reported. Many papers reference pilots 'in passing', without details about design, outcomes, or how the pilot informed the main study. This variability suggests a methodological blind spot in our community.

Paper Structure (20 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Pilot study reporting practices based on LLM annotations. Left: frequency of papers that describe the pilot and main study together in the same section. Reporting structure distribution in detail: most authors present the pilot as a 'Dedicated Main Section' or 'Embedded in Method' (in passing).
  • Figure 2: Depth of Findings and Result Reporting. Most papers provide 'Moderate' (e.g. a summary, main insights or takeaways without details) or 'Minimal' (e.g. brief mention without details or supporting data) descriptions.
  • Figure 3: Impact of the Pilot Study on Main Study. Task Design covers modifications to activities, Technical Implementation covers system/prototype modifications, and Study Design covers general design changes.
  • Figure 4: Document token length distribution. Token counts are computed using the gpt-4o-mini tokenizer.