Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations

Karla Felix Navarro; Eugene Syriani; Ian Arawjo

TL;DR

The paper addresses the lack of reporting and reviewing standards for LLM-integrated systems in HCI as such systems rapidly proliferate. Through a qualitative interview study with 18 authors of LLM-integrated system papers, supplemented by feedback from six expert HCI researchers, it shows how the uncertainty of LLM behavior and AI hype erode trust between authors and reviewers, prompt stringent and inconsistent reviewer demands, and fuel a clash between HCI and ML/NLP norms. The authors propose guidelines and considerations for authors, reviewers, and venues, emphasizing selective prompt reporting, high-level architecture visualization, justification for LLM usage, and open practices where feasible. The work aims to inform venue guidelines, encourage exemplar papers, and foster a balanced, context-aware approach to evaluating LLM-enabled HCI research, with the goal of improving transparency, reproducibility, and fairness in the field.

Abstract

What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.

Paper Structure (71 sections, 1 figure, 2 tables)

Introduction
Context and Related Work
Methodology
Data collection
Participant demographics and experience
Research objectives and interview protocol
Data analysis
Limitations
Findings
Authors designing LLM-integrated systems
Technology in search of a problem: LLM capabilities can bias problem selection and design
Open-source vs. Proprietary: Model selection mediated by cost, latency, and performance trade-offs
Uncertainty of LLMs distinguishes them from traditional code-based systems
Authors' strategies to build confidence in their systems in the face of LLM uncertainty
Stop when it is "good enough" for a user study
...and 56 more sections

Figures (1)

Figure 1: The number of papers with LLM-integrated systems published at CHI and UIST since 2021, both as absolute counts and the percentage share of all accepted papers.
