Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations

Karla Felix Navarro, Eugene Syriani, Ian Arawjo

TL;DR

The paper addresses the lack of reporting and reviewing standards for LLM-integrated systems in HCI amid their rapid proliferation. It draws on qualitative interviews with 18 authors of LLM-integrated system papers, plus feedback from six expert HCI researchers, to show how uncertainty about LLM behavior and hyperbolic AI rhetoric erode trust between authors and reviewers, prompt skeptical and inconsistent reviewer demands, and expose a clash between HCI and ML/NLP norms. The authors propose guidelines and considerations for authors, reviewers, and venues, emphasizing selective prompt reporting, high-level architecture visualization, justification for LLM usage, and open practices where feasible. The work aims to inform venue guidelines, encourage exemplar papers, and foster a balanced, context-aware approach to evaluating LLM-enabled HCI research, with the goal of improving transparency, reproducibility, and fairness in the field.

Abstract

What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems; in response, authors mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.

Paper Structure (71 sections, 1 figure, 2 tables)