Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study

Annalisa Szymanski, Oghenemaro Anuyah, Toby Jia-Jun Li, Ronald A. Metoyer

TL;DR

A 12-week ethnographic study of a team building a pedagogical chatbot examines the challenges and trade-offs in LLM development. It identifies four key practices that show how the team made strategic decisions under constraints and that demonstrate the central role of domain expertise in shaping the system.

Abstract

Large Language Models (LLMs) are increasingly developed for use in complex professional domains, yet little is known about how teams design and evaluate these systems in practice. This paper examines the challenges and trade-offs in LLM development through a 12-week ethnographic study of a team building a pedagogical chatbot. The researcher observed design and evaluation activities and conducted interviews with both developers and domain experts. Analysis revealed four key practices: creating workarounds for data collection, turning to augmentation when expert input was limited, co-developing evaluation criteria with experts, and adopting hybrid expert-developer-LLM evaluation strategies. These practices show how teams made strategic decisions under constraints and demonstrate the central role of domain expertise in shaping the system. Challenges included expert motivation and trust, difficulties structuring participatory design, and questions around ownership and integration of expert knowledge. We propose design opportunities for future LLM development workflows that emphasize AI literacy, transparent consent, and frameworks recognizing evolving expert roles.

This paper contains 39 sections, 1 figure.

Figures (1)

  • Figure 1: Stakeholder structure and roles in the InstructAI project. The LLM development team coordinated design, implemented the system, and oversaw model development and evaluation. The team worked in collaboration with a primary stakeholder from the center of teaching and learning, who set the project vision and requirements, and with pedagogical domain experts, who contributed expertise during design and evaluation activities.