LLMCode: Evaluating and Enhancing Researcher-AI Alignment in Qualitative Analysis
Joel Oksanen, Andrés Lucero, Perttu Hämäläinen
TL;DR
This paper tackles the challenge of aligning large language models with the nuanced, reflexive insights central to research for design (RfD). It introduces LLMCode, an open-source toolkit that uses two metrics, Intersection over Union (IoU) and Modified Hausdorff Distance (MHD), to quantify how closely AI-generated coding matches human coding and how semantically aligned the codes are. Across two studies with 26 designers, the authors show LLMs can match deductive coding patterns but struggle to emulate deeper interpretive reasoning, highlighting the need for ongoing human oversight and iterative collaboration. The work advances the field by providing a concrete evaluation framework and an interactive interface that helps researchers manage AI-assisted qualitative coding while preserving interpretive depth, thereby informing the design of more trustworthy researcher-AI tools in qualitative inquiry.
Abstract
The use of large language models (LLMs) in qualitative analysis offers enhanced efficiency but raises questions about their alignment with the contextual nature of research for design (RfD). This research examines the trustworthiness of LLM-driven design insights, using qualitative coding as a case study to explore the interpretive processes central to RfD. We introduce LLMCode, an open-source tool integrating two metrics, namely Intersection over Union (IoU) and Modified Hausdorff Distance (MHD), to assess the alignment between human and LLM-generated insights. Across two studies involving 26 designers, we find that while the model performs well with deductive coding, its ability to emulate a designer's deeper interpretive lens on the data is limited, emphasising the importance of human-AI collaboration. Our results highlight a reciprocal dynamic where users refine LLM outputs and adapt their own perspectives based on the model's suggestions. These findings underscore the importance of fostering appropriate reliance on LLMs by designing tools that preserve interpretive depth while facilitating intuitive collaboration between designers and AI.
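
To make the two metrics concrete, the following is a minimal Python sketch of their standard definitions, not LLMCode's actual implementation. It assumes that IoU is computed over the character positions covered by highlighted text spans, and that the Modified Hausdorff Distance is computed between two sets of code embedding vectors using Euclidean distance. The function names `span_iou` and `modified_hausdorff`, the span representation, and the toy inputs are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def span_iou(spans_a, spans_b):
    """Intersection over Union of two sets of highlighted spans.

    Each span is a (start, end) pair of character offsets, end exclusive.
    Assumption: overlap is measured at the character level.
    """
    chars_a = {i for start, end in spans_a for i in range(start, end)}
    chars_b = {i for start, end in spans_b for i in range(start, end)}
    union = chars_a | chars_b
    if not union:
        # Neither coder highlighted anything; treating this as full
        # agreement is a convention assumed here.
        return 1.0
    return len(chars_a & chars_b) / len(union)


def modified_hausdorff(points_a, points_b):
    """Modified Hausdorff Distance between two non-empty point sets.

    Takes the mean nearest-neighbour distance in each direction and
    returns the larger of the two. Assumption: points stand in for
    embedding vectors of human and LLM-generated codes.
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def directed(source, target):
        # Mean distance from each source point to its closest target point.
        return sum(min(dist(p, q) for q in target) for p in source) / len(source)

    return max(directed(points_a, points_b), directed(points_b, points_a))


# Toy inputs: one human highlight and one LLM highlight on the same text,
# and two small sets of 2-D "embeddings".
human_spans = [(0, 10)]
llm_spans = [(5, 15)]
human_codes = [(0.0, 0.0), (1.0, 0.0)]
llm_codes = [(0.0, 0.0), (1.0, 1.0)]

iou = span_iou(human_spans, llm_spans)
mhd = modified_hausdorff(human_codes, llm_codes)
```

Under these assumptions, a higher IoU means the two coders highlighted more of the same text, while a lower MHD means the two sets of codes lie closer together in the embedding space.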
