Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Mining Insights at Scale

Jonas Oppenlaender, Joonas Hämäläinen

TL;DR

The paper tackles extracting current research challenges in HCI by applying a two-step workflow that combines ChatGPT for broad challenge extraction and GPT-4 for selective filtering, using the CHI 2023 proceedings as a large real-world corpus. It demonstrates that this approach can identify 4,392 challenges across 113 topics and visualize them interactively, achieving cost-efficient, scalable insight mining. Through end-to-end evaluation, including quantitative metrics and human judgment, the authors show the method yields plausible, useful challenges aligned with human expert evaluation, while acknowledging limitations such as potential prompt sensitivity and alignment with broader sustainability goals. The work highlights the transformative potential of LLM-powered insight mining for academia and practice, suggests pathways for integrating LLMs into qualitative research, and provides open data and visualization resources to support further exploration in HCI and related fields.

Abstract

Large language models (LLMs), such as ChatGPT and GPT-4, are gaining widespread real-world use. Yet, these LLMs are closed source, and little is known about their performance in real-world use cases. In this paper, we apply and evaluate the combination of ChatGPT and GPT-4 for the real-world task of mining insights from a text corpus in order to identify research challenges in the field of HCI. We extract 4,392 research challenges in over 100 topics from the CHI 2023 conference proceedings and visualize the research challenges for interactive exploration. We critically evaluate the LLMs on this practical task and conclude that the combination of ChatGPT and GPT-4 makes an excellent cost-efficient means for analyzing a text corpus at scale. Cost-efficiency is key for flexibly prototyping research ideas and analyzing text corpora from different perspectives, with implications for applying LLMs for mining insights in academia and practice.

Paper Structure (38 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Histogram of pages per research paper in the CHI 2023 proceedings. Blue: total number of pages per paper, including references and appendices. Orange: approximate number of content pages per paper.
  • Figure 2: Research challenges extracted from the CHI 2023 proceedings using ChatGPT and GPT-4, with topic annotations by BERTopic (Grootendorst, 2022). To create the plot, the challenges were first converted into embeddings and then projected into two-dimensional space with UMAP. Each filled circle represents a research challenge, with colors indicating different topic clusters identified by BERTopic. Gray, unfilled circles represent challenges that did not group into any of the identified clusters. An interactive visualization is available at https://hci-research-challenges.github.io.
  • Figure 3: Intertopic distance map with 21 research themes (T1–T21). Each circle on the map represents a topic identified by BERTopic in the CHI 2023 proceedings. The size of each circle reflects the number of research challenges associated with that topic. Topics that are close to each other have similar content, while topics that are far apart are more distinct. A cluster of topics forms a research theme within HCI. We represent the research themes with selected salient keywords occurring in the topics that make up the theme.
  • Figure 4: Comparison of extracted research challenges with HCI's grand challenges and the UN's (modified) SDGs.
  • Figure 5: Cosine distances $d$ between the embedded statements extracted by ChatGPT in Step 1 and the best-matching sentence embeddings from the source text. The distribution is long-tailed, with many statements being very similar to statements in the source text, and few not having a direct match. Note that not having a direct match does not mean the statement is not a valid research challenge, since ChatGPT may have paraphrased or summarized the source text.
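
The best-match distance described for Figure 5 can be illustrated with a minimal sketch. It assumes the common definition of cosine distance, d = 1 - cosine similarity, and uses made-up three-dimensional vectors in place of real embeddings; the embedding model is not stated in this summary, and the function names `cosine_distance` and `best_match_distance` are hypothetical, not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Cosine distance d = 1 - cos(u, v) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def best_match_distance(statement_vec, sentence_vecs):
    """Return (distance, index) of the source sentence closest to the statement."""
    distances = [cosine_distance(statement_vec, s) for s in sentence_vecs]
    best_index = min(range(len(distances)), key=distances.__getitem__)
    return distances[best_index], best_index


# Toy "embeddings", invented for illustration only (assumption, not paper data).
source_sentences = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
extracted_statement = [2.0, 2.0, 0.0]

d, idx = best_match_distance(extracted_statement, source_sentences)
print(f"best match: sentence {idx}, cosine distance {d:.3f}")
```

A distance near 0 means the extracted statement points in almost the same direction as some source sentence, while a larger distance suggests a paraphrase or summary without a direct counterpart, which matches the long tail noted in the caption.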