Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation
Jinkyung Park, Pamela Wisniewski, Vivek Singh
TL;DR
The paper addresses the difficulty of annotating large-scale online risk data due to subjectivity and resource intensity. It proposes using Large Language Model–based Conversational Agents as interactive co-labelers to enable two-way, context-aware collaboration between humans and AI, rather than replacing human coders. The authors review evidence that LLMs can support classification and theme extraction while acknowledging limitations in highly contextualized risk constructs, and outline design considerations (interaction, context-awareness, prompts, consistency, UI, and privacy) for effective human-AI collaboration. This work aims to guide HCI researchers in developing tools that scale risk annotation while preserving nuance and ethical standards, with potential practical impact on online safety research and moderation practices.
Abstract
In this position paper, we discuss the potential of LLMs as interactive research tools that facilitate collaboration between human coders and AI to annotate online risk data effectively at scale. Collaborative human-AI labeling is a promising approach to annotating large-scale, complex data for a variety of tasks. Yet tools and methods to support effective human-AI collaboration for data annotation are understudied. This gap matters because co-labeling tasks need to support a two-way, interactive discussion that adds nuance and context, particularly for online risk, which is highly subjective and contextualized. We therefore outline some early benefits and challenges of using LLM-based tools for risk annotation and suggest future directions for the HCI research community to leverage LLMs as research tools that facilitate human-AI collaboration in contextualized online data annotation. Our research interests align closely with the purpose of the LLMs as Research Tools workshop: to identify ongoing applications and challenges of using LLMs to work with data in HCI research. We look forward to gaining insights from organizers and participants into how LLMs can help reshape the HCI community's methods for working with data.
