LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang

TL;DR

The paper investigates whether modern LLMs can replicate complex crowdsourcing pipelines, not just simple tasks, via a course-based study where 20 students attempt to emulate labeled pipelines using LLMs. It compares Baseline prompts (single-model tasks) with LLM-chained replications across multiple pipelines, evaluating replication correctness and chain effectiveness. Findings show LLMs achieve partial success with considerable variability driven by instruction quality, prompt design, and non-determinism, revealing strengths and gaps in information foraging and sensitivity to comparison-based prompts. The work emphasizes using LLMs to study task decomposition, suggests combining human and machine workers for complementary sub-tasks, and highlights the need for guardrails and multimodal alignment to realize practical, reliable human–LLM collaborations in complex pipelines.

Abstract

LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on humans' and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate 1) the relative LLM strengths on different tasks (by cross-comparing their performances on sub-tasks) and 2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.

Paper Structure (13 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: We study whether LLMs can be used to replicate crowdsourcing pipelines and replace human workers in certain advanced "human-computational processes."
  • Figure 2: The original pipeline and the LLM replications for (A) Iterative Process (Little et al., 2010) and (B) Find-Fix-Verify (Bernstein et al., 2010). While only P11 diverged from the original Iterative Process by adding a condition about how previous results should be ranked and used in subsequent steps, students replicating Find-Fix-Verify all had different Verify steps (marked with a red box). The chains are slightly simplified for readability.
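
To make the chain structure in Figure 2(B) concrete, the following is a minimal sketch of a Find-Fix-Verify style pipeline with pluggable workers. It is not the authors' or any student's implementation. The `Worker` type, the prompt wording, and the function names are all hypothetical, and the majority-vote Verify step is only one possible design (the caption notes that every replication used a different Verify step). A worker is any function from a prompt to a text reply, so it could wrap an LLM call or a human task.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical worker signature (an assumption for illustration):
# takes a prompt string and returns a text reply.
Worker = Callable[[str], str]


def find_step(text: str, finders: List[Worker]) -> str:
    """Find: each worker nominates a phrase; keep the most-nominated one."""
    prompt = f"Quote one phrase in this text that should be shortened:\n{text}"
    votes = Counter(worker(prompt) for worker in finders)
    return votes.most_common(1)[0][0]


def fix_step(text: str, span: str, fixers: List[Worker]) -> List[str]:
    """Fix: each worker independently proposes a rewrite of the phrase."""
    prompt = f"Rewrite this phrase more concisely: {span}\nContext:\n{text}"
    return [worker(prompt) for worker in fixers]


def verify_step(span: str, candidates: List[str], verifiers: List[Worker]) -> str:
    """Verify (assumed design): majority vote over candidate indices."""
    options = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    prompt = f"Original: {span}\nReply with the index of the best rewrite:\n{options}"
    votes: Counter = Counter()
    for worker in verifiers:
        reply = worker(prompt).strip()
        # Ignore malformed replies instead of failing the whole chain.
        if reply.isdecimal() and int(reply) < len(candidates):
            votes[int(reply)] += 1
    if not votes:
        return span  # no valid vote: keep the original phrase
    return candidates[votes.most_common(1)[0][0]]


def find_fix_verify(
    text: str,
    finders: List[Worker],
    fixers: List[Worker],
    verifiers: List[Worker],
) -> str:
    """Chain the three steps; each step sees only the previous step's output."""
    span = find_step(text, finders)
    if span not in text:
        return text  # the nominated phrase is not in the text: change nothing
    candidates = fix_step(text, span, fixers)
    best = verify_step(span, candidates, verifiers)
    return text.replace(span, best, 1)
```

Because each step takes its own list of workers, the same chain can run with LLM workers in one step and human workers in another, in line with the abstract's suggestion that LLMs complete part of a task while leaving the rest to humans.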