CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis
Saranya Venkatraman, Nafis Irtiza Tripto, Dongwon Lee
TL;DR
CollabStory introduces the first exclusively LLM-generated multi-LLM collaborative story dataset, enabling analysis of machine-machine writing with up to five authors per story. The authors develop a systematic data-generation pipeline using five open-source instruction-tuned LLMs and prompt templates that promote sequential, baton-style story writing, yielding 32,503 stories that support in-depth continuity and authorship analyses. By adapting PAN tasks to a multi-LLM setting and benchmarking five classical baselines, the paper shows which authorship analyses are tractable (authorship verification and multi-authorship detection) and which remain challenging (exact author attribution and exact author count). The dataset, prompting framework, and baseline insights offer a resource for developing new methods to detect multi-LLM authorship, support credit attribution, and address IP concerns in automated writing, with practical implications for education, publishing, and misinformation mitigation.
Abstract
The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration for open-ended tasks a possibility. Despite this, such collaborative writing has not yet been explored. We take the next step beyond human-LLM collaboration and explore this multi-LLM scenario by generating the first exclusively LLM-generated collaborative stories dataset, called CollabStory. We cover single-author to multi-author scenarios, in which up to 5 LLMs co-author a story. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks, which have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks to multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines cannot handle this emerging scenario. Thus, CollabStory is a resource that could help advance both the understanding of multi-LLM writing and the development of new techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks, since LLM-LLM collaboration could exacerbate ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at https://github.com/saranya-venkatraman/CollabStory.
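To make the sequential, baton-style co-writing setup concrete, the following is a minimal sketch under stated assumptions, not the paper's actual pipeline. The stub authors, the `build_prompt` wording, and all function names are hypothetical placeholders; in a real setting each stub would be replaced by a call to an instruction-tuned LLM, and the paper's own prompt templates would be used.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM call: takes a prompt, returns a continuation.
Generator = Callable[[str], str]


def make_stub_author(name: str) -> Generator:
    """Return a toy 'author' that emits a tagged sentence (placeholder for a real LLM)."""
    def generate(prompt: str) -> str:
        return f"[{name} continues the story.]"
    return generate


def build_prompt(story_so_far: str, is_first: bool, is_last: bool) -> str:
    """Hypothetical prompt template; the paper's templates are not reproduced here."""
    if is_first:
        return "Write the opening part of a story."
    role = "Conclude" if is_last else "Continue"
    return f"{role} the following story:\n{story_so_far}"


def baton_story(authors: List[Tuple[str, Generator]]) -> List[Tuple[str, str]]:
    """Pass the story from one author to the next; return (author_name, segment) pairs."""
    segments: List[Tuple[str, str]] = []
    for i, (name, generate) in enumerate(authors):
        # Each author sees everything written so far before adding its own part.
        story_so_far = " ".join(text for _, text in segments)
        prompt = build_prompt(story_so_far, i == 0, i == len(authors) - 1)
        segments.append((name, generate(prompt)))
    return segments


# Three stub authors take turns; a real run would use up to five LLMs.
authors = [(n, make_stub_author(n)) for n in ["model_a", "model_b", "model_c"]]
story = baton_story(authors)
```

Storing each segment alongside its author name is an assumption of this sketch. It is one simple way to retain the per-segment labels that tasks such as author attribution or author counting would need.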
