Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
Yucheng Jiang, Yijia Shao, Dekun Ma, Sina J. Semnani, Monica S. Lam
TL;DR
This work introduces Co-STORM, a collaborative discourse framework in which a user-guided roundtable of LM agents, together with a moderator, carries out information seeking to surface unknown unknowns. A dynamic mind map tracks the discourse and organizes retrieved content into a curated, cited report, enabling serendipitous discovery beyond one-shot QA. The authors construct WildSeek, a real-world dataset of topic-goal pairs for evaluating complex information seeking, and show that Co-STORM outperforms RAG chatbots and STORM-based baselines in both automatic and human evaluations, with users preferring it for deeper, more engaging exploration. Noted limitations concern user-tailored knowledge pacing, discourse customization, multilingual support, and latency, which suggest avenues for future enhancement and broader applicability to learning and decision-making tasks.
Abstract
While language model (LM)-powered chatbots and generative search engines excel at answering concrete queries, discovering information in the terrain of unknown unknowns remains challenging for users. To emulate the common educational scenario where children/students learn by listening to and participating in conversations of their parents/teachers, we create Collaborative STORM (Co-STORM). Unlike QA systems that require users to ask all the questions, Co-STORM lets users observe and occasionally steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. To facilitate user interaction, Co-STORM assists users in tracking the discourse by organizing the uncovered information into a dynamic mind map, ultimately generating a comprehensive report as takeaways. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals. Co-STORM outperforms baseline methods on both discourse trace and report quality. In a further human evaluation, 70% of participants prefer Co-STORM over a search engine, and 78% favor it over a RAG chatbot.
