Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations

Yucheng Jiang; Yijia Shao; Dekun Ma; Sina J. Semnani; Monica S. Lam

TL;DR

This work introduces Co-STORM, a collaborative discourse framework in which a user-guided roundtable of LM agents, joined by a moderator, carries out information seeking on the user's behalf to surface unknown unknowns. A dynamic mind map tracks the discourse and organizes the retrieved information, which can then be turned into a curated, cited report; this enables serendipitous discovery beyond one-shot QA. The authors construct WildSeek, a dataset of real-world topic-goal pairs for evaluating complex information seeking, and show that Co-STORM outperforms RAG chatbots and STORM-based baselines in both automatic and human evaluations, with users preferring it for deeper, more engaging exploration. Limitations concern user-tailored knowledge pacing, discourse customization, multilingual support, and latency, which point to avenues for future work and broader applicability to learning and decision-making tasks.
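The mind map described above can be pictured as a tree of concepts, each holding the retrieved snippets filed under it, which is later walked to produce a cited outline. The Python below is a minimal sketch under stated assumptions, not Co-STORM's implementation: `ConceptNode`, `insert_info`, and `render_report` are hypothetical names, the concept path for each snippet is simply passed in (a real system would have to decide where each snippet belongs), and the URLs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One concept in a hypothetical mind map: a name, child concepts, and cited snippets."""
    name: str
    children: list[ConceptNode] = field(default_factory=list)
    # Each snippet is (text, source_url); URLs here are placeholders.
    snippets: list[tuple[str, str]] = field(default_factory=list)

    def find_or_add(self, path: list[str]) -> ConceptNode:
        """Walk the tree along a concept path, creating missing nodes on the way."""
        node = self
        for name in path:
            match = next((c for c in node.children if c.name == name), None)
            if match is None:
                match = ConceptNode(name)
                node.children.append(match)
            node = match
        return node


def insert_info(root: ConceptNode, path: list[str], text: str, url: str) -> None:
    """File a retrieved snippet under the given concept path (placement is assumed, not inferred)."""
    root.find_or_add(path).snippets.append((text, url))


def render_report(root: ConceptNode) -> str:
    """Turn the tree into an outline-style report with numbered citations per unique source."""
    lines: list[str] = []
    sources: list[str] = []

    def visit(node: ConceptNode, depth: int) -> None:
        if depth > 0:  # the root is the topic itself, so it gets no heading
            lines.append("#" * depth + " " + node.name)
        for text, url in node.snippets:
            if url not in sources:
                sources.append(url)
            lines.append(f"{text} [{sources.index(url) + 1}]")
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    lines.append("References:")
    lines.extend(f"[{i}] {u}" for i, u in enumerate(sources, start=1))
    return "\n".join(lines)
```

For example, filing two snippets from one source under "Protein structure" and a third, from another source, under "Protein structure" > "Methods" yields a two-level outline in which both snippets from the first source share citation [1].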

Abstract

While language model (LM)-powered chatbots and generative search engines excel at answering concrete queries, discovering information in the terrain of unknown unknowns remains challenging for users. To emulate the common educational scenario where children/students learn by listening to and participating in conversations of their parents/teachers, we create Collaborative STORM (Co-STORM). Unlike QA systems that require users to ask all the questions, Co-STORM lets users observe and occasionally steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. To facilitate user interaction, Co-STORM assists users in tracking the discourse by organizing the uncovered information into a dynamic mind map, ultimately generating a comprehensive report as takeaways. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals. Co-STORM outperforms baseline methods on both discourse trace and report quality. In a further human evaluation, 70% of participants prefer Co-STORM over a search engine, and 78% favor it over a RAG chatbot.
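The participation model in the abstract (agents converse, the user mostly observes, and any user utterance steers what follows) amounts to a turn scheduler over a shared history. The sketch below assumes agents are plain Python callables that map the history to an utterance; in Co-STORM they are LM agents grounded in retrieved information. The `Roundtable` class, the round-robin order among participants, and the fixed moderator cadence (`moderator_every`) are hypothetical choices made for illustration, not the paper's policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical agent interface: read the shared history, return one utterance.
Agent = Callable[[List[str]], str]


@dataclass
class Roundtable:
    participants: List[Agent]
    moderator: Agent
    moderator_every: int = 3  # assumed cadence, chosen for illustration only
    history: List[str] = field(default_factory=list)
    _turn: int = field(default=0, init=False)
    _next: int = field(default=0, init=False)

    def step(self, user_utterance: Optional[str] = None) -> str:
        """Advance the discourse by one turn and return the new utterance."""
        if user_utterance is not None:
            # The user steers simply by taking a turn; later agents see it in the history.
            utterance = f"User: {user_utterance}"
        elif self._turn > 0 and self._turn % self.moderator_every == 0:
            utterance = f"Moderator: {self.moderator(self.history)}"
        else:
            # Round-robin over participants (a hypothetical ordering policy).
            speaker = self.participants[self._next]
            utterance = f"Participant {self._next + 1}: {speaker(self.history)}"
            self._next = (self._next + 1) % len(self.participants)
        self.history.append(utterance)
        self._turn += 1
        return utterance
```

Passing `None` to `step` lets the agents continue on their own, while passing a string inserts the user's turn into the shared history. With two stub participants and `moderator_every=3`, the inputs `[None, None, "What about safety?", None, None]` produce two participant turns, the user's turn, a moderator turn, and another participant turn.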

Paper Structure (29 sections, 2 equations, 17 figures, 11 tables)

Introduction
Complex Information Seeking
Problem Formulation
WildSeek: An In-the-Wild Information Seeking Dataset
Method
Collaborative Discourse Protocol
Tracking the Discourse with a Mind Map
User Participation
Simulating the Roundtable Participant
Simulating the Moderator
Implementation
Automatic Evaluation
Evaluation Setup
Automatic Metrics
Automatic Evaluation Results
...and 14 more sections

Figures (17)

Figure 1: Comparison of different paradigms for learning and information seeking. Co-STORM enables humans to observe and participate in a collaborative discourse among LM agents with different roles. Users can request the system to generate a full-length cited report based on the discourse history and the information collected.
Figure 2: Overview of Co-STORM. Co-STORM emulates a collaborative discourse among the user, simulated perspective-guided roundtable participants, and a simulated moderator. It maintains a dynamically updated mind map (Section "Tracking the Discourse with a Mind Map") to help the user track and engage in the discourse (Section "User Participation"). Each simulated participant is prompted to determine its utterance intent based on the discourse history and to generate a question or an answer grounded in the Internet (Section "Simulating the Roundtable Participant"). The simulated moderator is prompted with unused information and the mind map to generate a new question that automatically steers the discourse (Section "Simulating the Moderator"). The mind map can be used to generate a full-length cited report as takeaways. The complete discourse transcript and the associated report are given in the Appendix.
Figure 3: Rubric grading results for question-asking turn quality in automatic evaluation with simulated users.
Figure 4: Survey results of the pairwise comparison (i.e., agreement on whether Co-STORM is better than Search Engine/RAG Chatbot) in human evaluation.
Figure 5: WildSeek taxonomy. The number in parentheses denotes the number of data points classified under the corresponding category or its descendants.
...and 12 more figures
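The Figure 2 caption says the moderator is prompted with unused information and the mind map to ask a question that steers the discourse. The sketch below models only the "unused information" bookkeeping: it keeps a set of snippet indices already used, picks the unused snippet whose words overlap least with the discourse so far, and wraps it in a fixed question template. The overlap heuristic, the template, and the names `tokens`, `pick_unused`, and `moderator_question` are hypothetical; the mind map input is omitted; and Co-STORM prompts an LM rather than filling a template.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for semantic similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_unused(retrieved: List[str], used: Set[int], history: List[str]) -> int:
    """Index of the unused snippet that overlaps least with the discourse, or -1 if none remain.

    Least overlap is a hypothetical proxy for information the discourse has not covered yet.
    """
    discussed = set().union(*(tokens(turn) for turn in history))
    candidates = [i for i in range(len(retrieved)) if i not in used]
    if not candidates:
        return -1

    def overlap(i: int) -> float:
        words = tokens(retrieved[i])
        return len(words & discussed) / max(len(words), 1)

    # min() keeps the earliest index when several snippets tie.
    return min(candidates, key=overlap)


def moderator_question(retrieved: List[str], used: Set[int], history: List[str]) -> str:
    """Build a steering question from one unused snippet and mark it as used."""
    i = pick_unused(retrieved, used, history)
    if i < 0:
        return ""  # nothing left to introduce
    used.add(i)
    # Hypothetical template; a real moderator would prompt an LM with the snippet.
    return f"Nobody has raised this yet: {retrieved[i]}. How does it change the picture?"
```

Bag-of-words overlap is deliberately crude and keeps the example dependency-free; an embedding-based similarity would be the natural refinement.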
