Dialogue as Discovery: Navigating Human Intent Through Principled Inquiry

Jianwen Sun, Yukang Feng, Yifan Chang, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yu Dai, Kaipeng Zhang

TL;DR

This work tackles the intention-expression gap in human–AI collaboration by recasting interaction as a Socratic inquiry task. It introduces Nous, an agent that actively asks questions to progressively resolve user intent, guided by an intrinsic reward defined as information gain, equivalently the entropy reduction over a structured diagram-specification space. The methodology combines a formal information-theoretic framework, an offline GRPO training pipeline with automatically generated preference data, and a domain-agnostic approach validated on scientific diagram generation with robust performance across varying user expertise. The results show that entropy-based rewards drive efficient, high-quality inquiries and that offline GRPO provides a scalable, stable training path, with evidence of generalization beyond diagram tasks. The work advances principled, scalable human–AI collaboration by shifting the burden of clarification from humans to a proactive, information-seeking AI partner.
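To make the training signal concrete, here is a minimal sketch of the group-relative advantage used in the standard GRPO formulation, where each candidate's reward is normalized by the mean and standard deviation of its group. Whether the paper's offline variant uses exactly this normalization is an assumption, and the reward values below are hypothetical information gains rather than results from the paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (standard GRPO form).

    rewards: scalar rewards for candidate questions sampled for the
    same dialogue state. eps guards against a zero standard deviation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical information gains (in bits) for four candidate questions.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Questions that resolve more uncertainty than the group average receive a positive advantage and the rest a negative one, so no separately learned reward model is needed to rank them.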

Abstract

A fundamental bottleneck in human-AI collaboration is the "intention expression gap," the difficulty for humans to effectively convey complex, high-dimensional thoughts to AI. This challenge often traps users in inefficient trial-and-error loops and is exacerbated by the diverse expertise levels of users. We reframe this problem from passive instruction following to a Socratic collaboration paradigm, proposing an agent that actively probes for information to resolve its uncertainty about user intent. We name the proposed agent Nous and train it to acquire proficiency in this inquiry policy. The core mechanism of Nous is a training framework grounded in the first principles of information theory. Within this framework, we define the information gain from dialogue as an intrinsic reward signal, which is equivalent to the reduction of Shannon entropy over a structured task space. This reward design enables us to avoid reliance on costly human preference annotations or external reward models. To validate our framework, we develop an automated simulation pipeline to generate a large-scale, preference-based dataset for the challenging task of scientific diagram generation. Comprehensive experiments, including ablations, subjective and objective evaluations, and tests across user expertise levels, demonstrate the effectiveness of our proposed framework. Nous achieves leading efficiency and output quality, while remaining robust to varying user expertise. Moreover, its design is domain-agnostic, and we show evidence of generalization beyond diagram generation. Experimental results indicate that our work offers a principled, scalable, and adaptive paradigm for resolving uncertainty about user intent in complex human-AI collaboration.
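The reward described above can be illustrated with a small sketch: information gain is the Shannon entropy of the agent's belief before an answer minus the entropy after it. The slot name, the candidate values, and the simple filter-and-renormalize belief update are all hypothetical choices for illustration; the paper's actual task space and update rule are not given in this summary.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def condition(belief, consistent):
    """Keep only candidates consistent with the user's answer, then renormalize."""
    kept = {k: v for k, v in belief.items() if k in consistent}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


def information_gain(prior, posterior):
    """Entropy reduction from prior to posterior, used here as the reward."""
    return shannon_entropy(prior.values()) - shannon_entropy(posterior.values())


# Hypothetical slot of a diagram specification: the overall layout,
# with four equally likely candidates before any question is asked.
prior = {"pipeline": 0.25, "tree": 0.25, "grid": 0.25, "cycle": 0.25}

# Assume the user's answer rules out "grid" and "cycle".
posterior = condition(prior, {"pipeline", "tree"})

gain = information_gain(prior, posterior)
print(f"information gain: {gain:.2f} bits")
```

In this toy case the answer halves the candidate set, so the belief drops from 2 bits to 1 bit of entropy and the question earns a reward of 1 bit. A question whose answer leaves the belief unchanged would earn zero.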

Paper Structure

This paper contains 47 sections, 9 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: The multi-stage curation pipeline for the dataset and the details of model training. We began with a raw dataset of approximately 1 million figures downloaded from scientific papers in different fields on arXiv and PMC. This dataset was first filtered using the CLIP model to remove data plots (such as bar charts and line graphs), resulting in 29,000 images. Next, we used the Qwen-2.5-VL-72B model to retain true schematic diagrams, reducing the dataset to 8,000 images. Finally, three PhD students conducted a manual review to ensure the relevance, clarity, and quality of each figure, resulting in a final dataset of 1,100 images. From this curated dataset, 1,000 figures were used to build the world model and train simulations, while 100 figures were set aside for testing. Detailed explanations regarding data distribution and open-source licenses are provided in the Appendix.
  • Figure 2: Experimental results of Interaction Efficiency. (a) The average number of dialogue turns for each model to complete information collection; (b) The average information gain obtained during the dialogue for each model; (c) The dynamic change of information gain during the dialogue
  • Figure 3: Model scores under different tie-handling protocols. (a) Results of human evaluation; (b) Results of GPT-5 model evaluation.
  • Figure 4: Visualization of experimental results. (a) Evaluation results of each model; (b) Results of ablation experiment 1; (c) Results of ablation experiment 2.
  • Figure 5: This section presents drawing examples generated using the VisPainter framework
  • ...and 1 more figure