
ViviDoc: Generating Interactive Documents through Human-Agent Collaboration

Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Jiale Lao, Yue Cheng, Wei Chen

Abstract

Interactive documents help readers engage with complex ideas through dynamic visualization, interactive animations, and exploratory interfaces. However, creating such documents remains costly, as it requires both domain expertise and web development skills. Recent Large Language Model (LLM)-based agents can automate content creation, but directly applying them to interactive document generation often produces outputs that are difficult to control. To address this, we present ViviDoc, to the best of our knowledge the first work to systematically address interactive document generation. ViviDoc introduces a multi-agent pipeline (Planner, Styler, Executor, Evaluator). To make the generation process controllable, we provide three levels of human control: (1) the Document Specification (DocSpec) with SRTC Interaction Specifications (State, Render, Transition, Constraint) for structured planning, (2) a content-aware Style Palette for customizing writing and interaction styles, and (3) chat-based editing for iterative refinement. We also construct ViviBench, a benchmark of 101 topics derived from real-world interactive documents across 11 domains, along with a taxonomy of 8 interaction types and a 4-dimensional automated evaluation framework validated against human ratings (Pearson r > 0.84). Experiments show that ViviDoc achieves the highest content richness and interaction quality in both automated and human evaluation. A 12-person user study confirms that the system is easy to use, provides effective control over the generation process, and produces documents that satisfy users.
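To make the abstract's DocSpec and SRTC concepts concrete, here is a purely illustrative sketch of what a knowledge unit with an SRTC Interaction Specification (State, Render, Transition, Constraint) might look like. The paper does not publish this data model; all field names beyond the four SRTC components, and the example values, are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class SRTCSpec:
    """Hypothetical SRTC Interaction Specification (structure assumed)."""
    state: dict       # variables the reader can manipulate
    render: str       # how the current state is drawn
    transition: str   # how interaction events update the state
    constraint: list  # validity bounds the state must satisfy

@dataclass
class KnowledgeUnit:
    """Hypothetical DocSpec knowledge unit: text plus an interaction spec."""
    title: str
    text: str
    interaction: SRTCSpec

# Example inspired by Figure 1(b), the lens-equation demo.
unit = KnowledgeUnit(
    title="Lens equation",
    text="Drag the object to update the lens equation live.",
    interaction=SRTCSpec(
        state={"object_distance": 30.0},
        render="redraw the ray diagram from object_distance",
        transition="on drag: set object_distance to pointer x",
        constraint=["object_distance > focal_length"],
    ),
)
print(unit.interaction.state["object_distance"])
```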


Paper Structure

This paper contains 43 sections, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Eight interactive visualization examples generated by ViviDoc, covering all 8 interaction types in our taxonomy, with styles automatically adapted to each topic’s content. (a) Parameter Exploration: sliders adjust flow rate and elevation parameters in real time. (b) Direct Manipulation: drag the object or focal points to update the lens equation live. (c) Inspection: hover to reveal a Voronoi cell and nearest-neighbor envelope. (d) Freeform Construction: click to place neurons and trigger animated signal propagation. (e) Scroll-driven Narrative: scroll to mix two particle gases and watch entropy rise. (f) Spatial Navigation: drag to rotate a 3D Möbius strip freely in space. (g) State Switching: switch quantum orbital states to redraw the electron probability cloud. (h) Temporal Control: play/pause and tune harmonics to build a Fourier series.
  • Figure 2: The ViviDoc pipeline. Given a topic, the Planner generates a DocSpec consisting of knowledge units with text descriptions and SRTC Interaction Specifications. The Styler analyzes the DocSpec and generates a Style Palette for users to customize writing and interaction styles. The Executor generates the document code guided by the DocSpec and style instructions. The Evaluator checks the output for correctness. Users can intervene at three points: editing the DocSpec, customizing the Style Palette, and refining the document through chat.
  • Figure 3: The ViviDoc user interface. Top: the main view with (A) sidebar for history and new document creation, (B) center panel with topic input and four-stage navigation bar, and (C) AI chat assistant. Bottom (left to right): the Spec stage showing editable knowledge units, the Style stage with writing and interaction style options, and the Doc stage displaying the generated interactive document.
  • Figure 4: Automated evaluation results for ViviDoc vs. three multi-agent baselines across three backbone LLMs. Content Richness (CR) and Interaction Quality (IQ) are normalized to [0, 1]. Interaction Functionality is on a 0–1 scale; Efficiency is measured in characters per second (chars/s).