PaperVoyager: Building Interactive Web with Visual Language Models

Dasen Dai, Biao Wu, Meng Fang, Wenhao Wang

Abstract

Recent advances in visual language models have enabled autonomous agents for complex reasoning, tool use, and document understanding. However, existing document agents mainly transform papers into static artifacts such as summaries, webpages, or slides, which are insufficient for technical papers involving dynamic mechanisms and state transitions. In this work, we propose a Paper-to-Interactive-System Agent that converts research papers into executable interactive web systems. Given a PDF paper, the agent performs end-to-end processing without human intervention, including paper understanding, system modeling, and interactive webpage synthesis, enabling users to manipulate inputs and observe dynamic behaviors. To evaluate this task, we introduce a benchmark of 19 research papers paired with expert-built interactive systems as ground truth. We further propose PaperVoyager, a structured generation framework that explicitly models mechanisms and interaction logic during synthesis. Experiments show that PaperVoyager significantly improves the quality of generated interactive systems, offering a new paradigm for interactive scientific paper understanding.

Paper Structure (29 sections, 7 figures, 6 tables)

This paper contains 29 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Traditional Reading vs. Interactive Exploration. Left: conventional paper comprehension relies on passive reading and mental simulation. Right: our approach enables interactive exploration, where users manipulate controls in an executable interface and observe state changes to better understand the paper’s key mechanisms and dynamics.
  • Figure 2: Overview of the PaperVoyager pipeline. Starting from a static PDF paper, the agent performs multimodal document parsing, identifies core mechanisms suitable for interaction, designs a structured generation specification, and synthesizes code via an LLM to produce an executable WebPaper.
  • Figure 3: Prompt used by PaperVoyager to generate interactive web systems.
  • Figure 4: Overview of the benchmark construction pipeline.
  • Figure 5: Case study on ML Gradient Descent. Row 1 shows the single-shot baseline with Gemini-3-Pro, and Row 2 shows PaperVoyager.
  • ...and 2 more figures