Connecting the Dots: Surfacing Structure in Documents through AI-Generated Cross-Modal Links
Alyssa Hwang, Hita Kambhamettu, Yue Yang, Ajay Patel, Joseph Chee Chang, Andrew Head
TL;DR
The paper addresses the cognitive difficulty of understanding dense, multimodal documents by proposing a general framework for fine-grained integration of information across text and visuals. The framework defines two primitives, entities and links, which the authors instantiate in an augmented reading interface featuring figure points, highlighted phrases, a persistent reference panel, and a visual index. Formative and comparative user studies show statistically significant improvements in reading quiz performance, with no evidence of increased completion time or cognitive load, and highlight user preferences for the cross-modal linking components. The work demonstrates the potential of treating complex documents as networks of localized details that can be surfaced and navigated across modalities, with implications for scalable comprehension of scientific literature.
Abstract
Understanding information-dense documents like recipes and scientific papers requires readers to find, interpret, and connect details scattered across text, figures, tables, and other visual elements. These documents are often long and filled with specialized terminology, hindering the ability to locate relevant information or piece together related ideas. Existing tools offer limited support for synthesizing information across media types. As a result, understanding complex material remains cognitively demanding. This paper presents a framework for fine-grained integration of information in complex documents. We instantiate the framework in an augmented reading interface, which populates a scientific paper with clickable points on figures, interactive highlights in the body text, and a persistent reference panel for accessing consolidated details without manual scrolling. In a controlled between-subjects study, we find that participants who read the paper with our tool achieved significantly higher scores on a reading quiz without evidence of increased time to completion or cognitive load. Fine-grained integration provides a systematic way of revealing relationships within a document, supporting engagement with complex, information-dense materials.
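To make the idea of a document as a network of localized details more concrete, the following is a minimal sketch of how entities and links might be represented and queried. It is an illustration under stated assumptions, not the paper's implementation: the class names (`Entity`, `Link`, `DocumentGraph`), the fields (`entity_id`, `modality`, `locator`), and the method `cross_modal` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A localized detail in the document (hypothetical schema)."""
    entity_id: str
    modality: str  # assumed values: "text" or "figure"
    locator: str   # e.g. a highlighted phrase or a point on a figure


@dataclass(frozen=True)
class Link:
    """An undirected connection between two entities (hypothetical schema)."""
    source_id: str
    target_id: str


class DocumentGraph:
    """Stores entities and links, and answers cross-modal lookups."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.neighbors: dict[str, set[str]] = {}

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity
        self.neighbors.setdefault(entity.entity_id, set())

    def add_link(self, link: Link) -> None:
        # Reject links that point at entities the graph does not know about.
        for eid in (link.source_id, link.target_id):
            if eid not in self.entities:
                raise KeyError(f"unknown entity: {eid}")
        self.neighbors[link.source_id].add(link.target_id)
        self.neighbors[link.target_id].add(link.source_id)

    def cross_modal(self, entity_id: str) -> list[Entity]:
        """Return linked entities whose modality differs from the origin's."""
        origin = self.entities[entity_id]
        linked = (self.entities[n] for n in self.neighbors[entity_id])
        return sorted(
            (e for e in linked if e.modality != origin.modality),
            key=lambda e: e.entity_id,
        )
```

In this sketch, clicking a highlighted phrase would correspond to calling `cross_modal` on a text entity to retrieve the figure points linked to it, and clicking a figure point would do the reverse. Storing links as undirected adjacency sets is a design choice made here so that navigation works in both directions from a single stored link.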
