
Doc To The Future: Infomorphs for Interactive, Multimodal Document Transformation and Generation

Balasaravanan Thoravi Kumaravel

TL;DR

This work introduces infomorphs: modular, user-steerable, AI-augmented transformations that support controlled synthesis and restructuring of information across formats and modalities. Infomorphs enable flexible, interactive, and multimodal document creation by combining Generative AI techniques with user intent and the desired information context.

Abstract

Creating new documents by synthesizing information from existing sources is an important part of knowledge work in many domains. This process often involves gathering content from multiple documents, organizing it, and then transforming it into new forms such as reports, slides, or spreadsheets. While recent advances in Generative AI have shown potential in automating parts of this process, they often provide limited user control over the handling of multimodal inputs and outputs. In this work, we introduce the notion of "infomorphs," which are modular, user-steerable, AI-augmented transformations that support controlled synthesis and restructuring of information across formats and modalities. We propose a design space that leverages infomorph-driven workflows to enable flexible, interactive, and multimodal document creation by combining Generative AI techniques with user intent and the desired information context. As a concrete instantiation of this design space, we present DocuCraft, a canvas-based interface for visually composing infomorph workflows. DocuCraft allows users to chain together infomorphs that perform operations such as page extraction, content summarization, reformatting, and generation, leveraging Generative AI at each stage to support rich cross-document and cross-modal transformations. We demonstrate the capabilities of DocuCraft through an example-driven usage scenario that spans different facets of common knowledge work tasks, illustrating its support for fluid, human-in-the-loop document synthesis and highlighting opportunities for more transparent and modular interaction in Generative AI-assisted information work.
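
The chaining idea described in the abstract can be illustrated with a minimal sketch. This is not DocuCraft's implementation: the `Document` type, the `Infomorph` alias, and the functions `extract_relevant_pages`, `summarize`, `to_bullets`, and `chain` are all hypothetical names introduced here, and simple string operations stand in for the Generative AI calls a real system would make at each stage.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    """A named bundle of text pages; stands in for any source format."""
    name: str
    pages: List[str]


# Hypothetical type: an infomorph maps one document to another.
Infomorph = Callable[[Document], Document]


def extract_relevant_pages(keyword: str) -> Infomorph:
    """Keep pages that mention `keyword` (a keyword match stands in for an AI relevance model)."""
    def run(doc: Document) -> Document:
        kept = [p for p in doc.pages if keyword.lower() in p.lower()]
        return Document(f"{doc.name}:relevant", kept)
    return run


def summarize(max_words: int) -> Infomorph:
    """Truncate each page to `max_words` words (a placeholder for generative summarization)."""
    def run(doc: Document) -> Document:
        short = [" ".join(p.split()[:max_words]) for p in doc.pages]
        return Document(f"{doc.name}:summary", short)
    return run


def to_bullets() -> Infomorph:
    """Reformat each page as a bullet line."""
    def run(doc: Document) -> Document:
        return Document(f"{doc.name}:bullets", ["- " + p for p in doc.pages])
    return run


def chain(*steps: Infomorph) -> Infomorph:
    """Compose infomorphs left to right into a single workflow."""
    def run(doc: Document) -> Document:
        for step in steps:
            doc = step(doc)
        return doc
    return run


source = Document(
    "program",
    [
        "Keynote on budget planning for travel",
        "Lunch menu",
        "Budget table for hotels and flights",
    ],
)
workflow = chain(extract_relevant_pages("budget"), summarize(4), to_bullets())
result = workflow(source)
print(result.name)
print("\n".join(result.pages))
```

Because each step is an ordinary function with the same input and output type, a user-facing tool could let someone inspect or swap any intermediate stage, which is the kind of modularity and user steering the abstract emphasizes.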

Paper Structure (12 sections, 9 figures, 1 table)

This paper contains 12 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: Initial chat interaction in DocuCraft. After Alex uploads files and pastes links, the system prompts for activity preferences and returns a list of relevant and irrelevant information sources. This supports source-level triage based on user intent, illustrating information scent (D1), preference-aware prompting (D2.2), and scatter infomorph behavior (D4.1).
  • Figure 2: Initial canvas view automatically generated by DocuCraft based on Alex's planning chat conversation (D2). The workflow splits into two parallel branches: (top) logistics and budgeting, and (bottom) itinerary planning. Each branch chains together modular infomorphs, such as Relevant Page Extractor, Planners, Viewers, and Editors, allowing Alex to inspect, refine, and generate structured outputs. The canvas illustrates a transparent, human-in-the-loop transformation process across multiple modalities (PDFs, URLs, PPTs, DOCs, and XLs).
  • Figure 3: Review and export of the budget estimate co-created by DocuCraft and Alex as a formatted XLSX file.
  • Figure 4: Creation of a revised itinerary by merging information from the previous itinerary and UIST_Program_Final.pdf.
  • Figure 5: The DocuCraft canvas during post-conference presentation synthesis. Daily notes and the final itinerary (from Document Editor and Relevant Page Extractor nodes) feed into the Slide Deck Planner. The resulting draft presentation is displayed and refined in the interactive Slide Deck Plan Viewer (center-right), showcasing AI-generated title imagery and content slides. These refined slides are subsequently exported using a Slide Deck Builder node which applies a designated institutional template.
  • ...and 4 more figures