Moving Pictures of Thought: Extracting Visual Knowledge in Charles S. Peirce's Manuscripts with Vision-Language Models

Carlo Teo Pedretti, Davide Picca, Dario Rodighiero

TL;DR

This paper investigates extracting and structuring visual knowledge from Charles S. Peirce's multimodal manuscripts using Vision-Language Models (VLMs). It proposes a modular workflow that segments page layouts, links diagram regions via IIIF/WADM annotations, and queries VLMs with prompts informed by Peirce’s semiotics to generate concise diagram captions, which are then serialized into RDF for Linked Open Data integration. A combined pipeline of page/layout classifiers and object detectors, together with a semiotics-driven prompting strategy, is evaluated on Peirce's corpus, with GPT-4o showing the strongest interpretative performance among the tested models. The work demonstrates that diagrammatic reasoning can be operationalized in historical, heterogeneous documents and provides a foundation for broader semantic editions and knowledge-graph enrichment of multimodal scholarly artifacts.

Abstract

Diagrams are crucial yet underexplored tools in many disciplines, demonstrating the close connection between visual representation and scholarly reasoning. However, their iconic form poses obstacles to visual studies, intermedial analysis, and text-based digital workflows. In particular, Charles S. Peirce consistently advocated the use of diagrams as essential for reasoning and explanation. His manuscripts, often combining textual content with complex visual artifacts, provide a challenging case for studying documents involving heterogeneous materials. In this preliminary study, we investigate whether Vision-Language Models (VLMs) can effectively help us identify and interpret such hybrid pages in context. First, we propose a workflow that (i) segments manuscript page layouts, (ii) reconnects each segment to IIIF-compliant annotations, and (iii) submits fragments containing diagrams to a VLM. In addition, by adopting Peirce's semiotic framework, we design prompts to extract key knowledge about diagrams and produce concise captions. Finally, we integrate these captions into knowledge graphs, enabling structured representations of diagrammatic content within composite sources.
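
As an illustration of step (ii) of the workflow, the sketch below shows one way a detected region could be tied back to an IIIF canvas as a W3C Web Annotation (WADM) with an `xywh` fragment selector. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `make_region_annotation`, the example URIs, the bounding-box format (x, y, width, height in canvas pixels), and the choice of two textual bodies (a tag and a caption) are all hypothetical.

```python
import json


def make_region_annotation(canvas_id, box, label, caption, anno_id):
    """Build a Web Annotation (WADM) dict for one detected page region.

    Assumptions (not taken from the paper): `box` is (x, y, width, height)
    in canvas pixel coordinates, `label` is a detector class such as
    "diagram", and `caption` is a short description produced by a VLM.
    """
    # Media-fragment selectors use integer pixel values.
    x, y, w, h = (int(round(v)) for v in box)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": [
            # The detector class is recorded as a tag.
            {"type": "TextualBody", "value": label, "purpose": "tagging"},
            # The caption is recorded as a description.
            {
                "type": "TextualBody",
                "value": caption,
                "format": "text/plain",
                "purpose": "describing",
            },
        ],
        "target": {
            "type": "SpecificResource",
            "source": canvas_id,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical identifiers and coordinates, for illustration only.
    anno = make_region_annotation(
        canvas_id="https://example.org/iiif/canvas/1",
        box=(120.4, 80.0, 300.6, 210.2),
        label="diagram",
        caption="A placeholder caption for a detected diagram.",
        anno_id="https://example.org/anno/1",
    )
    print(json.dumps(anno, indent=2))
```

Because the annotation is plain JSON-LD, it can be loaded by an IIIF viewer or converted to RDF triples by a JSON-LD processor; which of these the actual pipeline does is not specified here.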

Paper Structure

This paper contains 13 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Paul Klee's Theory of Pictorial Configuration as a diagram. Zentrum Paul Klee, Bern, Inv.Nr. BG A/030. Photo: Zentrum Paul Klee.
  • Figure 2: An example of a diagram in religious art, taken from hamburger2019devotion: Alton Towers Triptych, Cologne (?), ca. 1150. London, Victoria & Albert Museum, inv. no. 4757-1858. Photo: © Victoria & Albert Museum.
  • Figure 3: Distribution of digitized manuscript pages across Peirce’s lifetime, grouped by five-year intervals and categorized according to Robin’s classification. The visualization, based on IIIF canvas data, highlights how Peirce’s intellectual focus evolved over time, with Logic manuscripts dominating the corpus and peaks corresponding to his most productive years in formal reasoning.
  • Figure 4: Pie charts showing the distribution of text and diagrams within three categories.
  • Figure 5: Output of the fine-tuned YOLOv8m model on an autograph manuscript dated 1902. MS Am 1632 (430), Box 29, Folder 28, Series I. Manuscripts, D. Logic. Houghton Library, Harvard University, USA. Photo: © Houghton Library. Persistent URL: http://nrs.harvard.edu/urn-3:FHCL.HOUGH:12491033. The model correctly identifies and segments 'diagram' (blue) and 'text_block' (light blue) regions, providing the structured data used for subsequent annotation and VLM analysis.
  • ...and 1 more figure