Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite

Tim Fischer, Chris Biemann

TL;DR

This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities scholars to explore and organize large, unstructured document collections. It implements a flexible, aspect-focused document clustering pipeline with human-in-the-loop refinement capabilities.

Abstract

This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities (DH) scholars to explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering pipeline with human-in-the-loop refinement capabilities. We showcase how this process can be steered initially by defining analytical lenses through document rewriting prompts and instruction-based embeddings, and then aligned further with user intent through tools for refining clusters and mechanisms for fine-tuning the embedding model. The demonstration highlights a typical workflow, illustrating how DH researchers can use Perspectives' interactive document map to uncover topics, sentiments, or other relevant categories, thereby gaining insights and preparing their data for subsequent in-depth analysis.

Paper Structure (39 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Perspectives' document map. Center: interactive scatter plot of documents colored by cluster. Hovering over a document previews its content. Left: settings & refinement operations, including the model fine-tuning operation. Right: statistics & information about selected documents. Top: toolbar with search & filtering. The UI is inspired by popular clustering interfaces such as Nomic Atlas, leveraging a familiar design to accelerate adoption.
  • Figure 2: The proposed interactive clustering pipeline. The initial clustering process is guided by providing rewriting and embedding instructions (green) to focus the document representations on user-defined aspects. The established core pipeline (orange) identifies clusters and builds various textual representations. Users can post-process the clustering (blue) through refinement operations (grey), triggering some steps of the pipeline.
  • Figure 3: Detailed evaluation with increasing number of labeled examples per class.
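
The pipeline described in the Figure 2 caption (rewrite, embed, cluster, describe) can be illustrated with a toy sketch. This is not the authors' implementation, and every function name below is hypothetical. A keyword filter stands in for the LLM rewriting prompt, a normalized bag-of-words vector stands in for the instruction-based embedding model, and a greedy similarity threshold stands in for whatever clustering algorithm the system actually uses. The sketch only shows how an analytical lens (here, sentiment words) shapes the document representations before clustering.

```python
from collections import Counter
import math


def rewrite(doc, aspect_terms):
    """Assumed stand-in for a rewriting prompt: keep only aspect-related words."""
    return [w for w in doc.lower().split() if w in aspect_terms]


def embed(tokens, vocab):
    """Assumed stand-in for an embedding model: L2-normalized bag-of-words."""
    counts = Counter(tokens)
    vec = [counts[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def cosine(a, b):
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cluster(vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    seeds, labels = [], []
    for v in vectors:
        for i, seed in enumerate(seeds):
            if cosine(v, seed) >= threshold:
                labels.append(i)
                break
        else:  # no sufficiently similar seed found, so start a new cluster
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels


def describe(token_lists, labels, top_k=2):
    """Build a simple textual representation: top words per cluster."""
    per_cluster = {}
    for tokens, label in zip(token_lists, labels):
        per_cluster.setdefault(label, Counter()).update(tokens)
    return {lab: [w for w, _ in c.most_common(top_k)]
            for lab, c in per_cluster.items()}


docs = [
    "The film was great and the acting was wonderful",
    "A wonderful great story",
    "The plot was terrible and boring",
    "Boring terrible pacing",
]
sentiment_lens = {"great", "wonderful", "terrible", "boring"}  # the chosen aspect
vocab = sorted(sentiment_lens)

rewritten = [rewrite(d, sentiment_lens) for d in docs]
vectors = [embed(tokens, vocab) for tokens in rewritten]
labels = cluster(vectors)
keywords = describe(rewritten, labels)
```

Swapping `sentiment_lens` for a different set of terms (for example, words about plot versus acting) would group the same documents differently, which is the idea behind aspect-focused clustering. A refinement operation such as merging two clusters would amount to relabeling `labels` and recomputing `describe`.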