Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data

Lukas Heine, Fabian Hörst, Jana Fragemann, Gijs Luijten, Jan Egger, Fin Bahnsen, M. Saquib Sarfraz, Jens Kleesiek, Constantin Seibold

TL;DR

Spacewalker addresses the challenge of analyzing and annotating large-scale unstructured multimodal data by providing an interactive visualization and annotation tool that supports arbitrary embedding models and dimensionality reduction methods. It enables dynamic exploration in 2D/3D spaces and multimodal queries to reveal semantic relationships, detect data integrity issues, and accelerate labeling. User studies show substantial annotation speedups at some cost in accuracy, as well as improved interactivity, especially with 3D views and text queries. The work provides an open-source implementation with a flexible microservice architecture and demonstrates practical benefits across domains dealing with unstructured data.
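The TL;DR mentions multimodal queries that reveal semantic relationships. One common way to realize such a query, sketched here under the assumption that the query (for example an encoded text prompt) and the data samples share a single embedding space, is to rank samples by cosine similarity to the query vector. The function names `cosine_similarity` and `query_top_k` and the toy vectors are hypothetical; this is an illustrative sketch, not Spacewalker's implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_top_k(
    query: Sequence[float],
    embeddings: List[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k samples closest to the query.

    Assumption: the query and the samples were embedded by encoders that
    share one representation space (hypothetical setup for illustration).
    """
    scored = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    # Highest similarity first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for encoder outputs.
    samples = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    text_query = [1.0, 0.05, 0.0]
    for idx, sim in query_top_k(text_query, samples, k=2):
        print(idx, round(sim, 3))
```

The returned indices could then be highlighted in the low-dimensional view, so that a query pinpoints a region of the projection rather than a single sample.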

Abstract

In industries such as healthcare, finance, and manufacturing, unstructured textual data poses significant challenges for analysis and decision making. Uncovering patterns within large-scale corpora and understanding their semantic impact is critical but depends on domain experts or resource-intensive manual review. In response, this system demonstration paper introduces Spacewalker, an interactive tool designed to analyze, explore, and annotate data across multiple modalities. It allows users to extract data representations, visualize them in low-dimensional spaces, and traverse large datasets either through free exploration or by querying regions of interest. We evaluated Spacewalker through extensive experiments and annotation studies, assessing its efficacy in improving data integrity verification and annotation. We show that Spacewalker reduces the time and effort required for annotation compared to traditional methods. The code of this work is open-source and available at: https://github.com/code-lukas/Spacewalker
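Visualizing representations in low-dimensional spaces requires a dimensionality reduction method (DRM). As a minimal sketch, assuming principal component analysis (PCA) as the DRM (Spacewalker supports arbitrary DRMs; PCA is only an illustrative choice here), the following pure-Python code projects embedding vectors onto their first two or three principal axes using power iteration with deflation. All function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def center(data: Matrix) -> Matrix:
    """Subtract the per-dimension mean from every row."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(x: Matrix) -> Matrix:
    """Sample covariance (d x d) of already-centered rows; needs >= 2 rows."""
    n, d = len(x), len(x[0])
    return [
        [sum(row[i] * row[j] for row in x) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product m @ v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def normalize(v: List[float]) -> List[float]:
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0.0 else [x / norm for x in v]


def top_eigenvector(m: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    rng = random.Random(seed)
    v = normalize([rng.uniform(-1.0, 1.0) for _ in range(len(m))])
    for _ in range(iters):
        w = mat_vec(m, v)
        if all(x == 0.0 for x in w):
            break  # matrix annihilates v; keep the current unit vector
        v = normalize(w)
    return v


def pca_project(data: Matrix, n_components: int = 2) -> Matrix:
    """Project each row onto the first `n_components` principal axes."""
    x = center(data)
    cov = covariance(x)
    axes = []
    for _ in range(n_components):
        v = top_eigenvector(cov)
        lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient
        # Deflation: subtract lam * v v^T so the next pass finds the next axis.
        cov = [[c - lam * v[i] * v[j] for j, c in enumerate(row)]
               for i, row in enumerate(cov)]
        axes.append(v)
    return [[sum(r * a for r, a in zip(row, axis)) for axis in axes] for row in x]
```

Calling `pca_project(embeddings, n_components=3)` would yield 3D coordinates; the sign of each principal axis is arbitrary, so the projection may appear mirrored between runs with different seeds. For large datasets a library implementation, or a nonlinear DRM such as t-SNE or UMAP, would be the more practical choice.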

Paper Structure

This paper contains 10 sections and 7 figures.

Figures (7)

  • Figure 1: Spacewalker is a tool designed for pattern discovery within extensive multimodal datasets, employing arbitrary neural networks and visualization techniques. Once the data is visualized, users can dynamically examine the dataset through simple mouse interactions and multimodal queries to pinpoint samples.
  • Figure 2: Main UI components of Spacewalker: lower-dimensional representations (1), visualization and annotation parameters (2), query dialog (3), and data preview (4).
  • Figure 3: Labeling performance in LabelStudio (blue) and Spacewalker (orange) for text and image data.
  • Figure 4: Participant performance in the data integrity assessment user study for different combinations of encoders and dimensionality reduction methods (DRMs).
  • Figure 5: Participant rankings of encoder and DRM combinations.
  • ...and 2 more figures
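Figure 1 describes users pinpointing samples through simple mouse interactions in the projected space. A hypothetical sketch of how such a selection could feed annotation is shown below, assuming a rectangular box selection over 2D coordinates and a single label applied to every selected sample. Neither the box-shaped selection nor the names `select_in_box` and `label_selection` come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]


def select_in_box(
    points: Sequence[Point2D],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[int]:
    """Indices of projected points inside an axis-aligned selection box.

    Hypothetical stand-in for a mouse drag: the two corners may be given
    in either order, so each range is sorted first.
    """
    x_lo, x_hi = sorted(x_range)
    y_lo, y_hi = sorted(y_range)
    return [
        i
        for i, (x, y) in enumerate(points)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    ]


def label_selection(
    labels: Dict[int, str], selected: Sequence[int], label: str
) -> Dict[int, str]:
    """Return a copy of `labels` with every selected sample assigned `label`."""
    updated = dict(labels)
    for i in selected:
        updated[i] = label
    return updated


if __name__ == "__main__":
    # Hypothetical 2D coordinates, e.g. produced by a DRM.
    projected = [(0.1, 0.2), (0.5, 0.5), (2.0, 2.0), (0.4, -0.1)]
    chosen = select_in_box(projected, (0.0, 1.0), (0.0, 1.0))
    print(label_selection({2: "dog"}, chosen, "cat"))
```

The design choice this illustrates is batch labeling: if semantically similar samples cluster in the projection, one selection can label many samples at once, which is one plausible source of annotation speedups and also of the accuracy trade-offs when a cluster contains outliers.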