Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data
Lukas Heine, Fabian Hörst, Jana Fragemann, Gijs Luijten, Jan Egger, Fin Bahnsen, M. Saquib Sarfraz, Jens Kleesiek, Constantin Seibold
TL;DR
Spacewalker addresses the challenge of analyzing and annotating large-scale unstructured multimodal data by providing an interactive visualization and annotation tool that supports arbitrary embeddings and dimensionality reduction methods. It enables dynamic exploration in 2D/3D spaces and multimodal queries to reveal semantic relationships, detect data integrity issues, and accelerate labeling. User studies show substantial annotation speedups, at the cost of some accuracy, along with improved interactivity, especially with 3D views and text queries. The work provides an open-source implementation with a flexible microservice architecture and demonstrates practical benefits across domains dealing with unstructured data.
Abstract
In industries such as healthcare, finance, and manufacturing, unstructured textual data presents significant challenges for analysis and decision making. Uncovering patterns within large-scale corpora and understanding their semantic impact are critical, but depend on domain experts or resource-intensive manual reviews. In this system demonstration paper, we introduce Spacewalker, an interactive tool designed to analyze, explore, and annotate data across multiple modalities. It allows users to extract data representations, visualize them in low-dimensional spaces, and traverse large datasets either through free exploration or by querying regions of interest. We evaluated Spacewalker through extensive experiments and annotation studies, assessing its efficacy in improving data integrity verification and annotation. We show that Spacewalker reduces time and effort compared to traditional methods. The code of this work is open-source and can be found at: https://github.com/code-lukas/Spacewalker
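
The workflow described above (extract representations, project them into a low-dimensional space, then query a region of interest) can be illustrated with a minimal sketch. This is not Spacewalker's implementation: the function names `project_pca` and `query_nearest` are hypothetical, the embeddings are synthetic stand-ins for the output of an arbitrary encoder, and PCA and cosine similarity are assumed here only as simple examples of a dimensionality reduction method and a query metric.

```python
import numpy as np


def project_pca(embeddings: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project embeddings onto their first n_components principal axes (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def query_nearest(embeddings: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k embeddings most similar to the query (cosine similarity)."""
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = emb_norm @ query_norm
    # Sort by descending similarity and keep the top k.
    return np.argsort(-scores)[:k]


# Synthetic stand-in for encoder output: two well-separated clusters in 64 dimensions.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 64)) + 1.0
cluster_b = rng.normal(loc=0.0, scale=0.1, size=(50, 64)) - 1.0
embeddings = np.vstack([cluster_a, cluster_b])

# Coordinates that a 3D viewer could plot.
points_3d = project_pca(embeddings, n_components=3)

# Query with an existing sample; a text or image query would first be embedded
# by the same encoder so that it lives in the same representation space.
neighbors = query_nearest(embeddings, embeddings[0], k=5)
```

In this sketch the query is run in the original embedding space rather than in the projected one, so that the neighbors do not depend on the distortion introduced by the projection; the 3D coordinates serve only for display.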
