Texture: Structured Exploration of Text Datasets

Will Epperson; Arpit Mathur; Adam Perer; Dominik Moritz

TL;DR

Texture addresses the need for flexible exploratory text analytics beyond fixed representations by introducing a configurable attribute schema that covers multiple granularity levels and a unified interactive interface. It combines attribute overviews, cross-filtered visualizations, embedding-based projections and similarity search, and contextual linking back to documents. A two-part user study with 10 participants from varied domains shows that Texture can represent diverse attributes, speeds up analysis, and enables new insights, including detection of data quality issues via embeddings. The work contributes a scalable, general-purpose design for interactive text exploration and highlights implications for integrating attribute derivation with AI-aided analysis in real-world workflows.

Abstract

Exploratory analysis of a text corpus is essential for assessing data quality and developing meaningful hypotheses. Text analysis relies on understanding documents through structured attributes that span various granularities of the documents, such as words, phrases, sentences, topics, or clusters. However, current text visualization tools typically adopt a fixed representation tailored to specific tasks or domains, requiring users to switch tools as their analytical goals change. To address this limitation, we present Texture, a general-purpose interactive text exploration tool. Texture introduces a configurable data schema for representing text documents enriched with descriptive attributes. These attributes can appear at arbitrary levels of granularity in the text and may have multiple values; they include document-level attributes, multi-valued attributes (e.g., topics), fine-grained span-level attributes (e.g., words), and vector embeddings. The system then combines existing interactive methods for text exploration into a single interface that provides attribute overview visualizations, supports cross-filtering attribute charts to explore subsets, uses embeddings for a dataset overview and similar instance search, and contextualizes filters in the actual documents. We evaluated Texture through a two-part user study with 10 participants from varied domains who each analyzed their own dataset in a baseline session and then with Texture. Texture was able to represent all of the previously derived dataset attributes, enabled participants to iterate more quickly during their exploratory analysis, and helped them discover new insights about their data. Our findings contribute to the design of scalable, interactive, and flexible exploration systems that improve users' ability to make sense of text data.
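
To make the attribute schema described above more concrete, the following is a minimal sketch of how documents with document-level, multi-valued, and span-level attributes plus an embedding might be represented, cross-filtered, and searched by similarity. It is an illustration only and not Texture's actual schema or API: the names `Document`, `Span`, `cross_filter`, and `most_similar`, the toy data, and the choice of cosine similarity are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Span:
    """Hypothetical span-level attribute value tied to a character range."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    value: str


@dataclass
class Document:
    """Hypothetical record: one text document plus attributes at several granularities."""
    doc_id: int
    text: str
    doc_attrs: dict = field(default_factory=dict)    # name -> single value per document
    multi_attrs: dict = field(default_factory=dict)  # name -> list of values (e.g., topics)
    span_attrs: dict = field(default_factory=dict)   # name -> list of Span
    embedding: list = field(default_factory=list)    # vector embedding of the document


def cross_filter(docs, doc_filters=None, multi_filters=None):
    """Keep documents that satisfy every active filter (a conjunction)."""
    doc_filters = doc_filters or {}
    multi_filters = multi_filters or {}
    return [
        d for d in docs
        if all(d.doc_attrs.get(k) == v for k, v in doc_filters.items())
        and all(v in d.multi_attrs.get(k, []) for k, v in multi_filters.items())
    ]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, docs, k=3):
    """Return the k documents whose embeddings are closest to the query's."""
    others = [d for d in docs if d.doc_id != query.doc_id]
    ranked = sorted(others, key=lambda d: cosine(query.embedding, d.embedding), reverse=True)
    return ranked[:k]


# Toy corpus (invented for illustration).
docs = [
    Document(0, "Great battery life", {"rating": 5}, {"topics": ["battery"]},
             {"keywords": [Span(6, 13, "battery")]}, [1.0, 0.0]),
    Document(1, "Battery died fast", {"rating": 1}, {"topics": ["battery", "defect"]},
             {}, [0.9, 0.1]),
    Document(2, "Screen is sharp", {"rating": 5}, {"topics": ["display"]},
             {}, [0.0, 1.0]),
]

subset = cross_filter(docs, doc_filters={"rating": 5}, multi_filters={"topics": "battery"})
neighbors = most_similar(docs[0], docs, k=1)
```

In this sketch, span offsets let a filter be shown in context within the original text, while the conjunction in `cross_filter` mirrors the idea of narrowing a subset by selecting values in several attribute charts at once.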
