Texture: Structured Exploration of Text Datasets

Will Epperson, Arpit Mathur, Adam Perer, Dominik Moritz

TL;DR

Texture addresses the need for flexible exploratory text analytics beyond fixed representations by introducing a configurable attribute schema that covers multiple granularity levels, together with a unified interactive interface. It combines attribute overviews, cross-filtered visualizations, embedding-based projections and similarity search, and contextual linking back to documents. A two-part user study with 10 participants across domains indicates that Texture can represent diverse attributes, speeds up analysis, and enables new insights, including detection of data quality issues via embeddings. The work contributes a scalable, general-purpose design for interactive text exploration and highlights implications for integrating attribute derivation with AI-aided analysis in real-world workflows.

Abstract

Exploratory analysis of a text corpus is essential for assessing data quality and developing meaningful hypotheses. Text analysis relies on understanding documents through structured attributes spanning various granularities of the documents such as words, phrases, sentences, topics, or clusters. However, current text visualization tools typically adopt a fixed representation tailored to specific tasks or domains, requiring users to switch tools as their analytical goals change. To address this limitation, we present Texture, a general-purpose interactive text exploration tool. Texture introduces a configurable data schema for representing text documents enriched with descriptive attributes. These attributes can appear at arbitrary levels of granularity in the text and may have multiple values, including document-level attributes, multi-valued attributes (e.g., topics), fine-grained span-level attributes (e.g., words), and vector embeddings. The system then combines existing interactive methods for text exploration into a single interface that provides attribute overview visualizations, supports cross-filtering attribute charts to explore subsets, uses embeddings for a dataset overview and similar instance search, and contextualizes filters in the actual documents. We evaluated Texture through a two-part user study with 10 participants from varied domains who each analyzed their own dataset in a baseline session and then with Texture. Texture was able to represent all of the previously derived dataset attributes, enabled participants to iterate more quickly during their exploratory analysis, and helped them discover new insights about their data. Our findings contribute to the design of scalable, interactive, and flexible exploration systems that improve users' ability to make sense of text data.
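
To make the schema idea more concrete, the following is a minimal sketch of how document-level, multi-valued, and span-level attributes might be separated into tables that map back to the documents. It is not Texture's actual API: the function `split_into_tables`, the field names (`doc_id`, `value`, `start`, `end`), and the sample reviews are all hypothetical and chosen only for illustration.

```python
# Hypothetical input: each document carries nested attributes.
raw_docs = [
    {
        "id": 0,
        "text": "Great phone, poor battery.",
        "stars": 3,                                   # document-level attribute
        "topics": ["hardware", "battery"],            # multi-valued attribute
        "words": [                                    # span-level attribute
            {"start": 0, "end": 5, "value": "great"},
            {"start": 13, "end": 17, "value": "poor"},
        ],
    },
    {
        "id": 1,
        "text": "Fast shipping.",
        "stars": 5,
        "topics": ["delivery"],
        "words": [{"start": 0, "end": 4, "value": "fast"}],
    },
]


def split_into_tables(docs, list_attrs):
    """Move each list attribute into its own table keyed by doc_id.

    Returns (main_rows, side_tables): main_rows keeps one row per document
    with only single-valued attributes; side_tables maps each list attribute
    name to rows that point back to a document via doc_id.
    """
    main_rows = []
    side_tables = {name: [] for name in list_attrs}
    for doc in docs:
        main_rows.append({k: v for k, v in doc.items() if k not in list_attrs})
        for name in list_attrs:
            for item in doc.get(name, []):
                row = {"doc_id": doc["id"]}
                if isinstance(item, dict):
                    row.update(item)       # span rows keep start/end offsets
                else:
                    row["value"] = item    # plain list values get one column
                side_tables[name].append(row)
    return main_rows, side_tables


main_rows, side_tables = split_into_tables(raw_docs, ["topics", "words"])
```

Keeping one row per document in the main table and one row per value in each side table means a document with several topics or many spans never needs a variable-width column, and span offsets can be used later to highlight the matching text.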

Paper Structure

This paper contains 37 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Following the Texture data schema requires placing list attributes into new tables that map back to the documents.
  • Figure 2: All attributes are automatically visualized according to their data type (quantitative, categorical, or date), regardless of whether they are lists or single-valued. Attribute visualizations support interactive cross-filtering and can color the projection overview.
  • Figure 3: Document embeddings enable a projection overview and similarity search.
  • Figure 4: Texture helps users contextualize attribute filters in the actual documents by showing documents that match current filters and highlighting the spans of text for filtered span list attributes.
  • Figure 5: Participants used Texture to explore a wide variety of datasets including LLM outputs, song lyrics, and Reddit posts.
  • ...and 1 more figures
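
The cross-filtering and similarity search described in the figure captions above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the system's implementation: the helper names (`cross_filter`, `cosine`, `most_similar`), the toy two-dimensional embeddings, and the use of cosine similarity are all assumptions made for the example.

```python
import math

# Hypothetical main table: one row per document, with a toy 2-D embedding.
docs = [
    {"id": 0, "stars": 5, "embedding": [1.0, 0.0]},
    {"id": 1, "stars": 2, "embedding": [0.9, 0.1]},
    {"id": 2, "stars": 4, "embedding": [0.0, 1.0]},
]

# Hypothetical side table for a multi-valued attribute: (doc_id, topic).
topic_rows = [(0, "battery"), (1, "battery"), (2, "screen")]


def cross_filter(docs, topic_rows, min_stars, topic):
    """Return ids of documents that satisfy both filters at once."""
    ids_with_topic = {doc_id for doc_id, value in topic_rows if value == topic}
    return [
        d["id"]
        for d in docs
        if d["stars"] >= min_stars and d["id"] in ids_with_topic
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(docs, query_id, k):
    """Rank the other documents by embedding similarity to the query."""
    query = next(d for d in docs if d["id"] == query_id)["embedding"]
    scored = [
        (cosine(query, d["embedding"]), d["id"])
        for d in docs
        if d["id"] != query_id
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


filtered_ids = cross_filter(docs, topic_rows, min_stars=3, topic="battery")
neighbor_ids = most_similar(docs, query_id=0, k=2)
```

In this sketch a filter on a list attribute is resolved to a set of document ids first and then intersected with filters on single-valued attributes, which is one simple way to let every chart constrain the same subset of documents.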